Technical Blueprint v7

Shopilot.ai

The AI copilot for eCommerce: built like Cursor, powered like Claude Code, designed for sellers.

A native browser app that lives where sellers work. Shopilot sees your data, understands your business, executes actions, and proactively tells you what to do next.

v7, final blueprint: Linear workspace live, 192 issues, 6 projects, 6 cycles, 5 gates | Mar 10, 2026

10

Weeks to MVP

~65%

Backend reused from existing code

192

Issues in Linear

19

Architecture Projects

Changelog: v1 → v7

7 major iterations, 4 CTO/PM/PO audits, 27/27 checks PASS | Final blueprint with Linear workspace

v7

Final Blueprint: Linear Workspace

Mar 10, 2026
+ Linear workspace fully configured: Workspace beautonomous, Team Shopilot (AUT). All project management migrated from markdown specs to Linear as the single source of truth for execution tracking.
+ 192 issues created (AUT-22 to AUT-213): every T-code from sprints.md converted to a Linear issue with title (EN), description and acceptance criteria, assignee, priority (Urgent/High/Medium/Low), Fibonacci estimate (1/2/3/5/8 pts), labels, project, cycle, and blocked/blocking relations.
+ 6 time-bound Projects: P1 Walking Skeleton (Mar 11-28), P2 Core Engines (Mar 31-Apr 11), P3 WRITE + Billing + Design (Apr 14-25), P4 Integration + Polish (Apr 28-May 9), P5 Production + Launch (May 12-23), P6 Buffer (May 26-Jun 6).
+ 6 Cycles (2 weeks each): C1 through C5 plus Cooldown, each aligned 1:1 with a Project. 2-week cadence, no auto-create, automations off.
+ 5 Milestones (Gates): Gate 0: APIs Connected (Mar 28), Gate 1: “It Reads” (Apr 11), WRITE + Billing Functional (Apr 25), Gate 2: “It Acts” (May 9), Go/No-Go (May 23).
+ 1 Initiative: “MVP Shopilot — Cursor for eCommerce” grouping all 6 projects. Target: Jun 6, 2026.
+ Label system: Layer/ (7 architectural layers), Scope/ (MVP-Critical, Important, Circuit-Breaker), Type/ (Feature, Build, Gate, Eval, Mockup, Design-Delivery, Setup), Special (figma-dependency, cross-team, needs-ac).
+ GitHub integration: Org-level webhook on pabesfu. Workflow automations: PR opened → In Progress, review requested → In Review, PR merged → Done. Branch naming: AUT-XX-description.
+ Slack integration: Connected to the #engineering channel for real-time notifications on issue updates, PR activity, and gate completions.
+ Fibonacci estimation: All 192 issues estimated. Scale: 1pt=4h, 2pt=1d, 3pt=2d, 5pt=3d, 8pt=5d. Capacity ~80pts/cycle (4 engineers × 80h = 320h, at the 1pt=4h baseline).
+ Section 9.17, Linear Workspace: New section documenting the complete Linear methodology: projects, cycles, milestones, labels, workflow, integrations, estimates, and how engineers interact with the workspace.
FIX UX/UI Pipeline formalized: T0.BB through T4.BB design deliveries documented as Linear issues with Sergio handoff dependencies. External design team (#18) integrated into sprint cadence.
FIX Task count updated, 147 → 192: Added 45 new tasks: 10 eval extension tasks (T3.40-T3.44, T4.25-T4.29), UX/UI pipeline tasks (T0.BB-T4.BB), Mockup tasks (T1.MK1-T5.MK1), code signing builds (T2.40, T4.24), CI pipeline (T1.33).
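The estimation scale and the capacity figure in the v7 entry can be checked with a few lines of arithmetic. This is a minimal sketch: the names `POINT_HOURS`, `cycleCapacityPoints`, and `estimatedHours` are hypothetical, and the 8-hour working day is an assumption used to convert days to hours.

```typescript
// Hours per Fibonacci estimate, taken from the scale in the changelog
// (days converted to hours assuming an 8-hour day).
const POINT_HOURS: Record<number, number> = { 1: 4, 2: 8, 3: 16, 5: 24, 8: 40 };

// Baseline used for capacity planning: 1 point = 4 hours.
const BASE_HOURS_PER_POINT = 4;

// Capacity in points for one cycle.
function cycleCapacityPoints(engineers: number, hoursPerEngineer: number): number {
  return (engineers * hoursPerEngineer) / BASE_HOURS_PER_POINT;
}

// Total estimated hours for a list of issue estimates.
function estimatedHours(points: number[]): number {
  let total = 0;
  for (const p of points) {
    const hours = POINT_HOURS[p];
    if (hours === undefined) throw new Error(`Not on the scale: ${p}`);
    total += hours;
  }
  return total;
}

console.log(cycleCapacityPoints(4, 80)); // 80
console.log(estimatedHours([1, 2, 3, 5, 8])); // 92
```

The scale is not linear (an 8-point issue is 40 hours, not 32), so a cycle filled with large issues would consume more hours than the 80-point figure suggests.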
v6

Fresh Blueprint Build

Mar 4, 2026
+ v6 initial structure: rebuilding the technical blueprint from the v5 foundation, section by section. Starting with Sections 1 (What is Shopilot) and 2 (How the Big Players Do It).
+ Section 3, What We Already Have: carried from v5 as-is.
FIX Section 4, What We Reuse & Why: completely rewritten. Reclassified all 17 active projects: REUSE 2 (#8, #9), ADAPT 4 (#10, #2, #5, #14), NEW 11 (#17, #12, #13, #3, #15, #4, #6, #1, #16, #7, #11). Each project now states its repo name, what code exists, and exactly what is new. Eliminated projects table (#10, #14) with absorption rationale. Summary counts grid.
FIX Section 5, Architecture: completely rewritten. From 8 layers to 7 layers: Product, Intelligence, Knowledge, Action, Platform, Quality, Internal. Each layer now has a description of its role. Every project includes its repo name (core-*). The Intelligence layer details the single-repo deployment (Lambda Node.js 18 TS + API Gateway v2). Eliminated projects shown inline in their respective layers.
+ Section 6, Project Implementation Map: carried from v5 with corrected repository names aligned to the layer-based naming convention (11 repos). Updated project counts from 19 to 17 active projects. Repository names now match architecture layers: core-knowledge-*, core-action-*, core-quality-*, core-internal-*.
FIX Section 7, Beautonomous: completely rewritten from two source documents (54-Internals + 55-Layer). 13 subsections: what it solves, architecture diagram (OpenClaw UI + Slack + Terminal), 4 capabilities (status, tasks, PR approval, quality agent), governance (3 roles, risk taxonomy, authorization flow, 5 principles), quality gate (5-step pipeline), approval pipeline and 3 environments, quality base structure per repo (bootstrap, contracts, skills per repo), proactivity triggers, system prompt, connectors (GitHub/Linear/Code/Slack), what it doesn’t do, code vs config, current state.
+ Section 8, Projects Grid + 8.1 + 8.2: Project summary grid added with corrected layers (aligned to the 7-layer architecture) and corrected owners. #17 Beautonomous added to grid. All repo names corrected to layer-based naming. 8.1 + 8.2 carried from v5 with Responsibility Matrix (13 rows), External Contracts table (5 deps), and Completeness Check table added.
~ Incremental build approach: remaining sections (9–13, projects, MVP plan) will be added progressively in upcoming commits.
v5

Stack Introduction + Project Map

Mar 3, 2026
+ 8.1 Introduction to the Projects: non-technical overview: 8 project families, full flow in plain language, minimum viable core, current state of the stack.
+ 8.2 Stack Map: technical cross-section: how all 19 projects connect, ASCII flow diagram, critical dependencies, inter-project contracts.
+ 108-task sprint complete: all 17 missing tasks added: T2.18-19, T3.19-22, T4.16-21, T5.15-19. Verified 108/108.
+ #17 Beautonomous project card: integrated as first project in Section 8 with the same structure as #12–19: components grid, config stack, deep-spec accordion (permission matrix, 5-phase plan, acceptance criteria), How It Works, risk analysis, key decisions, changelog.
~ Section 7 retitled as "Beautonomous: Implementation Guide": comprehensive reference (7.1–7.13) for architecture, governance, system prompt, and 5-phase config plan. Linked bidirectionally with the #p0 card.
+ Beautonomous governance integration, 15 projects: every active project now explicitly cites Core's governance framework. Tier 1 critical (#2 Orchestrator, #12 Marketplace Provider, #3 Tool Registry): Core is the execution engine, all 17 WRITE tools route through ConfirmationFlow, ToolPolicyFilter enforces the permission matrix. Tier 2 (#8, #13, #10, #15, #4, #5, #6, #1, #14, #16, #7, #11): each project names the specific Core principle it implements. #14 DevOps ownership corrected: Andres (IaC authoring) · Mateo (prod approvals).
v4

Deep Spec Rewrites + Project Restructure

Mar 2, 2026

Deep Spec Rewrites: 13 projects

FIX #2 ReAct Orchestrator: Python/FastAPI → TypeScript/Lambda. 3-layer SystemPromptComposer. 7 port interfaces. Phase 0.3→1→4
FIX #10 Data Sync: two-pipeline architecture (Fast Data + Complete Data), deps → #3,#9,#12,#14
FIX #3 Tool Registry: 36 primitive tools, ToolPolicyFilter, HookLifecycle, SubtaskRunner, 5-phase plan
FIX #9 Cerebro KB: 2,875 docs (not 36), Go 1.24 (not Python), Vertex AI 004 (not OpenAI), 11 namespaces
FIX #4 Personality Engine: 3-layer ISystemPromptComposer, TypeScript, ~750-950 tokens / 1200 cap
FIX #15 Feedback Loop: 6 mixed components → 4 owned. Python/BigQuery → TS/DynamoDB/Lambda/CDK
FIX #5 Context Aggregator: 5 providers → 2 automatic sources (KB + Brand Health RAG), dynamic 200K budget
FIX #6 Proactive Suggestions: cron Rule Engine → LLM post-tool evaluation via HookLifecycle
FIX #13 Billing & Credit Economy: merged #13+#14. ICreditsGate, Stripe Checkout+Webhooks, Credit Packs
FIX #12 Marketplace Provider: merged #12+#10. ITokenManager + DynamoDB, OAuth2 for all 3 marketplaces
FIX #1 Native Shell: +ProfileView +BillingView +OnboardingWizard. WebSocket protocol (8+4 events)
FIX #16 Eval Suite: LLM-as-Judge → 4-pipeline platform. Consumer-driven contract testing. 3 phases
FIX #1 absorbs former #6 Playground: Dev Tools panel (context inspector, trace viewer, tool debugger)

New Projects & Structure

NEW #14 DevOps (IaC): Terraform (GCP) + CloudFormation (AWS), CI/CD, 3 environments
NEW #7 Guardrails: InputGuard (injection + off-scope) + OutputGuard (data leak + dangerous content)
NEW #11 Enrichment Layer: Market Intelligence + Content Analysis. 7/8 ANALYSIS tools. Redis cache
NEW Project Map: 8 families, 11 repos, connection diagram, responsibility matrix, critical deps

Projects Eliminated: 3

DEL former #6 Playground → absorbed into #1 Native Shell (Dev Tools panel)
DEL #10 Auth Vault → absorbed into #12 Marketplace Provider (ITokenManager)
DEL #14 Billing & Subscription → absorbed into #13 Billing & Credit Economy

Cross-Refs & Enhancements

FIX 5 stale refs fixed after the #3 rewrite (SkillRegistry/Resolver/Executor → ToolRegistry in #2, #9, #4, #14, #1)
FIX listing → product rename across all projects (30+ replacements). #2 +OrchestrationSummary. #3 +FeedbackCaptureHook
UPD #2 +ToolErrorRecovery/ResponseVerifier | #9 +Contextual Retrieval (+49-67% recall) | #5 +HyDE

Quality Assurance: 17 fixes across 3 audit rounds

QA Round 1: 8 badge/count fixes: #13 ADAPT→REWRITE, #15 DEFERRED→REWRITE, #12/#1 summary aligned, header 16 Active+2 Eliminated
QA Round 2: 6 pre-existing fixes: #10 data-status exists→adapt, #14 badge ADAPT+NEW→EXISTS+NEW, #13/#14 owner mismatches, #2 table badge
QA Round 3: 3 final fixes: #8/#10 summary table badges, #8 card badge ~90%→EXISTS, v1 added to changelog panel
v3

Deep Specs + CTO/PM/PO Audit

27/27 PASS Feb 27-28, 2026
+ 15 projects expanded: data models, APIs, acceptance criteria, implementation plans, risk analysis
+ CORE Shopilot Core: governance, risk taxonomy, permission matrix, system prompt architecture
+ Glossary (20 terms), CI/CD & Infrastructure section, sidebar nav + project collapse
+ UX overhaul: 12px min typography, WCAG contrast, progress bar, project layer groups
FIX Dependencies: #2 → #3,#4,#12 | #5 → #10,#3,#9,#6
FIX Token budget unified: ~1500 target, <2000 hard cap (was a contradictory 1500-3000)
FIX Billing: no overage, soft-block + Credit Packs | Business $149 (was $399)
~ Workload rebalanced: Shopify→Mateo, onboarding→Pablo, CI/CD→Mateo, KB→Mateo
v2.1

Freemium Pricing Model

Feb 27, 2026
+ Free ($0, 50 cr/mo, 5 read-only skills) + Pro ($49/mo, 500 cr, 8 skills)
+ Credit Packs: Basic $5/100cr, Popular $20/500cr, Power $35/1000cr
+ Proactive alerts gated to Pro (4 rules) | Plan-aware UI messaging
v2

CTO Technical Review

— Mateo Quintero

Feb 27, 2026
FIX Competitor data → MeLi Search API (/sites/MLA/search). Affects 4 projects
SIM #14 Billing → Freemium + Pro. #13 Credit Economy → quotas + soft/hard blocks
DEF #15 Feedback Loop → Phase 2. Saves ~1 week
SIM Tool Registry: 36 primitive tools via Anthropic tool_use (no Skills layer)
+ Redis (Cloud Memorystore), GCP Secret Manager, WebContentsView
+ Breadcrumb nav, competitor DAG in Data Sync (#10)
~ Confirmation timeout: 30min (was 5min) | RAM: 500MB (was 300MB)
v1

Initial MVP Blueprint

Feb 26, 2026
+ 15 initial projects defined: Marketplace Provider, Observability, Billing, Data Sync, Orchestrator, Playground, Tool Registry, Cerebro KB, Feedback Loop, Auth Vault, Personality Engine, Context Aggregator, Proactive Suggestions, Metering, Native Shell
+ CORE Shopilot Core: governance framework, risk taxonomy, permission matrix
+ MVP scope: MercadoLibre only, single marketplace integration
+ Architecture: Coach AI + ReAct loop + tool execution pipeline

Contents

Glossary: Key Terms (20 terms)
ReAct Loop

Reasoning + Acting. The agent thinks step-by-step, decides which tool to use, observes the result, and repeats until the task is done. Core AI architecture pattern.
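The loop described above can be sketched in a few lines. This is an illustration only: the `Step`, `Decide`, and `runReAct` names are hypothetical, and the "model" is a stub function rather than an LLM call.

```typescript
// One step of the agent: either call a tool or finish with an answer.
type Step =
  | { kind: "tool"; name: string; input: Record<string, unknown> }
  | { kind: "final"; answer: string };

type Tool = (input: Record<string, unknown>) => string;
// Stand-in for the LLM: given the observations so far, pick the next step.
type Decide = (observations: string[]) => Step;

// Reason -> act -> observe, repeated until a final answer or the step budget runs out.
function runReAct(decide: Decide, tools: Record<string, Tool>, maxSteps = 5): string {
  const observations: string[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const step = decide(observations);
    if (step.kind === "final") return step.answer;
    const tool = tools[step.name];
    observations.push(tool ? tool(step.input) : `error: unknown tool ${step.name}`);
  }
  return "stopped: step budget exhausted";
}

// Stub model: look up the product first, then answer from the observation.
const stubDecide: Decide = (obs) =>
  obs.length === 0
    ? { kind: "tool", name: "get_product", input: { id: "P1" } }
    : { kind: "final", answer: `Title length is ${obs[0]} chars` };

const stubTools: Record<string, Tool> = { get_product: () => "42" };

console.log(runReAct(stubDecide, stubTools)); // Title length is 42 chars
```

The step budget is the design choice worth noting: without it, a model that never emits a final answer would loop indefinitely.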

tool_use

Anthropic API feature that lets the LLM call functions (tools) with structured parameters. This is how the AI agent executes marketplace operations.
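As a rough illustration, a tool is declared with a name, a description, and a JSON Schema for its input; when the model calls it, the response contains a `tool_use` block and the caller replies with a matching `tool_result` block. The sketch below models those shapes as plain TypeScript types without the SDK. The `update_price` parameters and the `executeToolUse` helper are assumptions for illustration, not the project's actual tool definitions.

```typescript
// Shape of a tool definition sent in the `tools` array of a request.
interface ToolDefinition {
  name: string;
  description: string;
  input_schema: { type: "object"; properties: Record<string, unknown>; required?: string[] };
}

// Content blocks exchanged when the model calls a tool.
interface ToolUseBlock { type: "tool_use"; id: string; name: string; input: Record<string, unknown> }
interface ToolResultBlock { type: "tool_result"; tool_use_id: string; content: string; is_error?: boolean }

const updatePrice: ToolDefinition = {
  name: "update_price",
  description: "Set a new price for one product (hypothetical parameters).",
  input_schema: {
    type: "object",
    properties: { product_id: { type: "string" }, price: { type: "number" } },
    required: ["product_id", "price"],
  },
};

// Local handlers keyed by tool name; the body stands in for a real marketplace API call.
const handlers: Record<string, (input: Record<string, unknown>) => string> = {
  update_price: (input) => `price of ${String(input["product_id"])} set to ${String(input["price"])}`,
};

// Turn a tool_use block into the tool_result block sent back in the next user message.
function executeToolUse(block: ToolUseBlock): ToolResultBlock {
  const handler = handlers[block.name];
  if (!handler) {
    return { type: "tool_result", tool_use_id: block.id, content: `unknown tool: ${block.name}`, is_error: true };
  }
  return { type: "tool_result", tool_use_id: block.id, content: handler(block.input) };
}
```

The `tool_use_id` echo is what lets the model pair each result with the call that produced it.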

System Prompt

Hidden instructions given to the AI before every conversation. Defines identity, guardrails, user profile, and execution capabilities. Assembled from 3 layers (L1 base + L2 session + L3 execution) by ISystemPromptComposer (#4). Marketplace terminology lives in the KB, not here.
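A three-layer composition of this kind might look like the sketch below. It is not the actual ISystemPromptComposer: the `PromptLayers` shape, the `composeSystemPrompt` function, and the 4-characters-per-token estimate are all assumptions for illustration.

```typescript
interface PromptLayers {
  base: string;      // L1: identity and guardrails
  session: string;   // L2: user profile for this session
  execution: string; // L3: execution capabilities
}

// Rough token estimate (about 4 characters per token); an assumption, not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Join the three layers in a fixed order and enforce a token cap.
function composeSystemPrompt(layers: PromptLayers, maxTokens: number): string {
  const prompt = [layers.base, layers.session, layers.execution]
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .join("\n\n");
  const tokens = estimateTokens(prompt);
  if (tokens > maxTokens) {
    throw new Error(`system prompt is ~${tokens} tokens, cap is ${maxTokens}`);
  }
  return prompt;
}

const prompt = composeSystemPrompt(
  { base: "You are Shopilot.", session: "Seller plan: Pro.", execution: "Confirm before any write." },
  1200,
);
console.log(prompt.split("\n\n").length); // 3
```

Failing loudly when the cap is exceeded, rather than silently truncating, keeps a guardrail in the base layer from being cut off by an oversized session layer.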

Context Window

The total amount of text (in tokens) the LLM can process in a single interaction. Larger windows = more context, but higher cost. Shopilot targets ~200K tokens.

Prompt Caching

Anthropic optimization that caches the static part of the system prompt across calls, reducing cost by ~90% for repeated prefixes. Enables affordable always-on AI.
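In the Anthropic Messages API, a system prompt can be sent as a list of text blocks, and a block is marked cacheable with `cache_control: { type: "ephemeral" }`. The sketch below only builds the request body as a plain object. The `buildRequest` helper, the `max_tokens` value, and the split into one static and one per-session block are assumptions for illustration.

```typescript
interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

interface MessagesRequest {
  model: string;
  max_tokens: number;
  system: SystemBlock[];
  messages: { role: "user" | "assistant"; content: string }[];
}

// Put the static prefix first and mark it cacheable; per-session text follows uncached.
function buildRequest(
  model: string,
  staticPrompt: string,
  sessionPrompt: string,
  userText: string,
): MessagesRequest {
  return {
    model,
    max_tokens: 1024,
    system: [
      { type: "text", text: staticPrompt, cache_control: { type: "ephemeral" } },
      { type: "text", text: sessionPrompt },
    ],
    messages: [{ role: "user", content: userText }],
  };
}
```

Ordering matters because caching works on prefixes: anything that changes per call has to come after the cached block, or the cached prefix no longer matches.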

WebContentsView

Electron API component that renders web content (like a browser tab). Shopilot uses it to display real marketplace websites (MeLi, Amazon, Shopify) inside the native app.

Guardrails

Safety rules appended to every AI interaction: never execute without user confirmation, never fabricate data, never give financial/legal advice. Non-negotiable constraints.

OAuth2

Industry-standard protocol for secure API access. Sellers authorize Shopilot to access their marketplace accounts without sharing passwords. Tokens rotate automatically.

DAG

Directed Acyclic Graph. A pipeline of data processing steps that run in dependency order, with no cycles. Shopilot uses DAGs (via Airflow) to sync marketplace data every 6 hours.
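To make "run in order" concrete, the sketch below computes an execution order from declared dependencies with a simple topological sort. It is not Airflow code: the `Dag` type, the `executionOrder` function, and the step names are hypothetical.

```typescript
// Each key is a step; its array lists the steps that must finish first.
type Dag = Record<string, string[]>;

// Topological sort: repeatedly emit the steps whose dependencies are all done.
function executionOrder(dag: Dag): string[] {
  const done: string[] = [];
  let remaining = Object.keys(dag).sort();
  while (remaining.length > 0) {
    const ready = remaining.filter((step) =>
      dag[step].every((dep) => done.indexOf(dep) !== -1),
    );
    // Nothing is runnable but steps remain: the graph has a cycle or an undeclared step.
    if (ready.length === 0) throw new Error("cycle or missing dependency detected");
    for (const step of ready) done.push(step);
    remaining = remaining.filter((step) => ready.indexOf(step) === -1);
  }
  return done;
}

// Hypothetical sync pipeline; the step names are illustrative.
const syncDag: Dag = {
  fetch_orders: [],
  fetch_products: [],
  normalize: ["fetch_orders", "fetch_products"],
  aggregate: ["normalize"],
};

console.log(executionOrder(syncDag));
// [ 'fetch_orders', 'fetch_products', 'normalize', 'aggregate' ]
```

Steps that become ready in the same round (here the two fetches) have no dependency on each other, which is what allows a scheduler to run them in parallel.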

Vector Search / Embeddings

Finding information by meaning, not keywords. Text is converted to numerical vectors; similar meanings cluster together. Powers the Cerebro knowledge base.
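"Similar meanings cluster together" usually means that vectors pointing in similar directions score high on cosine similarity. The following is a minimal sketch with toy 3-dimensional vectors. The document IDs and the `topK` helper are hypothetical, and a real system would get embeddings from a model and use an index rather than a linear scan.

```typescript
// Cosine similarity: 1 for identical direction, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Doc { id: string; embedding: number[] }

// Return the ids of the k documents most similar to the query vector.
function topK(query: number[], docs: Doc[], k: number): string[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((d) => d.id);
}

// Toy embeddings for illustration only.
const docs: Doc[] = [
  { id: "shipping-policy", embedding: [0.9, 0.1, 0.0] },
  { id: "title-guidelines", embedding: [0.1, 0.9, 0.2] },
  { id: "pricing-rules", embedding: [0.0, 0.2, 0.9] },
];

console.log(topK([0.2, 0.8, 0.1], docs, 2)); // [ 'title-guidelines', 'shipping-policy' ]
```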

RAG Pipeline

Retrieval-Augmented Generation. Before the AI responds, it retrieves relevant knowledge chunks from the database to ground its answer in real data, not hallucinations.

WebSocket

Real-time communication channel between the desktop app and the server. Enables streaming AI responses (word by word) and live confirmation dialogs.

Tools (Primitive Operations)

Tools (36) are primitive API operations (get_product, update_price, search_competitors) that the LLM composes dynamically at runtime via Anthropic tool_use. Managed by Tool Registry (#3) with IToolExecutor as the single execution port. No separate "Skills" layer: the agent reasons directly over primitive tools + KB context.

Credits

Internal currency for AI usage. Each skill costs credits based on complexity (1-25cr per interaction). Free plan: 50cr/mo. Pro plan: 500cr/mo + purchasable Credit Packs.
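A credit check before each interaction could be sketched as follows. The plan allowances and the 1-25 credit range come from the definition above. Everything else is an assumption for illustration: the `Account` shape, the `chargeCredits` function, spending the monthly allowance before pack credits, and limiting pack credits to Pro accounts. None of it describes the actual ICreditsGate.

```typescript
type Plan = "free" | "pro";

const MONTHLY_CREDITS: Record<Plan, number> = { free: 50, pro: 500 };

interface Account { plan: Plan; usedThisMonth: number; packCredits: number }

type GateResult =
  | { allowed: true; account: Account }
  | { allowed: false; reason: string };

// Check whether an interaction can be paid for and return the updated account if so.
function chargeCredits(account: Account, cost: number): GateResult {
  if (cost < 1 || cost > 25) throw new Error("cost must be between 1 and 25 credits");
  const monthlyLeft = Math.max(0, MONTHLY_CREDITS[account.plan] - account.usedThisMonth);
  // Assumption: only Pro accounts can draw on purchased Credit Packs.
  const packLeft = account.plan === "pro" ? account.packCredits : 0;
  if (cost > monthlyLeft + packLeft) {
    return { allowed: false, reason: "insufficient credits" };
  }
  // Assumption: the monthly allowance is spent before pack credits.
  const fromMonthly = Math.min(cost, monthlyLeft);
  return {
    allowed: true,
    account: {
      plan: account.plan,
      usedThisMonth: account.usedThisMonth + fromMonthly,
      packCredits: account.packCredits - (cost - fromMonthly),
    },
  };
}

const blocked = chargeCredits({ plan: "free", usedThisMonth: 45, packCredits: 0 }, 10);
console.log(blocked.allowed); // false
```

Returning a new account object instead of mutating the input keeps a blocked request from leaving partial charges behind.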

Medallion Architecture

Data lake pattern: Bronze (raw API responses) → Silver (cleaned, normalized) → Gold (aggregated, query-ready). Ensures data quality improves at each layer.

Rate Limiting

Controls how many API calls Shopilot makes per second to each marketplace. Prevents getting blocked by MeLi/Amazon/Shopify for excessive requests.
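One common way to cap calls per second is a token bucket. The source does not say which algorithm Shopilot uses, so this is a generic sketch. The `TokenBucket` class is hypothetical, and the clock is passed in explicitly so the behavior is deterministic.

```typescript
// Token bucket: holds up to `capacity` tokens, refilled at `refillPerSecond`.
// Each allowed call consumes one token; calls are rejected when the bucket is empty.
class TokenBucket {
  private capacity: number;
  private refillPerSecond: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;
    this.lastRefillMs = nowMs;
  }

  // Returns true and consumes a token if a call is allowed at time nowMs.
  tryAcquire(nowMs: number): boolean {
    const elapsedSec = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefillMs = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Illustrative limits: bursts of 2 calls, 1 call per second sustained.
const bucket = new TokenBucket(2, 1, 0);
console.log(bucket.tryAcquire(0), bucket.tryAcquire(0), bucket.tryAcquire(0)); // true true false
```

In practice each marketplace would get its own bucket, since their limits differ.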

IPC

Inter-Process Communication. How the Electron main process, renderer (UI), and preload scripts communicate securely via contextBridge.

TTL

Time-To-Live. How long cached data stays valid before being refreshed. Product cache: 5min. Data sync: 6h. Trace retention: 90 days.
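A TTL cache can be sketched as a map whose entries carry an expiry timestamp. The `TtlCache` class below is hypothetical and takes the current time as an argument so expiry can be exercised without waiting; only the 5-minute product TTL comes from the definition above.

```typescript
interface Entry<V> { value: V; expiresAtMs: number }

// Minimal TTL cache: entries expire ttlMs after they are written.
class TtlCache<V> {
  private entries: Record<string, Entry<V>> = Object.create(null);
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  set(key: string, value: V, nowMs: number): void {
    this.entries[key] = { value, expiresAtMs: nowMs + this.ttlMs };
  }

  // Returns the cached value, or undefined if missing or expired.
  get(key: string, nowMs: number): V | undefined {
    const entry = this.entries[key];
    if (!entry) return undefined;
    if (nowMs >= entry.expiresAtMs) {
      delete this.entries[key];
      return undefined;
    }
    return entry.value;
  }
}

// Product cache with the 5-minute TTL mentioned above.
const productCache = new TtlCache<number>(5 * 60 * 1000);
productCache.set("P1", 1899, 0);
console.log(productCache.get("P1", 299999)); // 1899
console.log(productCache.get("P1", 300000)); // undefined
```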

Strategy Pattern

Software design pattern where each marketplace (MeLi, Amazon, Shopify) is a pluggable module implementing the same interface. Adding a marketplace = adding one class.
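The "one class per marketplace" idea can be sketched as below. The `MarketplaceProvider` interface, its single method, and the `ProviderRegistry` are hypothetical; the real provider interface is presumably larger and asynchronous.

```typescript
// Common interface every marketplace module implements.
interface MarketplaceProvider {
  readonly id: string;
  getProductTitle(productId: string): string;
}

// Each marketplace is one class; the bodies stand in for real API calls.
class MeliProvider implements MarketplaceProvider {
  readonly id = "meli";
  getProductTitle(productId: string): string {
    return `MeLi product ${productId}`;
  }
}

class ShopifyProvider implements MarketplaceProvider {
  readonly id = "shopify";
  getProductTitle(productId: string): string {
    return `Shopify product ${productId}`;
  }
}

// Callers look providers up by id and never branch on the marketplace themselves.
class ProviderRegistry {
  private providers: Record<string, MarketplaceProvider> = Object.create(null);

  register(provider: MarketplaceProvider): void {
    this.providers[provider.id] = provider;
  }

  get(id: string): MarketplaceProvider {
    const provider = this.providers[id];
    if (!provider) throw new Error(`no provider registered for ${id}`);
    return provider;
  }
}

const registry = new ProviderRegistry();
registry.register(new MeliProvider());
registry.register(new ShopifyProvider());
console.log(registry.get("meli").getProductTitle("P1")); // MeLi product P1
```

Adding Amazon in this sketch would mean writing one more class and one more `register` call, with no change to the calling code.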

Cloud Run

Google Cloud service that runs containerized backend services. Auto-scales from 0 to N instances based on traffic. Shopilot's FastAPI backend runs here.

1. What is Shopilot

Shopilot is a native desktop app that functions as a specialized browser for eCommerce. The seller navigates their marketplaces (Amazon, MercadoLibre, Shopify) as usual on the left side. On the right, Shopilot deploys as an intelligent sidebar: a conversational copilot with massive multi-marketplace context that doesn't just answer questions, but proactively proposes what to do next.

[Mockup: Shopilot.app window. Left pane: Dashboard → MercadoLibre → Mis Publicaciones, with listings priced $2,499, $1,899, and $899. Right pane: Shopilot sidebar showing the alert "3 products need optimization", the seller prompt "Audit my top product", the reply "Your title is 42 chars (rec: 60+). Competitor avg: 78. I can rewrite it. Proceed?", and an "Ask Shopilot..." input.]

Conversational

Natural language interface. Ask anything about your business. Shopilot responds with data, not opinions.

Executor

Doesn't just recommend: it acts. Edits products, adjusts prices, runs campaigns. With confirmation for risky actions.

Proactive

Doesn't wait for questions. Detects opportunities, flags problems, proposes actions before you ask.

2. How the Big Players Do It

Shopilot borrows architectural patterns from three products that redefined developer productivity. Each teaches a different lesson.

Cursor

VS Code fork + proprietary models + native shell

What we learn

Architecture: Electron app wrapping a full VS Code fork. Custom IPC via gRPC + Protocol Buffers. Chromium renderer with Monaco editor. Shadow windows for LSP access. Rust modules via NAPI-RS for indexing.

Key tech: Sparse MoE models for Tab (custom, not GPT-4). Turbopuffer vector DB. Merkle tree sync every 10 min. Speculative Edits (~1000 tok/s, 13x speedup).

Lessons for Shopilot

  • Native shell: Electron as browser container + sidebar. Same pattern: main content left, AI right.
  • Context engine: Cursor auto-collects files, tabs, errors. Shopilot auto-collects marketplace, page, product, metrics.
  • Proactive suggestions: Tab completion is Cursor's killer feature. Shopilot's equivalent: proactive alerts and action proposals.
Claude Code

Terminal CLI + agent loop + primitive tools + security layers

What we learn

Architecture: TypeScript/Node.js CLI. Ink (React for terminals). Async generator agent loop. 18 primitive tools. Exact string matching for edits. Subagent system (up to 10 parallel).

Key tech: 6-layer security with OS sandboxing. Prompt caching (92%+ hit rate). 3-layer memory. Tool result context guard (token budget per tool).

Lessons for Shopilot

  • ReAct agent loop: plan → execute tools → observe → decide. The heart of Shopilot's orchestration.
  • Primitive tools: small, composable tools. Skills combine them into complex workflows.
  • Risk taxonomy: 6-layer security maps to our read-only / reversible / irreversible model. Real money = real security.
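The plan → execute → observe → decide cycle can be sketched as a bounded loop. This is a simplified illustration, not the Orchestrator's design: the `Planner` function stands in for the LLM call, tools are synchronous string functions, and the `maxSteps` budget is an assumption added so the loop always terminates.

```typescript
// Hypothetical ReAct loop sketch; the Planner replaces a real LLM call.
type Decision =
  | { kind: "tool"; tool: string; input: string }
  | { kind: "answer"; text: string };

interface Step {
  tool: string;
  input: string;
  observation: string;
}

type Planner = (goal: string, history: Step[]) => Decision;
type Tool = (input: string) => string;

function runReAct(
  goal: string,
  plan: Planner,
  tools: Map<string, Tool>,
  maxSteps: number = 5,
): { answer: string; steps: Step[] } {
  const steps: Step[] = [];
  for (let i = 0; i < maxSteps; i++) {
    // Plan / decide: the planner sees the goal plus every prior observation.
    const decision = plan(goal, steps);
    if (decision.kind === "answer") return { answer: decision.text, steps };

    // Execute: unknown tools become an observation instead of a crash,
    // so the planner can recover on the next iteration.
    const tool = tools.get(decision.tool);
    const observation =
      tool === undefined ? `unknown tool: ${decision.tool}` : tool(decision.input);

    // Observe: record the result for the next planning round.
    steps.push({ tool: decision.tool, input: decision.input, observation });
  }
  return { answer: "step budget exhausted", steps };
}
```

A real loop would be asynchronous and would pass tool schemas to the model. The control flow is the part this sketch aims to show.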
OpenClaw

Multi-agent framework + skills system + tool policies + plugin hooks

What we learn

Architecture: WebSocket gateway + agent routing. Tool factory with policy filtering per agent/channel. Stream function wrapping (decorator chain). Session write locks. Plugin hook lifecycle (before/after tool calls).

Lessons for Shopilot

  • Tool Registry: an OpenClaw-inspired pattern (register, filter by context, execute via a single port, IToolExecutor). 36 primitive tools composed by the LLM at runtime.
  • Tool policy: filter tools by marketplace, plan, risk level. Not all users see all tools.
  • Hook lifecycle: before_tool → execute → after_tool. Perfect for the feedback loop: capture what ran and what resulted.
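The before_tool → execute → after_tool sequence can be sketched as an executor that wraps every tool call with hooks. The class and type names below (`HookedExecutor`, `ToolCall`, `ToolResult`) are hypothetical. The blueprint names IToolExecutor and HookLifecycle but does not specify their signatures.

```typescript
// Hypothetical hook lifecycle sketch; not the actual IToolExecutor contract.
interface ToolCall {
  tool: string;
  input: string;
}

interface ToolResult {
  tool: string;
  output: string;
  ok: boolean;
}

type BeforeHook = (call: ToolCall) => void;
type AfterHook = (call: ToolCall, result: ToolResult) => void;
type ToolFn = (input: string) => string;

class HookedExecutor {
  private readonly before: BeforeHook[] = [];
  private readonly after: AfterHook[] = [];
  private readonly tools: Map<string, ToolFn>;

  constructor(tools: Map<string, ToolFn>) {
    this.tools = tools;
  }

  onBefore(hook: BeforeHook): void {
    this.before.push(hook);
  }

  onAfter(hook: AfterHook): void {
    this.after.push(hook);
  }

  // Single entry point: every call passes through the same hook chain.
  execute(call: ToolCall): ToolResult {
    for (const hook of this.before) hook(call);
    const result = this.run(call);
    // after_tool always fires, so failures are captured as well as successes.
    for (const hook of this.after) hook(call, result);
    return result;
  }

  private run(call: ToolCall): ToolResult {
    try {
      const tool = this.tools.get(call.tool);
      if (tool === undefined) throw new Error(`unknown tool: ${call.tool}`);
      return { tool: call.tool, output: tool(call.input), ok: true };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { tool: call.tool, output: message, ok: false };
    }
  }
}
```

An after hook is a natural place to record what ran and what resulted, which is the data a feedback loop needs.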

3. What We Already Have

Shopilot doesn't start from zero. Three production systems and one live product provide the foundation.

PRODUCTION

Core Intelligence Conversation API (The Coach)

Conversational AI specialized in eCommerce. Clean Architecture + DDD. 12-step RAG pipeline. Multi-LLM support (Claude, OpenAI). Brand Health intent detection without spending tokens. Full observability with ConversationTrace. Cost tracking per execution per client.

ChannelAdapter
Multi-channel input
RAG Pipeline
12-step orchestration
ILLMClient
Multi-provider LLM
DynamoDB
Conversations + state
PRODUCTION

Core Intelligence Knowledge Base (The Brain)

Structured knowledge base. Markdown + YAML front-matter in Git. 4-stage indexing pipeline (validate → chunk → embed → BigQuery). Semantic search over embeddings. Namespaces: pricing, ads, inventory, financial, organic, quality, compliance, reputation, returns, health, learning. Automated reporting of outdated documents.

PRODUCTION

Multi-Marketplace Data Orchestrator

Apache Airflow orchestrating data ingestion from Shopify (GraphQL) and MercadoLibre (REST). Modular DAGs (Auth → Extractor → Transformer → Dispatcher). Medallion architecture on GCS (Parquet + Snappy). FastAPI Data API on Cloud Run. Automatic OAuth2 token rotation. OpenMetadata for data governance. Terraform IaC.

Shopify
GraphQL connector
MercadoLibre
REST connector
Data Lake
GCS + Parquet + Hive
Data API
FastAPI + Cloud Run
LIVE — 200 USERS

Sellerfy v1

Bootstrapped SaaS with 200 paying users. Python (FastAPI) + React/Next.js + PostgreSQL/DynamoDB + AWS. Stripe billing integration. These 200 users validate market demand and become day-1 beta testers for Shopilot.

DESIGNED — NOT CODED

Coach Evolution Plans

ReAct Loop
Iterative reasoning + tool execution. 5-layer design with confirmation flows.
Action Catalog
Risk taxonomy: read-only, reversible, irreversible. Pre-built for marketplace safety.
Product Tools
Competitive intelligence (Rainforest), product audit, text generation, image generation. Code-ready specs.
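The three-level risk taxonomy from the Action Catalog can be sketched as a gate in front of each action. The text does not say which levels prompt the seller. The default policy below assumes that both reversible and irreversible actions require confirmation, which matches the "Proceed?" prompt in the sidebar mockup. All names are hypothetical.

```typescript
// Hypothetical confirmation gate; the policy table is an assumption.
type RiskLevel = "read-only" | "reversible" | "irreversible";

interface ActionSpec {
  name: string;
  risk: RiskLevel;
}

type ConfirmationPolicy = Record<RiskLevel, boolean>;

// Assumed default: anything that changes marketplace state asks first.
const DEFAULT_POLICY: ConfirmationPolicy = {
  "read-only": false,
  reversible: true,
  irreversible: true,
};

type GateOutcome = "executed" | "needs_confirmation";

function gate(
  action: ActionSpec,
  confirmed: boolean,
  policy: ConfirmationPolicy = DEFAULT_POLICY,
): GateOutcome {
  if (policy[action.risk] && !confirmed) return "needs_confirmation";
  return "executed";
}
```

Keeping the policy as data means it can be tightened per plan or per marketplace without touching the gate itself.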

4. What We Reuse & Why

REUSE

#8 Observability

ConversationTrace + AgentTracking operational. Traces in PostgreSQL, cost per execution calculated by trigger, credits deducted automatically. Extend, don't rebuild.

REUSE

#9 Cerebro KB

2,875 documents indexed, Go + Vertex AI 004 + BigQuery vectors pipeline. Semantic RAG active in production. Add updated Amazon + MeLi namespaces and marketplace trends. The indexing pipeline transfers as-is.

REUSE

ILLMClient + Factory

Already supports Claude, OpenAI, OpenRouter. Factory selects provider per agent/config at runtime. Works as-is. It lives inside #2 and is not a separate project.

ADAPT

#10 Data Sync

Batch Airflow + GCS pipeline persists for daily seller data ingestion. Add TypeScript adapters for real-time sync via EventBridge. Brand Health Index operational. Existing DAGs unchanged.

ADAPT

#2 Orchestrator

From one-shot 12-step pipeline to autonomous ReAct agent. Same Lambda, same repo: the existing flow is extended, not replaced. Absorbs system prompt composition via ISystemPromptComposer.

ADAPT

#5 Context Aggregator

Context assembly logic already exists in RagOrchestrator. Refactor into IContextAssembler (KB + Brand Health RAG) + IContextWindowManager (dynamic token budget over 200K context window). Extract and formalize, don't rebuild.
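One way to read "dynamic token budget over 200K context window" is as priority-based packing. The sketch below is an assumption about how an IContextWindowManager might behave, not its specified design. Section names, the priority field, and the output reservation are all hypothetical, and token counts are passed in rather than computed by a tokenizer.

```typescript
// Hypothetical budget planner; only the 200K window size comes from the text.
interface ContextSection {
  name: string;
  tokens: number;
  priority: number; // lower number = more important
}

interface BudgetPlan {
  included: string[];
  dropped: string[];
  usedTokens: number;
}

const CONTEXT_WINDOW_TOKENS = 200000;

function planContext(
  sections: ContextSection[],
  reservedForOutput: number,
  windowTokens: number = CONTEXT_WINDOW_TOKENS,
): BudgetPlan {
  // Leave room for the model's reply before packing any input.
  const budget = Math.max(0, windowTokens - reservedForOutput);
  // Copy before sorting so the caller's array is not mutated.
  const ordered = [...sections].sort((a, b) => a.priority - b.priority);

  const plan: BudgetPlan = { included: [], dropped: [], usedTokens: 0 };
  for (const section of ordered) {
    if (plan.usedTokens + section.tokens <= budget) {
      plan.included.push(section.name);
      plan.usedTokens += section.tokens;
    } else {
      // A section that does not fit is skipped; smaller ones may still fit.
      plan.dropped.push(section.name);
    }
  }
  return plan;
}
```

A production version would more likely truncate or summarize an oversized section than drop it whole. Dropping keeps the sketch short.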

ADAPT

#14 DevOps

CDK TypeScript (AWS) exists and deploys conversation-api today. Extend with unified multi-repo stack + Terraform for GCP (BigQuery, Vertex AI). Base infrastructure unchanged.

NEW

#12 Marketplace Provider

core-action-marketplace-provider. Brand new service. TypeScript with IMarketplaceAdapter (MeLi REST + Amazon SP-API) + ITokenManager OAuth2 (absorbs #10 Auth Vault). Executes real marketplace actions: product mutations, price, stock, buyer communication, campaigns. The WRITE tools backend for the Coach.

NEW

#13 Billing (absorbs #14)

core-platform-billing. PostgreSQL credit triggers and clients schema are the only starting point (schema, not code). Everything else (ICreditsGate, HttpCreditGate, POST /internal/gate, Stripe Checkout, webhooks, Free/Pro state machine, Credit Packs) is built from scratch. Absorbs #14.

NEW

#15 Feedback Loop

core-quality-feedback. Completely new repo and service. IFeedbackService + before/after impact measurement for WRITE actions (visits, sales, conversion) + 3 feedback sources. No reusable code from previous design.
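Before/after impact measurement can be sketched as a comparison of two metric snapshots. The snapshot shape and function names are assumptions for illustration. Conversion is taken here as sales divided by visits, which the text does not define explicitly.

```typescript
// Hypothetical impact calculation; field names and the conversion formula are assumptions.
interface MetricsSnapshot {
  visits: number;
  sales: number;
}

interface ImpactReport {
  visitsChangePct: number | null;
  salesChangePct: number | null;
  conversionBefore: number | null;
  conversionAfter: number | null;
}

// Relative change in percent; null when the baseline is zero.
function pctChange(before: number, after: number): number | null {
  if (before === 0) return null;
  return ((after - before) / before) * 100;
}

// Assumed definition: sales per visit; null when there were no visits.
function conversion(snapshot: MetricsSnapshot): number | null {
  return snapshot.visits === 0 ? null : snapshot.sales / snapshot.visits;
}

function measureImpact(before: MetricsSnapshot, after: MetricsSnapshot): ImpactReport {
  return {
    visitsChangePct: pctChange(before.visits, after.visits),
    salesChangePct: pctChange(before.sales, after.sales),
    conversionBefore: conversion(before),
    conversionAfter: conversion(after),
  };
}
```

Returning `null` instead of `Infinity` or `NaN` keeps zero-baseline listings from polluting aggregates downstream.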

NEW

#3 Tool Registry

36 primitive tools (READ + ANALYSIS + WRITE + SYSTEM) via Anthropic tool_use. ToolPolicyFilter (plan + marketplace + risk level) + HookLifecycle (before_tool → execute → after_tool). Not a fixed catalog: the autonomous agent reasons about which tool to use.
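The three filter dimensions (plan, marketplace, risk level) can be sketched as a pure function over tool metadata. The metadata fields and the ordering of risk levels are assumptions. The blueprint names ToolPolicyFilter but does not specify its inputs.

```typescript
// Hypothetical policy filter; ToolSpec fields are illustrative.
type Plan = "free" | "pro";
type Marketplace = "meli" | "amazon" | "shopify";
type Risk = "read-only" | "reversible" | "irreversible";

const RISK_ORDER: Record<Risk, number> = {
  "read-only": 0,
  reversible: 1,
  irreversible: 2,
};

interface ToolSpec {
  name: string;
  category: "READ" | "ANALYSIS" | "WRITE" | "SYSTEM";
  marketplaces: Marketplace[];
  minPlan: Plan;
  risk: Risk;
}

interface PolicyContext {
  plan: Plan;
  marketplace: Marketplace;
  maxRisk: Risk; // highest risk level this session may see
}

// Returns only the tools that should be offered to the model for this turn.
function filterTools(tools: ToolSpec[], ctx: PolicyContext): ToolSpec[] {
  return tools.filter(
    (tool) =>
      tool.marketplaces.some((m) => m === ctx.marketplace) &&
      (tool.minPlan === "free" || ctx.plan === "pro") &&
      RISK_ORDER[tool.risk] <= RISK_ORDER[ctx.maxRisk],
  );
}
```

Filtering before the model sees the tool list means a disallowed tool cannot be selected at all, rather than being rejected after the fact.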

NEW

#4 Personality Engine

ISystemPromptComposer. 3 layers: L1 base identity (~1,200 tokens, cached between sessions), L2 session (UserProfile + critical alerts, ~400 tokens), L3 WRITE guardrails (~200 tokens, active only when WRITE tools are in play). Hard cap 1,200 tokens total. ~750-950 tokens in typical use.

NEW

#6 Proactive Suggestions

IProactiveSuggestionService. Lightweight LLM inference in the after_tool hook of HookLifecycle. Structured output: { hasSuggestion, message, suggestionType, priority, productId }. Max 2 suggestions per turn, phrased as questions. Pro only. Cross-session deduplication via UserProfile (7-day window per suggestion type + product).
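The per-turn cap and the 7-day deduplication window can be sketched as a selection step that runs on the model's structured output. The field names follow the structured output listed above. Everything else is an assumption: `priority` is treated as a number where higher means more urgent, and a plain `Map` stands in for the UserProfile store.

```typescript
// Hypothetical selection step; the Map replaces UserProfile persistence.
interface Suggestion {
  hasSuggestion: boolean;
  message: string;
  suggestionType: string;
  priority: number; // assumed: higher = more urgent
  productId: string;
}

const DEDUP_WINDOW_MS = 7 * 24 * 60 * 60 * 1000;
const MAX_PER_TURN = 2;

// lastShown maps "suggestionType:productId" to the epoch ms it was last shown.
function selectSuggestions(
  candidates: Suggestion[],
  lastShown: Map<string, number>,
  nowMs: number,
): Suggestion[] {
  // filter() returns a new array, so sorting does not mutate the input.
  const ranked = candidates
    .filter((c) => c.hasSuggestion)
    .sort((a, b) => b.priority - a.priority);

  const selected: Suggestion[] = [];
  for (const candidate of ranked) {
    if (selected.length >= MAX_PER_TURN) break;
    const key = `${candidate.suggestionType}:${candidate.productId}`;
    const seenAt = lastShown.get(key);
    // Skip anything already shown for this type + product inside the window.
    if (seenAt !== undefined && nowMs - seenAt < DEDUP_WINDOW_MS) continue;
    lastShown.set(key, nowMs);
    selected.push(candidate);
  }
  return selected;
}
```

Because the key combines suggestion type and product, a price suggestion for one product does not suppress a title suggestion for the same product.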

NEW

#1 Native Shell

No existing code. Electron + WebContentsView for marketplace navigation + React sidebar with 5 views (Chat, Profile, Billing, Enrollment, Onboarding). Bidirectional WebSocket protocol with the Orchestrator. The biggest new piece of the stack.

NEW

#16 Eval Framework

core-quality-stack-evaluation. 7 pipelines: LLM Judge (response quality), inter-project contracts, KB quality, E2E, desktop_build, figma_quality, api_monitor. Quality gate in CI/CD that blocks deploys that degrade quality. Does not run in product runtime.

NEW

#7 Guardrails

InputGuard (pre-LLM: prompt injection detection + off-scope requests) + OutputGuard (post-LLM: data leak prevention + dangerous content). Independent layer from the Orchestrator, so it can evolve without touching the ReAct loop.

NEW

#11 Enrichment

core-knowledge-enrichment. Market Intelligence (competitors, pricing, keywords, fee estimation) + Content Analysis (technical image and video analysis for marketplace). New service with no base code.

NEW

#17 Beautonomous

core-internal-team-workflow. Internal operational agent. OpenClaw UI (main interface) + Slack (notifications & approvals) + quality base structure per repo. Includes: bootstrap templates (CLAUDE.md, .claudeignore, settings.json, MEMORY.md, specs/, skills/), structure verification shell script (step 0 quality gate), quality-gate.yml (GitHub Action + Claude Code API script), OpenClaw system prompt with 3 governance roles, and quality agent prompt. Not just config: it has quality infrastructure code written once and replicated across 11 repos.

Eliminated Projects (absorbed)

DEL
#10 Auth Vault → #12 Marketplace Provider (ITokenManager)

OAuth token management is the responsibility of the same service that executes actions.

DEL
#14 Billing → #13 core-platform-billing

Billing and Token Economy had circular dependencies, so they were unified into a single source of truth.

Summary

REUSE: 2 projects (#8, #9)
ADAPT: 4 projects (#10, #2, #5, #14)
NEW: 11 projects (#17, #12, #13, #3, #15, #4, #6, #1, #16, #7, #11)

19 active projects total

5. Architecture

7-layer stack: 19 active projects mapped to a production-ready architecture.

LAYER 1: PRODUCT

What the seller installs and sees.

NEW #1 Native Shell (core-product-desktop-client): Electron + WebContentsView for marketplace nav + React sidebar with 5 views (Chat, Profile, Billing, Enrollment, Onboarding). Bidirectional WebSocket with the Orchestrator. The biggest new piece of the stack.
NEW #18 Design System (core-product-design-system): specs and context repository (no executable code) bridging Figma design with React implementation. 44 components (Atomic Design), 9 brand decisions (D1-D9), design tokens (3 Figma variable collections: Primitives → Semantic → Component), UX Writing guide, 8 data viz patterns, AI-native interaction patterns. Claude reads Figma via MCP to implement components in #1. UX/UI team (executes Figma T0.BB–T4.BB) + Pablo (approves) + Sergio (consumes → React Mockups).

LAYER 2: INTELLIGENCE

The Coach lives here. Single repo (core-intelligence-conversation-api) deployed as a Node.js 18 TypeScript Lambda behind AWS API Gateway v2. Middleware: Memberstack JWT auth + Zod validation + rate limiting.

ADAPT #2 Orchestrator ReAct: from one-shot 12-step pipeline to autonomous ReAct agent. Same Lambda, same repo. 3 layers, 7 ports. ILLMClient + Factory (Claude, OpenAI, OpenRouter): reuse as-is.
NEW #3 Tool Registry: 36 primitive tools via Anthropic tool_use: READ + ANALYSIS + WRITE + SYSTEM. ToolPolicyFilter (plan + marketplace + risk level) + HookLifecycle (before_tool → execute → after_tool). The agent reasons about which tool to use; it is not a fixed catalog.
NEW #4 Personality Engine: ISystemPromptComposer. 3 layers: L1 base identity (~1,200 tokens, cached with Anthropic prompt caching), L2 session (UserProfile + critical alerts, ~400 tokens), L3 WRITE guardrails (~200 tokens, conditional). ~750-950 typical. Hard cap 1,200 tokens.
ADAPT #5 Context Aggregator: IContextAssembler (KB + Brand Health RAG) + IContextWindowManager (dynamic budget over 200K context window). Logic exists in RagOrchestrator. Refactor and formalize, don't rebuild.
NEW #6 Proactive Suggestions: IProactiveSuggestionService. Lightweight LLM inference in after_tool hook. Structured output: hasSuggestion, message, suggestionType, priority, productId. Max 2 per turn. Pro only. Cross-session dedup via UserProfile (7-day window).
NEW #7 Guardrails: InputGuard (pre-LLM: prompt injection + off-scope) + OutputGuard (post-LLM: data leak + dangerous content). Independent layer from the ReAct loop that evolves without touching the Orchestrator.
REUSE #8 Observability: PostgreSQL traces + cost-per-execution triggers + automatic credit deduction. Operational. Extend to expose real traces to the Eval Framework.

LAYER 3: KNOWLEDGE

What the Coach knows.

REUSE #9 Cerebro KB (core-knowledge-semantic-base): 2,875 docs (Markdown + Git) + Go pipeline + Vertex AI 004 + BigQuery vectors. Semantic RAG active. Add updated Amazon + MeLi namespaces and marketplace trends. Pipeline transfers as-is.
ADAPT #10 Data Sync (core-knowledge-data-synchronizator): batch Airflow + GCS pipeline persists for daily ingestion. Add TypeScript adapters for real-time sync via EventBridge. Brand Health Index operational. Existing DAGs unchanged.
NEW #11 Enrichment (core-knowledge-enrichment): Market Intelligence (competitors, pricing, keywords, category fee estimation) + Content Analysis (technical image & video analysis for marketplace). New service with no base code.

LAYER 4: ACTION

What the Coach can do in the marketplace.

NEW #12 Marketplace Provider (core-action-marketplace-provider): brand new service. TypeScript + IMarketplaceAdapter (MeLi REST + Amazon SP-API) + ITokenManager OAuth2 (absorbs #10). Executes real marketplace actions: product mutations, price, stock, buyer communication, campaigns. WRITE tools backend for the Coach.
ELIMINATED #10 Auth Vault: absorbed into #12 as ITokenManager (DynamoDB + AWS Secrets Manager + AES-256-GCM)

LAYER 5: PLATFORM

What sustains the business and infrastructure.

NEW #13 Billing (core-platform-billing, absorbs #14): new service from scratch. PostgreSQL schema + credit triggers extended to separate plan vs packs. ICreditsGate + HttpCreditGate + POST /internal/gate + Stripe Checkout + idempotent webhooks + Free/Pro state machine + Credit Packs + LLM prompt caching.
ADAPT #14 DevOps (core-platform-infrastructure): CDK TypeScript (AWS) exists and deploys conversation-api today. Extend with unified multi-repo stack + Terraform (GCP) for BigQuery and Vertex AI. 3 environments: dev / staging / prod.
NEW #19 Go to Market & Analytics (core-platform-gtm-analytics): go-to-market strategy (positioning, channels, early adopter acquisition, onboarding funnels) + analytics infrastructure for product usage, retention, conversion, and growth metrics. Owned by Pablo + external GTM team, with no internal engineering tasks in MVP sprints.
ELIMINATED #14 Billing: merged into #13 core-platform-billing

LAYER 6: QUALITY

What measures whether the Coach works well and learns from its actions.

NEW #15 Feedback Loop (core-quality-feedback): new service from scratch. IFeedbackService + WRITE action impact measurement (before/after metrics at 7 days: visits, sales, conversion) + 3 feedback sources. Separate repo.
NEW #16 Eval Suite (core-quality-stack-evaluation): 7 pipelines: LLM Judge (response quality), inter-project contracts, KB quality, E2E, desktop builds (macOS+Windows, 11 checks), Figma quality (15 checks via REST API), API monitor. Quality gate in CI/CD: blocks deploys that degrade quality, validates that builds are distributable, and audits Figma for MCP compatibility. Does not run in product runtime.

LAYER 7: INTERNAL

How the team works.

NEW #17 Beautonomous (core-internal-team-workflow): internal operational agent. OpenClaw UI + Slack + quality base structure per repo. Includes bootstrap templates, the quality gate shell script (step 0), quality-gate.yml (GitHub Action + Claude Code API), a system prompt with 3 governance roles, and the quality agent prompt. More than configuration: it contains quality infrastructure code.
REUSE: 2 projects
ADAPT: 4 projects
NEW: 13 projects

6. Project Implementation Map

Cross-cutting view of the 19 active projects: how they connect, what depends on what, and which repositories group them.

Project Families

# | Family | Projects
1 | What the user installs | #1 Native Shell, #18 Design System
2 | The Coach itself | #2, #3, #4, #5, #6, #7, #8 (conversation-api)
3 | What the Coach knows | #9 KB, #10 Data Sync, #11 Enrichment
4 | What the Coach can do | #12 Marketplace Provider
5 | How it gets paid | #13 Billing & Credit Economy, #19 GTM & Analytics
6 | Where it runs | #14 DevOps
7 | Learning & quality | #15 Feedback Loop, #16 Eval Suite
8 | How we work | #17 Beautonomous

Repository Organization

Repository | Projects
core-intelligence-conversation-api | #8, #2, #3, #4, #5, #6, #7
core-knowledge-semantic-base | #9
core-knowledge-data-synchronizator | #10
core-action-marketplace-provider | #12
core-knowledge-enrichment | #11
core-quality-feedback | #15
core-product-desktop-client | #1
core-platform-billing | #13
core-platform-infrastructure | #14
core-quality-stack-evaluation | #16
core-internal-team-workflow | #17
shopilot-design-system | #18
core-platform-gtm-analytics | #19

Connection Diagram

    USER
      |
      v
  [#1 Native Shell]  (Electron app)  ← [#18 Design System] (tokens + components)
      |
      v
  [#7 Guardrails]  InputGuard (pre-LLM)
      |
      v
  [#2 Orchestrator] + [#4 Personality] + [#5 Context Agg]
      |
      +---> [#3 Tool Registry]
      |         |
      |         +---> [#12 Marketplace Provider] (includes TokenManager, absorbed from the former Auth Vault)
      |         +---> [#11 Enrichment Layer]
      |         +---> [#10 Data Sync]
      |         +---> [#6 Proactive Suggestions]
      |
      +---> [#9 Cerebro KB]
      |
      v
  [#7 Guardrails]  OutputGuard (post-LLM)
      |
      v
  RESPONSE TO USER

  --- Platform ---
  [#13 Billing & Credit Economy]          (metering + payments)
  [#14 DevOps]                          (infrastructure)
  [#8 Observability]                    (tracing + logging)
  [#15 Feedback Loop]                    (impact measurement)
  [#16 Eval Suite]                      (CI/CD quality gate)
  [#17 Beautonomous]                     (team operations)
  [#19 GTM & Analytics]                  (launch strategy + usage tracking)
                

Who Is Responsible for What

Capability | Project(s)
Render marketplace page | #1 Native Shell
Visual design system & brand tokens | #18 Design System
Reason & decide next action | #2 Orchestrator
Execute marketplace writes | #12 Marketplace Provider
Fetch external market data | #11 Enrichment Layer
Analyze images & video | #11 Enrichment Layer
Sync seller data | #10 Data Sync
Inject relevant context | #5 Context Aggregator + #9 Cerebro KB
Compose system prompt | #4 Personality Engine
Route & enforce tool policies | #3 Tool Registry
Validate input & output safety | #7 Guardrails
Track usage & billing | #13 Billing & Credit Economy
Measure quality (CI/CD) | #16 Eval Suite
Collect feedback & learn | #15 Feedback Loop
Go-to-market strategy & analytics | #19 Go to Market & Analytics

Critical Dependencies

#10 Data Sync → #3 Tool Registry (seller data available as tool results)
#12 Marketplace Provider → #3 Tool Registry (write operations exposed as tools)
#11 Enrichment → #3 Tool Registry (ANALYSIS tools routed via IEnrichmentService)
#9 Cerebro KB → #5 Context Aggregator (knowledge injected as context)
Auth Vault (previously numbered #10) absorbed into #12: ITokenManager is now internal to Marketplace Provider
#18 Design System → #1 Native Shell (tokens + components consumed by the desktop client)
#2 Orchestrator → all (the central loop coordinates everything)

7. Beautonomous: Internal Operations Agent

core-internal-team-workflow (#17). OpenClaw UI as the primary interface, Slack for notifications and approvals, and a quality base structure in every repository. The goal: 4 engineers operating like 10–15.

7.1 What It Solves

The problem is not technical capacity but operational fragmentation: to know what is happening you have to check Linear, GitHub, and Slack separately; simple changes require interrupting someone; and there is no central place to approve changes or trigger reviews.

Beautonomous lives in OpenClaw UI: the team opens the core-internal-team-workflow project and works from there. Slack receives proactive notifications and pipeline approvals, which can be answered directly from a Slack message without opening OpenClaw. The terminal with Claude Code is a third path for direct technical operations on repositories.

7.2 Architecture

┌──────────────────────────────┐   ┌──────────────────────────────┐
│  OPENCLAW UI                 │   │  SLACK                       │
│  Primary interface           │   │  Second native channel       │
│                              │   │                              │
│  Full conversation + context │   │  Direct conversation         │
│  All tools + auth by role    │   │  Proactive notifications     │
│  History + audit log         │   │  Pipeline approvals          │
└──────────────┬───────────────┘   └───────────────┬──────────────┘
               │                                   │
               └──────────────┬────────────────────┘
                              │
                   Terminal / Claude Code
                   (direct technical operations)
                              │
┌─────────────────────────────▼───────────────────────────────────┐
│  OPENCLAW — Agent Engine                                        │
│  ReAct Loop · Governance Guard · Audit Log                      │
│  Auth: identifies role automatically by logged-in user          │
│  Connectors: GitHub · Linear · Code · Slack                     │
└─────────────────────────────┬───────────────────────────────────┘
                              │ invokes via API / GitHub Actions
┌─────────────────────────────▼───────────────────────────────────┐
│  QUALITY BASE STRUCTURE — Per stack repository                  │
│  ├── CLAUDE.md          repo instructions + conventions         │
│  ├── .claude/memory/    persistent context                      │
│  └── quality-gate.yml   GitHub Action: lint + tests + review    │
└─────────────────────────────────────────────────────────────────┘
                
Slack Channel | Purpose
#engineering | Technical decisions, unreviewed PRs, architecture
#deploys | Quality gate results, workflow status, CI/CD failures
#general | Team communication
#team | Daily sprint summary (9:00 AM)

7.3 Four Capabilities (v1)

1. View status from Slack

Any team member asks in Slack and gets a response synthesized from GitHub, Linear, and Slack. A daily auto-summary is posted in #team at 9:00 AM: pending PRs, failing CI, tasks in progress per person, active blockers.

2. Create & manage tasks from Slack

Create tasks, assign them, change their status, and add comments in Linear, all from Slack, without opening Linear.

3. Approve PRs

When a PR passes the quality gate, Beautonomous notifies Mateo with the summary, diff, and review result. Mateo responds from OpenClaw UI or a Slack DM. If the PR targets production, the same flow reaches Pablo after Mateo approves. There is no need to open GitHub.

4. Activate quality agent

The quality gate runs automatically on every PR. It can also be triggered manually from OpenClaw UI, the terminal, or Slack. It includes inter-repo contract validation: if a PR breaks an interface that another project consumes, the gate fails and reports the specific reason.

7.4 Governance

The most critical component. Without it, an agent with access to all repositories and pipelines is an operational risk.

Three Roles

Role | Who | Can do | Cannot
El Capitán | Pablo Estrada | Full read · Linear tasks · UI changes (generates PR) · final prod approval | Workflows · backend/infra
El Mago | Mateo Quintero | Everything · approve PRs · any workflow · manage roles · technical sign-off | (none)
El Artesano | Andrés · Sergio | Full read · propose changes via PR · staging workflows · own tasks | Approve PRs · infra · prod

Risk Taxonomy

Level | Flow | Examples
Read | No confirmation | View PRs, query tasks, read logs
Reversible | Requester confirms | Create task, comment on PR, UI text, staging workflow
Needs approval | Requester + El Mago via Slack | Backend logic, infra config, staging env vars
Irreversible | Requester + El Mago + permanent record | Prod env vars, billing changes, delete data
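
The risk taxonomy can be read as a lookup from risk level to the confirmations an action needs. The sketch below is one possible encoding, with assumptions stated plainly: the type and function names are hypothetical, and treating the risk of a set of tools as the highest risk in the set is an assumption, since the blueprint only says the agent determines "the risk level of the set".

```typescript
type RiskLevel = "read" | "reversible" | "needs_approval" | "irreversible";

interface ConfirmationPolicy {
  requesterConfirms: boolean;
  magoApproves: boolean;
  permanentRecord: boolean;
}

// One row per level of the risk taxonomy table.
const POLICY: Record<RiskLevel, ConfirmationPolicy> = {
  read: { requesterConfirms: false, magoApproves: false, permanentRecord: false },
  reversible: { requesterConfirms: true, magoApproves: false, permanentRecord: false },
  needs_approval: { requesterConfirms: true, magoApproves: true, permanentRecord: false },
  irreversible: { requesterConfirms: true, magoApproves: true, permanentRecord: true },
};

// Levels ordered from least to most risky.
const ORDER: readonly RiskLevel[] = ["read", "reversible", "needs_approval", "irreversible"];

// Assumption: a set of tools is as risky as its riskiest member.
function riskOfSet(levels: readonly RiskLevel[]): RiskLevel {
  return levels.reduce<RiskLevel>(
    (max, level) => (ORDER.indexOf(level) > ORDER.indexOf(max) ? level : max),
    "read",
  );
}

function policyFor(levels: readonly RiskLevel[]): ConfirmationPolicy {
  return POLICY[riskOfSet(levels)];
}
```

Under this reading, a request that combines a read with a backend change is handled entirely as "needs approval".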

Authorization Flow

1. The agent identifies the required tools and the risk level of the set.

2. It verifies that the user's role has permission. If not, it explains why and offers to escalate to El Mago.

3. It shows exactly what it will do: a diff for code, a preview for Slack.

4. The user confirms. For "needs approval" actions, El Mago receives a Slack notification.

5. It executes and records the action in the Audit Log: timestamp, user, role, tool, params, result.
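
Steps 2 and 5 of the flow above can be sketched as a role check that produces an Audit Log entry. This is an illustration under assumptions, not OpenClaw's API: the `area` tag on a request, the `DENIED` map (paraphrased from the roles table), and the choice to record denied attempts as well are all hypothetical. The entry fields are the ones listed in step 5.

```typescript
type Role = "capitan" | "mago" | "artesano";

// Areas each role cannot touch, paraphrased from the "Cannot" column.
const DENIED: Record<Role, readonly string[]> = {
  capitan: ["workflows", "backend", "infra"],
  mago: [],
  artesano: ["approve_pr", "infra", "prod"],
};

// Fields taken from step 5 of the authorization flow.
interface AuditEntry {
  timestamp: string;
  user: string;
  role: Role;
  tool: string;
  params: Record<string, unknown>;
  result: string;
}

interface ToolRequest {
  user: string;
  role: Role;
  tool: string;
  area: string; // hypothetical tag describing what the tool touches
  params: Record<string, unknown>;
}

// Checks the role against the denied areas and returns the audit entry.
// Assumption: denied attempts are logged too, with an escalation hint.
function authorize(request: ToolRequest, now: Date = new Date()): AuditEntry {
  const allowed = !DENIED[request.role].includes(request.area);
  return {
    timestamp: now.toISOString(),
    user: request.user,
    role: request.role,
    tool: request.tool,
    params: request.params,
    result: allowed ? "allowed" : "denied: offer escalation to El Mago",
  };
}
```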

Principles

1. Least privilege: each role accesses only what it needs.

2. Explicit confirmation: no state-modifying action runs without confirmation.

3. Full traceability: every action (who, when, which arguments, what result) is in the Audit Log.

4. Read/write separation: querying never requires confirmation; modifying always does.

5. Declared reversibility: each action declares whether it has a rollback.

7.5 Quality Gate

Runs automatically on every PR to develop or main, and can be triggered manually from OpenClaw UI, the terminal, or Slack. It has 5 sequential steps (0–4); if any step fails, the PR does not advance.

Step | Tool | Detects
0. Base structure | Shell script | Required files present (CLAUDE.md, .claude/*, quality-gate.yml)
1. Lint + types | ESLint + tsc / ruff | Syntax errors, wrong types
2. Tests | Jest / pytest | Broken tests, coverage below minimum
3. Architecture review | Claude Code API | Clean Architecture violations, broken inter-repo contracts
4. Convention check | Claude Code API | Naming, folder structure, repo patterns

Steps 3–4 receive full repo context: CLAUDE.md + MEMORY.md + .claude/specs/* + PR diff. Skills (e.g. clean-ddd-hexagonal, solid) are available as additional context.
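
The "sequential, stop at first failure" behavior of the gate can be sketched as a small runner. This is only an illustration of the control flow: the real gate is a GitHub Action, the steps there would be asynchronous, and the `GateStep` and `runGate` names are hypothetical.

```typescript
interface StepResult {
  ok: boolean;
  detail: string;
}

interface GateStep {
  name: string;
  run: () => StepResult;
}

interface GateOutcome {
  passed: boolean;
  failedStep: string | null;
  executed: string[];
}

// Runs steps in order and stops at the first failure, so later steps
// (for example the Claude Code reviews) never run on a PR that already
// failed lint or tests.
function runGate(steps: readonly GateStep[]): GateOutcome {
  const executed: string[] = [];
  for (const step of steps) {
    executed.push(step.name);
    if (!step.run().ok) {
      return { passed: false, failedStep: step.name, executed };
    }
  }
  return { passed: true, failedStep: null, executed };
}
```

Ordering the cheap deterministic checks before the two LLM-backed reviews means a failing PR does not spend API calls on steps 3 and 4.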

7.6 Approval Pipeline & Environments

PR opened
     │
     ▼
Quality Gate (auto — Claude Code)
     ├── FAILS → #deploys + DM to Artesano → back to Artesano. End.
     │
     └── PASSES → DM to Mateo in Slack
                      │
                      ├── REJECTS → comment on PR + DM to Artesano → End.
                      │
                      └── APPROVES
                             ├── target staging → auto merge
                             └── target prod → DM to Pablo in Slack
                                                       ├── REJECTS → End.
                                                       └── APPROVES → merge → deploy prod
                
Environment | Branch | Approvals
dev | feature/* | Quality gate
staging | develop | Quality gate + Mateo
prod | main | Quality gate + Mateo + Pablo
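
The environment table maps each target branch to the approvals a merge needs. A minimal sketch of that mapping follows; the function names are hypothetical, and in practice the enforcement would sit in branch protection rules and the pipeline rather than in a helper like this.

```typescript
type Approver = "quality_gate" | "mateo" | "pablo";

// Approvals required per target branch, taken from the environment table.
function requiredApprovals(branch: string): Approver[] {
  if (branch === "main") return ["quality_gate", "mateo", "pablo"];
  if (branch === "develop") return ["quality_gate", "mateo"];
  if (branch.startsWith("feature/")) return ["quality_gate"];
  throw new Error(`No environment mapped to branch: ${branch}`);
}

// A merge is allowed only when every required approval has been granted.
function canMerge(branch: string, granted: readonly Approver[]): boolean {
  return requiredApprovals(branch).every((approver) => granted.includes(approver));
}
```

For example, `canMerge("main", ["quality_gate", "mateo"])` is false until Pablo also approves.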

7.7 Quality Base Structure Per Repo

All 11 active stack repositories share a minimum structure. This is what turns Beautonomous from a generic agent into one that knows each project specifically. The structure is bootstrapped from core-internal-team-workflow/templates/.

repo/
├── CLAUDE.md                        # Agent contract with this repo
├── .claudeignore                    # Files Claude should not read
└── .claude/
    ├── settings.json                # Team permissions + hooks (committed)
    ├── memory/
    │   └── MEMORY.md                # Persistent context per repo
    ├── specs/
    │   ├── architecture.md          # Architecture decisions + boundaries
    │   ├── contracts.md             # Inter-repo contracts (detail)
    │   └── testing.md               # What to test and how
    └── skills/ (symlinks)           # Relevant skills for this repo
        ├── clean-ddd-hexagonal
        ├── solid
        └── clean-architecture
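
Step 0 of the quality gate checks that this structure is present. The blueprint implements it as a shell script; the TypeScript sketch below only illustrates the logic. The exact list of required paths is an assumption pieced together from the tree above and the quality gate table, and `missingBaseFiles` is a hypothetical name.

```typescript
// Assumed minimum set of files; the blueprint states the requirement only
// as "CLAUDE.md, .claude/*, quality-gate.yml".
const REQUIRED_FILES: readonly string[] = [
  "CLAUDE.md",
  ".claudeignore",
  ".claude/settings.json",
  ".claude/memory/MEMORY.md",
  ".github/workflows/quality-gate.yml",
];

// Returns the required paths that are absent from the repo file listing.
// An empty result means step 0 passes.
function missingBaseFiles(present: Iterable<string>): string[] {
  const have = new Set(present);
  return REQUIRED_FILES.filter((file) => !have.has(file));
}
```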
                

Inter-Repo Contracts

A contract is any interface between two projects that, if changed in one, breaks the other. Contracts live in each repo's CLAUDE.md under the heading ## Contratos con otros repos ("Contracts with other repos"), split into Expone (what the repo exposes), Consume (what it consumes), and Rompe si (what breaks the contract). The quality gate reads them as context when reviewing a PR.

## Contratos con otros repos

### Expone (otros repos dependen de esto)
- ICreditsGate.canProceed({ userId, toolCategory }) → { allowed, reason }
  Consumidor: core-intelligence-conversation-api
  Rompe si: cambia la firma, cambia el significado de `allowed`

### Consume (este repo depende de esto)
- POST /internal/gate (core-platform-billing)
  Rompe si: cambia el path, cambia el body schema
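
For readers on the consuming side, the exposed contract above could be typed roughly as follows. This is a sketch under assumptions: the blueprint gives only the field names, so the field types, the synchronous return, and the in-memory fake are all hypothetical (the real gate sits behind POST /internal/gate and would most likely be asynchronous).

```typescript
interface GateRequest {
  userId: string;
  toolCategory: string; // assumed to be a string; allowed values are not specified here
}

interface GateDecision {
  allowed: boolean;
  reason: string;
}

// Mirrors ICreditsGate.canProceed({ userId, toolCategory }) → { allowed, reason }.
interface ICreditsGate {
  canProceed(request: GateRequest): GateDecision;
}

// Hypothetical in-memory fake, useful for consumer-side tests only.
class InMemoryCreditsGate implements ICreditsGate {
  private readonly balances: Map<string, number>;

  constructor(balances: Map<string, number>) {
    this.balances = balances;
  }

  canProceed(request: GateRequest): GateDecision {
    const credits = this.balances.get(request.userId) ?? 0;
    return credits > 0
      ? { allowed: true, reason: "ok" }
      : { allowed: false, reason: `no credits for ${request.toolCategory}` };
  }
}
```

Typing the contract this way makes a signature change a compile error in the consumer, which complements the review-time check done by the quality gate.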
                

Skills Per Repo

Repository | Skills
conversation-api | clean-ddd-hexagonal · solid · clean-architecture · rag-retrieval · rag-architect · llm-app-patterns · prompt-engineering-patterns · hybrid-search · heuristic-evaluation · heuristics-and-checklists · evolutionary-metric-ranking
semantic-base | rag-retrieval · rag-architect · hybrid-search · llm-app-patterns · solid · heuristics-and-checklists
data-synchronizator | solid · heuristics-and-checklists
desktop-client | solid · heuristic-evaluation · clean-architecture · heuristics-and-checklists
infrastructure | solid · heuristics-and-checklists
marketplace-provider | clean-ddd-hexagonal · solid · clean-architecture · heuristics-and-checklists
billing | clean-ddd-hexagonal · solid · clean-architecture · heuristics-and-checklists
enrichment | solid · clean-ddd-hexagonal · clean-architecture · llm-app-patterns · prompt-engineering-patterns · heuristics-and-checklists
feedback | solid · clean-ddd-hexagonal · clean-architecture · evolutionary-metric-ranking · llm-app-patterns · heuristics-and-checklists
stack-evaluation | solid · clean-ddd-hexagonal · llm-app-patterns · prompt-engineering-patterns · heuristics-and-checklists · evolutionary-metric-ranking
team-workflow | solid · heuristics-and-checklists · prompt-engineering-patterns

7.8 Proactivity

Beautonomous does not wait to be asked before notifying the team about critical situations.

Trigger | Automatic Action
GitHub Action fails (any repo) | Message in #deploys: workflow, repo, branch, log link
GitHub Action fails on main / prod | #deploys + DM to El Mago
PR unreviewed > 4 hours | Ping in #engineering with link and author
Linear task blocked > 2 days | Alert to El Mago with block context
9:00 AM daily | #team summary: pending PRs, failing CI, tasks per person, blockers
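
The two time-based triggers in the table reduce to threshold checks against the current time. The sketch below shows that logic only; the record shapes, the `Alert` type, and the "DM:El Mago" destination string are hypothetical, and fetching PRs or tasks from GitHub and Linear is out of scope here.

```typescript
const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

interface OpenPr {
  url: string;
  author: string;
  openedAt: Date;
  reviewed: boolean;
}

interface BlockedTask {
  id: string;
  blockedSince: Date;
  context: string;
}

interface Alert {
  channel: string;
  text: string;
}

// PRs that have waited more than 4 hours without a review.
function stalePrAlerts(prs: readonly OpenPr[], now: Date): Alert[] {
  return prs
    .filter((pr) => !pr.reviewed && now.getTime() - pr.openedAt.getTime() > 4 * HOUR_MS)
    .map((pr) => ({
      channel: "#engineering",
      text: `PR unreviewed > 4h: ${pr.url} (author: ${pr.author})`,
    }));
}

// Linear tasks that have been blocked for more than 2 days.
function blockedTaskAlerts(tasks: readonly BlockedTask[], now: Date): Alert[] {
  return tasks
    .filter((task) => now.getTime() - task.blockedSince.getTime() > 2 * DAY_MS)
    .map((task) => ({
      channel: "DM:El Mago",
      text: `Task ${task.id} blocked > 2 days: ${task.context}`,
    }));
}
```

Passing `now` in explicitly keeps both checks deterministic, which makes them easy to test and to run from a scheduled job.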

7.9 System Prompt (OpenClaw)

# Beautonomous — Agente Operativo Interno de Shopilot

Eres el agente operativo del equipo. Tu función: dar visibilidad completa
del proyecto y ejecutar acciones en GitHub, Linear, Slack y el código.
Operas desde OpenClaw UI, Slack y terminal. El rol del usuario ya viene
determinado por OpenClaw — nunca lo asumas ni lo pidas explícitamente.

## Usuario actual
{USER_NAME} | {USER_EMAIL} | Rol: {USER_ROLE}

## Roles
El Capitán (pablo@shopilot.ai):
  - Lectura total · tareas Linear · cambios UI (genera PR)
  - Aprobación final de negocio para prod
  - NO puede: workflows, backend/infra

El Mago (mateo@shopilot.ai):
  - Acceso completo a todos los sistemas
  - Aprobar PRs, cualquier workflow, gestionar roles
  - Firma técnica en el pipeline de aprobación

El Artesano (andres@shopilot.ai, sergio@shopilot.ai):
  - Lectura total · proponer cambios via PR
  - Staging workflows · tareas propias
  - Enviar mensajes a Slack (con confirmación)

## Gobernanza — NUNCA omitas estas reglas
1. Antes de cualquier escritura: muestra exactamente qué vas a hacer.
2. Para código: muestra el diff completo antes de crear el PR.
3. Para Slack: muestra la vista previa antes de publicar.
4. Si el rol no tiene permiso: explica y ofrece escalar a El Mago.
5. Acciones de alto riesgo requieren confirmación de El Mago, siempre.
6. Confirma el resultado: qué cambió, dónde, cuándo.

## Repositorios del stack
core-intelligence-conversation-api     (Coach — Node.js 18 TS, Lambda)
core-knowledge-semantic-base           (KB — Go + Vertex AI + BigQuery)
core-knowledge-data-synchronizator     (Data Sync — Airflow Python + GCS)
core-product-desktop-client            (App — Electron + React)
core-platform-infrastructure           (Infra — CDK TS + Terraform GCP)
core-action-marketplace-provider       (Marketplace — TypeScript)
core-platform-billing                  (Billing — TypeScript)
core-knowledge-enrichment              (Enrichment — TypeScript)
core-quality-feedback                  (Feedback — TypeScript)
core-quality-stack-evaluation          (Eval — TypeScript)
core-internal-team-workflow            (this project — config)

## Canales Slack autorizados
#engineering · #deploys · #general · #team
                

7.10 Connectors (OpenClaw Native, OAuth Only)

Connector | Read | Write | #
GitHub | repos, PRs, issues, workflows, logs | issues, comments, propose PR, trigger workflows | 10
Linear | tasks, sprints, team metrics | create/assign/comment tasks, change state/priority | 9
Code | read file, search code | low-risk changes via PR, propose logic changes via PR | 7
Slack | channels, threads, search | messages (with confirmation), approval notifications | 5

7.11 What It Does NOT Do

× Not the product interface: Beautonomous is the team's agent, not the seller's. It has no relation to the Coach or the other product projects at runtime.
× Does not approve its own PRs: PRs it generates can only be approved by El Mago. It never self-merges.
× Does not manage production credentials: AWS/GCP secrets, external API tokens, and prod env vars are out of scope. El Mago handles them directly.
× Does not make technical decisions: it detects violations in the quality gate but does not decide whether an architecture change is correct. It escalates to El Mago with context.
× Does not auto-sync memory: the general MEMORY.md is not generated from the per-repo ones. El Mago has to update it for cross-repo decisions.

7.12 Code vs Configuration

Component | Type | Where
System prompt + roles + repos | Config | OpenClaw panel
Slack bot integration | Config | OpenClaw + Slack OAuth
CLAUDE.md per repo | Text, written by the team | Root of each repo
MEMORY.md per repo | Text, written by the agent + El Mago | .claude/memory/
quality-gate.yml + API script | YAML + code, written once | .github/workflows/
Branch protection rules | Config | GitHub Settings

The only code to write is the script that invokes Claude Code via API inside quality-gate.yml. It is written once and replicated from the template in core-internal-team-workflow/templates/.

7.13 Current State

Capability | Status
View status from Slack | Pending: OpenClaw + connectors
Create tasks from Slack | Pending: Linear OAuth
Approve PRs from Slack | Pending: quality gate + branch protection
Activate quality agent | Pending: quality-gate.yml in 11 repos
Proactivity (alerts + daily summary) | Pending: OpenClaw configured
Quality base per repo (full bootstrap) | Partial: only in conversation-api

Everything starts from zero except the quality base structure, which exists in one repo.

8. How We Will Build Shopilot: The 19 Projects

7 architecture layers, 11 stack repositories, 19 active projects. Each project will own a specific piece of the stack, from the desktop app the seller will install to the internal agent that will help the team build it. Two projects live outside the 11 stack repositories: the Design System (#18, shopilot-design-system) and Go to Market & Analytics (#19, core-platform-gtm-analytics).

36 tools across 3 services (data-synchronizator, marketplace-provider, enrichment) will give the Coach the ability to read, analyze, and act on the seller's behalf. The intelligence layer will live in core-intelligence-conversation-api, a single Lambda hosting 7 projects (#2, #3, #4, #5, #6, #7, #8): the reasoning loop, the tools it can invoke, its personality, the context it assembles, proactive suggestions, guardrails, and observability.

Knowledge will flow from three separate repos: core-knowledge-semantic-base (#9) for the editorial KB, core-knowledge-data-synchronizator (#10) for the seller's real-time data, and core-knowledge-enrichment (#11) for external market intelligence and content analysis. Actions on the marketplace will route through core-action-marketplace-provider (#12), the only service that touches the MeLi and Amazon APIs.

Quality will be measured by core-quality-feedback (#15) for business impact and by core-quality-stack-evaluation (#16) for CI/CD quality gates. The platform layer will handle billing via core-platform-billing (#13) and infrastructure via core-platform-infrastructure (#14). The seller will interact through core-product-desktop-client (#1), an Electron app with the Coach in the sidebar. The team itself will operate with core-internal-team-workflow (#17): Beautonomous, the internal agent that helps build and govern the entire stack.

Status breakdown: EXISTS: 2 · ADAPT+NEW: 2 · NEW: 13
# | Project | Layer | Owner | Status
1 | Native Shell | 1-Product | Sergio | REWRITE
2 | ReAct Orchestrator | 2-Intelligence | Mateo | ADAPT+NEW
3 | Tool Registry & Policy Engine | 2-Intelligence | Mateo | NEW
4 | Personality Engine | 2-Intelligence | Mateo | NEW
5 | Context Aggregator | 2-Intelligence | Mateo | NEW
6 | Proactive Suggestions Engine | 2-Intelligence | Mateo | NEW
7 | Guardrails | 2-Intelligence | Mateo | NEW
8 | Observability & Traceability | 2-Intelligence | Mateo | EXISTS + ADAPT
9 | Cerebro / Knowledge Base | 3-Knowledge | Mateo | EXISTS
10 | Data Sync | 3-Knowledge | Andrés | ADAPT+NEW
11 | Enrichment Layer | 3-Knowledge | Mateo | NEW
12 | Marketplace Provider | 4-Action | Andrés | REWRITE
13 | Billing & Credit Economy | 5-Platform | Sergio | REWRITE
14 | DevOps (IaC) | 5-Platform | Andrés | EXISTS + NEW
15 | Feedback Loop | 6-Quality | Sergio | REWRITE
16 | Eval Suite | 6-Quality | Pablo | NEW
17 | Beautonomous | 7-Internal | Pablo | NEW
18 | Design System | 1-Product | Pablo · Sergio · External | NEW
19 | Go to Market & Analytics | 5-Platform | Pablo + external GTM team | NEW

8.1 Introduction to the Projects

The complete stack on one page. No technical references, only what each piece does and why it exists.

The Complete Idea

Shopilot is a desktop browser where the seller navigates their marketplace with an AI Coach in the sidebar. The Coach knows their business, can analyze their situation, execute actions, and learn over time.

The 19 projects are the pieces that make this possible (19 active).

The 8 Project Families

1. What the user installs

core-product-desktop-client: the Electron browser. Without this, there is no product.

2. The Coach itself

core-intelligence-conversation-api: where all intelligence lives. Groups 7 internal projects: the reasoning loop (#2), the tools it can use (#3), its personality and tone (#4), the context it assembles before responding (#5), the proactive suggestions it detects (#6), the layer that protects it from malicious inputs (#7), and the record of everything it does (#8).

3. What the Coach knows

KB (core-knowledge-semantic-base): editorial knowledge such as guides, policies, and best practices.  |  Data Sync (core-knowledge-data-synchronizator): the seller's own real-time data.  |  Enrichment (core-knowledge-enrichment): external market data and content analysis.

4. What the Coach can do

Marketplace Provider (core-action-marketplace-provider): executes real changes (publishes, responds, creates campaigns). Includes seller OAuth token management (Auth Vault, formerly #10, now absorbed and no longer a standalone project).

5. How it gets paid

core-platform-billing: measures consumption and charges the seller. Merges Token Economy (#13) and Billing (formerly #14). Billing is no longer a standalone project; everything lives in #13.

6. Where it runs

DevOps (core-platform-infrastructure): all cloud infrastructure in one place.

7. What the Coach learns and how we measure quality

Feedback Loop (core-quality-feedback): measures whether the Coach's actions truly helped the seller. Records the impact of each change on sales, visits, and conversion. The seller sees: “your title change generated +54% visits in 7 days”.  |  Eval Framework (core-quality-stack-evaluation): measures whether the Coach responds well before reaching production. Blocks changes that degrade quality.

8. How we work

Beautonomous (core-internal-team-workflow): the team's internal Coach. We build with the same tools we make.

The Flow in Plain Language

  1. The seller opens Shopilot and navigates to MeLi or Amazon.
  2. They ask the Coach something in the sidebar.
  3. The Coach understands the question, consults its knowledge (KB), the seller's data (Data Sync), and the market (Enrichment).
  4. If it needs to act, it uses the Marketplace Provider to execute the operation.
  5. It responds with context, precision, and proactive suggestions if it detects opportunities.
  6. Everything is logged (Observability). If there was an action, the Feedback Loop measures its impact 7 days later and shows it to the seller.
  7. The seller pays for what they consume (Billing + Token Economy).

Minimum Viable Core: What Must Exist for the First Useful Product

For the Coach to respond well

#2 Orchestrator: without it there is no Coach

#9 KB: without knowledge the Coach guesses

#10 Data Sync: without seller data it speaks in the abstract

#8 Observability: without tracking, billing is impossible

#13 Billing: without billing there is no business model

With just these 5, the Coach is already a product.

For the Coach to act

Add to the 5 above:

#3 Tool Registry: without it the Coach can't execute any tool

#12 Marketplace Provider: without it WRITE tools have nowhere to go

With these 7, the Coach can make real changes in the marketplace.

For the complete product

#1 Shell: the seller uses it while browsing

#11 Enrichment: the Coach sees the market

#15 Feedback Loop: the Coach learns from its own actions

#4 #5 #6 #7: quality, security, full experience

#16 Eval + #14 DevOps are cross-cutting: they accompany all phases.

Current Stack Status

Operational: #8 Observability  ·  #9 KB  ·  #10 Data Sync (Brand Health + seller data)
🔨 Building: #2 Orchestrator (one-shot works, ReAct loop in progress)  ·  #4 Personality  ·  #5 Context  ·  #13 Billing (credits)  ·  #14 DevOps (CDK partial)
📋 Pending: #3 Tool Registry  ·  #12 Marketplace Provider  ·  #11 Enrichment  ·  #15 Feedback  ·  #16 Eval  ·  #1 Shell  ·  #6 Proactive  ·  #7 Guardrails  ·  #17 Beautonomous

The Coach already answers questions with real context (KB + Brand Health). What remains is for it to act (tools, Marketplace Provider) and for the user to see it (Shell).

8.2 Stack Map

Cross-cutting view of the full stack. What each project does, who depends on whom, and how the Coach is the thread connecting everything.

A marketplace seller navigates their store from Shopilot and has an AI Coach that understands their business, can act on their behalf, and learns over time.

The 19 Active Projects by Layer

Product Layer: what the user sees

#1 Native Shell (core-product-desktop-client): the desktop browser. The seller navigates MeLi/Amazon while the Coach is in the sidebar.

Intelligence Layer: the Coach

#2 Orchestrator (core-intelligence-conversation-api): the Coach's brain. Receives a question, reasons, decides which tools to use, responds.
#3 Tool Registry: catalog of 36 tools the Coach can use. Executes them and returns results.
#4 Personality Engine: defines how the Coach speaks: who it is, what it knows about the seller, what it cannot say.
#5 Context Aggregator: assembles everything the Coach needs to know before responding: KB, brand health, history.
#6 Proactive Suggestions: detects opportunities while the Coach works and proposes them to the seller.
#7 Guardrails: protects the Coach from malicious inputs and prevents it from leaking sensitive information.
#8 Observability: records what the Coach did in each conversation and how much it cost.

Learning & Quality Layer: what improves the stack over time

#15 Feedback Loop (core-quality-feedback): measures the business impact of WRITE actions: visits, sales, and conversion before and after each change. Detects which strategies work.
#16 Eval Framework (core-quality-stack-evaluation): measures whether the Coach responds well. Validates contracts between projects. Blocks changes that degrade quality in CI/CD.

Data Layer: what feeds the Coach

#9 Cerebro KB: the Coach's knowledge memory: guides, policies, marketplace best practices.
#10 Data Sync: the seller's own real-time data: products, orders, metrics, account health.
#11 Enrichment: external market data and content analysis: competitors, prices, images, videos.

Action Layer: what the Coach can do

#12 Marketplace Provider: executes marketplace operations: publish products, answer questions, create campaigns. (Auth Vault, formerly #10, absorbed; no longer a standalone project.)

Platform Layer: what sustains everything

#13 Token Economy + Billing: counts credits per conversation and charges the seller. Free/Pro plans, Credit Packs, Stripe. (Billing, formerly #14, absorbed.)
#14 DevOps: provisions all cloud infrastructure. Defines dev/staging/prod environments.

Internal Layer: how we work

#17 Beautonomous (core-internal-team-workflow): the team's internal Coach. Helps build and operate the stack using the same projects it manages.

Connection Diagram

╔══════════════════════════════════════════════════════════════════╗
║  USER                                                            ║
║  ┌──────────────────────────────────────────────────────────┐   ║
║  │  core-product-desktop-client (#1)                        │   ║
║  │  Electron browser + Coach sidebar                        │   ║
║  └──────────────────────────┬───────────────────────────────┘   ║
╚═════════════════════════════│════════════════════════════════════╝
                              │ question / response
╔═════════════════════════════▼════════════════════════════════════╗
║  INTELLIGENCE  (core-intelligence-conversation-api)              ║
║                                                                  ║
║  ┌─────────────────────────────────────────────────────────┐    ║
║  │  Guardrails (#7) → validates input before processing   │    ║
║  └──────────────────────────┬──────────────────────────────┘    ║
║                             │                                    ║
║  ┌──────────────────────────▼─────────────────────────────┐    ║
║  │  Orchestrator (#2)                                      │    ║
║  │  reasons → picks tools → acts → observes → responds     │    ║
║  └──┬──────────┬───────────┬──────────────────────────┬────┘    ║
║     │          │           │                          │         ║
║  Context    Personality  Tool Registry (#3)       Proactive     ║
║  Aggregator  Engine       ┌─────────────────┐    Suggestions    ║
║  (#5)      (#4)         │ READ tools  ─────┤──▶ Data Sync     ║
║     │                     │ WRITE tools ─────┤──▶ Marketplace   ║
║     │                     │ ANALYSIS tools──▶┤──▶ Enrichment    ║
║     │                     └─────────────────┘    (#6)          ║
║     │                                                           ║
║  ┌──▼──────────────────────────────────────────────────────┐    ║
║  │  Context sources                                        │    ║
║  │  KB (#9) + Brand Health (#10) + UserProfile              │    ║
║  └─────────────────────────────────────────────────────────┘    ║
║                                                                  ║
║  Guardrails (#7) → validates output before responding           ║
║  Observability (#8) → logs everything                            ║
║  WRITE tools → emit FeedbackEntry to core-quality-feedback       ║
╚══════════════════════════════════════════════════════════════════╝
                              │
           ┌──────────────────┤──────────────────┐
           ▼                  ▼                  ▼
  core-knowledge-       core-action-        core-knowledge-
  data-synchronizator   marketplace-        enrichment (#11)
  (#10)                  provider (#12)       External market
  Seller data           WRITE to the        + visual analysis
  in real time          marketplace

  core-quality-feedback (#15)
  Measures impact of WRITE actions (7 days later)
  ← reads metrics from core-knowledge-data-synchronizator (#10)

╔══════════════════════════════════════════════════════════════════╗
║  PLATFORM                                                        ║
║  core-platform-billing (#13): Token Economy + Billing merged     ║
║  DevOps (#14) ───────────── provisions all cloud infra           ║
╚══════════════════════════════════════════════════════════════════╝
╔══════════════════════════════════════════════════════════════════╗
║  TEAM AND QUALITY                                                ║
║  core-internal-team-workflow (#17): the team's internal Coach    ║
║  core-quality-stack-evaluation (#16): evaluates quality in CI/CD ║
╚══════════════════════════════════════════════════════════════════╝

Responsibility Matrix

What each piece of the stack is responsible for: one sentence per responsibility, mapped to the projects that own it.

Responsibility | Project
The seller can talk to the Coach | #1 Shell + #2 Orchestrator
The Coach understands the seller's business | #9 KB + #10 Data Sync + #5 Context Aggregator
The Coach can act on the marketplace | #12 Marketplace Provider
The Coach has market data and analysis | #11 Enrichment
The Coach speaks with the right voice | #4 Personality Engine
The Coach detects opportunities | #6 Proactive Suggestions
The Coach measures action impact | #15 Feedback Loop
The Coach can't be manipulated | #7 Guardrails
We know if the Coach responds well | #16 Eval Framework
We know what it does and how much it costs | #8 Observability
The seller pays for the service | #13 Billing
Everything runs in production | #14 DevOps
The team works with AI | #17 Beautonomous
The product has a unified visual identity | #18 Design System

Critical Dependencies: What Blocks What

#10 Data Sync ───────────────────────────────▶ #3 Tool Registry (READ tools)
#12 Marketplace Provider ────────────────────▶ #3 Tool Registry (WRITE tools)
#11 Enrichment ──────────────────────────────▶ #3 Tool Registry (ANALYSIS tools)
#9 KB ───────────────────────────────────────▶ #5 Context Aggregator
#2 Orchestrator ─────────────────────────────▶ all of the above
#13 Token Economy ───────────────────────────▶ #2 Orchestrator (token budget)
#1 Shell ────────────────────────────────────▶ #2 Orchestrator (input channel)
#12 Marketplace Provider ────────────────────▶ #15 Feedback Loop (WRITE-executed signal)
#10 Data Sync ───────────────────────────────▶ #15 Feedback Loop (after-metrics at 7 days)
#15 Feedback Loop ───────────────────────────▶ #9 KB (Phase 3: FeedbackLearner updates the KB)
#8 Observability ────────────────────────────▶ #16 Eval Framework (eval cases built from traces)

Without Data Sync, Enrichment, and Marketplace Provider, the Orchestrator has no data and cannot act. They are the three projects that unlock the Coach's real utility.
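
The "what blocks what" rows can be read as a directed graph. The sketch below is illustrative only and not part of the Shopilot codebase: `BLOCKS` and `transitiveBlockers` are hypothetical names, and the "#2 Orchestrator → all of the above" row is left out because its targets are not spelled out. It answers the question "which projects must land before project X is unblocked?".

```typescript
// Each pair reads "blocker blocks blocked", transcribed from the dependency rows.
// Hypothetical encoding; the "#2 Orchestrator -> all of the above" row is omitted.
const BLOCKS: Array<[number, number]> = [
  [10, 3],  // Data Sync -> Tool Registry (READ tools)
  [12, 3],  // Marketplace Provider -> Tool Registry (WRITE tools)
  [11, 3],  // Enrichment -> Tool Registry (ANALYSIS tools)
  [9, 5],   // KB -> Context Aggregator
  [13, 2],  // Token Economy -> Orchestrator
  [1, 2],   // Shell -> Orchestrator
  [12, 15], // Marketplace Provider -> Feedback Loop
  [10, 15], // Data Sync -> Feedback Loop
  [15, 9],  // Feedback Loop -> KB (Phase 3)
  [8, 16],  // Observability -> Eval Framework
];

// Collect every project that directly or indirectly blocks `target`.
function transitiveBlockers(target: number): number[] {
  const found: number[] = [];
  const queue: number[] = [target];
  while (queue.length > 0) {
    const current = queue.shift() as number;
    for (const [blocker, blocked] of BLOCKS) {
      if (blocked === current && found.indexOf(blocker) === -1) {
        found.push(blocker);
        queue.push(blocker);
      }
    }
  }
  return found.sort((a, b) => a - b);
}
```

Under this encoding, `transitiveBlockers(3)` returns `[10, 11, 12]`, which matches the three projects named above as unlocking the Coach's real utility.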

Contracts Between Projects

The exact points where one project talks to another. Every arrow in the diagram has a contract behind it.

From | To | Contract
#1 Shell | #2 Orchestrator | HTTP REST → WebSocket (Phase 4). User message / Coach response.
#2 Orchestrator | #13 Billing | ICreditsGate / POST /internal/gate: verifies credits before each tool call.
#3 Tool Registry | #10 Data Sync | HTTP: READ tools call the Data Sync API.
#3 Tool Registry | #12 Marketplace Provider | HTTP: WRITE tools call the Provider API.
#3 Tool Registry | #11 Enrichment | HTTP: ANALYSIS tools call the Enrichment API.
#2 Orchestrator | #15 Feedback Loop | FeedbackEntry emitted by HookLifecycle.after_tool after each successful WRITE.
#15 Feedback Loop | #10 Data Sync | HTTP: queries product metrics 7 days after WRITE (before/after comparison).
#8 Observability | #16 Eval Framework | Export of real traces to build automatic evaluation cases.
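
To make the Orchestrator → Billing contract concrete, here is a minimal sketch of a credits gate that is checked before each tool call. Only the name `ICreditsGate` comes from the contract row; its `reserve` method, `GateDecision`, `InMemoryCreditsGate`, and `callToolWithGate` are assumptions for illustration. A real gate would presumably call `POST /internal/gate` asynchronously rather than hold balances in memory.

```typescript
interface GateDecision {
  allowed: boolean;
  remaining: number;
}

// Assumed method shape: the source names ICreditsGate but does not define it.
interface ICreditsGate {
  reserve(sellerId: string, cost: number): GateDecision;
}

// Hypothetical in-memory stand-in for the Billing service.
class InMemoryCreditsGate implements ICreditsGate {
  private balances: { [sellerId: string]: number } = {};

  constructor(initial: { [sellerId: string]: number }) {
    for (const sellerId of Object.keys(initial)) {
      this.balances[sellerId] = initial[sellerId] || 0;
    }
  }

  // Deduct `cost` if the balance covers it; otherwise deny and leave the balance untouched.
  reserve(sellerId: string, cost: number): GateDecision {
    const balance = this.balances[sellerId] || 0;
    if (cost > balance) {
      return { allowed: false, remaining: balance };
    }
    this.balances[sellerId] = balance - cost;
    return { allowed: true, remaining: balance - cost };
  }
}

type ToolOutcome<T> =
  | { ok: true; result: T; remaining: number }
  | { ok: false; reason: 'insufficient_credits'; remaining: number };

// The tool only runs when the gate allows it.
function callToolWithGate<T>(
  gate: ICreditsGate,
  sellerId: string,
  cost: number,
  tool: () => T,
): ToolOutcome<T> {
  const decision = gate.reserve(sellerId, cost);
  if (!decision.allowed) {
    return { ok: false, reason: 'insufficient_credits', remaining: decision.remaining };
  }
  return { ok: true, result: tool(), remaining: decision.remaining };
}
```

Putting the check in a wrapper keeps individual tools unaware of billing, which fits the table's split between the Tool Registry and Billing.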

External Contracts

Points where our stack connects to third-party systems outside our control.

From | External System | Contract
#13 Billing | Stripe | Checkout Sessions + webhooks (payment_succeeded, subscription events)
#12 Marketplace Provider | MercadoLibre API | REST: product ops, questions, campaigns
#12 Marketplace Provider | Amazon SP-API | SP-API: same operations, different adapter
#10 Data Sync | MeLi / Amazon | Polling or push of seller data
#2 Orchestrator | Anthropic / Vertex | LLM calls with cache_control for prompt caching

Completeness Check

Every area of the product is covered by at least one project. No orphan responsibilities.

Area | Covered by | Status
User interface | #1 Native Shell |
AI reasoning | #2 Orchestrator |
Tool execution | #3 Tool Registry |
Editorial knowledge | #9 KB |
Seller data | #10 Data Sync |
Market intelligence | #11 Enrichment |
Marketplace actions | #12 Marketplace Provider |
Personality & tone | #4 Personality Engine |
Context assembly | #5 Context Aggregator |
Proactive detection | #6 Proactive Suggestions |
Security & guardrails | #7 Guardrails |
Observability & logging | #8 Observability |
Impact measurement | #15 Feedback Loop |
Quality evaluation | #16 Eval Framework |
Billing & monetization | #13 Billing |
Infrastructure | #14 DevOps |
Internal team operations | #17 Beautonomous |
Brand identity & components | #18 Design System |
🖥

Layer 1: PRODUCT

What the seller installs and sees

#1

Native Shell

Interface & UX — Sergio

REWRITE

The biggest new build (~35% of total effort) and the user's primary interface: an Electron desktop app that functions as a specialized eCommerce browser.

Left side: WebContentsView renders marketplace websites (MeLi, Amazon, Shopify) with full functionality, so sellers navigate naturally. Right side: a React sidebar provides the AI copilot interface, with chat over real-time WebSocket streaming, proactive suggestion cards, tool progress indicators, and confirmation dialogs for risky actions.

Beyond the chat, the sidebar includes dedicated views: ProfileView (user settings, preferences, workspace config), BillingView (plan status, credits, upgrade/pack purchase via Stripe Checkout), EnrollmentView (marketplace OAuth2 connection flows), and an OnboardingWizard for first-time users that guides profile setup, first marketplace connection, and initial Coach interaction.

Marketplace Detector matches URL patterns to auto-detect which marketplace is active. The tab system supports multiple independent sessions with separate cookies. IPC Bridge handles communication between the Electron main process and the React renderer. The Coach backend streams events via WebSocket (text chunks, tool progress, confirmations) following a formalized bidirectional protocol.

Each phase produces a deployable build that validates real backend flows, so no separate testing interface is needed. A Dev Tools panel (toggle in dev mode) exposes traces, tokens, and latencies for internal QA. Mac-only for MVP, with .dmg distribution via electron-builder.

Beautonomous governance: confirmation dialogs in the sidebar render Core's ConfirmationFlow state machine. The Shell is the UI enforcement layer that presents PENDING actions and captures the seller's CONFIRMED/REJECTED decision. No WRITE executes without this dialog being resolved.
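
The confirmation rule can be pictured as a three-state machine. This is a minimal sketch, not Core's actual ConfirmationFlow: only the state names PENDING, CONFIRMED, and REJECTED come from the text, while `PendingWrite`, `resolveConfirmation`, and `canExecuteWrite` are hypothetical names chosen for illustration.

```typescript
type ConfirmationState = 'PENDING' | 'CONFIRMED' | 'REJECTED';

// Hypothetical record for a WRITE awaiting the seller's decision.
interface PendingWrite {
  confirmationId: string;
  action: string;
  state: ConfirmationState;
}

// Apply the seller's decision. Only PENDING actions can be resolved, and only once.
function resolveConfirmation(write: PendingWrite, approved: boolean): PendingWrite {
  if (write.state !== 'PENDING') {
    throw new Error(`Confirmation ${write.confirmationId} is already ${write.state}`);
  }
  return { ...write, state: approved ? 'CONFIRMED' : 'REJECTED' };
}

// A WRITE may run only after the dialog was resolved with approval.
function canExecuteWrite(write: PendingWrite): boolean {
  return write.state === 'CONFIRMED';
}
```

Making the two terminal states irreversible means a stale or duplicated dialog response cannot flip a rejected action into an executed one.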

Design System governance: core-product-design-system (#18) is the single source of truth for all visual components. The Figma follows Atomic Design (atoms, molecules, organisms, templates, pages). Claude consumes the Figma via Figma MCP when implementing UI components; no React components are created outside of what is defined in the Figma.

Electron Main Process
Window management, WebContentsView
Marketplace Detector
MeLi + Amazon + Shopify URL patterns (remote config)
Tab System
Multiple sessions, independent cookies
React Sidebar
Chat, suggestions, progress, confirmations
IPC Bridge
Main to renderer communication
Auto-Updater
electron-updater + GitHub Releases
App Packaging
Mac .dmg (electron-builder)
Dev Tools
Traces, tokens, latencies (dev mode)
ProfileView
User settings, preferences, workspace
BillingView
Plan status, credits, Stripe Checkout
EnrollmentView
OAuth2 marketplace connection flows
OnboardingWizard
First-use guided setup (5 steps)

Recommended Tech Stack

Electron 30+ (required for WebContentsView)  ·  React 18+  ·  TypeScript 5+  ·  Tailwind CSS 3+  ·  electron-builder  ·  electron-updater  ·  WebSocket client  ·  Zustand  ·  react-router  ·  react-hook-form  ·  Figma MCP (Design System #18)  ·  Sentry
Data Models, WebSocket Protocol, API Signatures & Acceptance Criteria
Data Models: Electron Main Process
// Electron Main Process Components:
// 1. WindowManager: 1200x800 default, split 70% WebContentsView / 30% Sidebar
// 2. MarketplaceDetector URL patterns:
//    mercadolibre.com.*/seller/*      -> MeLi Seller Center
//    mercadolibre.com.*/p/*           -> MeLi Product Page
//    sellercentral.amazon.com/*       -> Amazon Seller Central
//    admin.shopify.com/*              -> Shopify Admin
//    *                                -> Other (no injection)
// 3. SessionManager: separate cookie partitions per marketplace
// 4. TabManager: Cmd+1=MeLi, Cmd+2=Amazon, Cmd+3=Shopify
// 5. CredentialsManager: JWT in Electron safeStorage
// 6. OAuthPopupManager: dedicated BrowserWindow for OAuth2 redirects
// 7. IPC channels: marketplace:detected, auth:get-token, nav:go, notification:show, oauth:start

// Preload Script (secure bridge between the main process and the sidebar renderer):
import { contextBridge, ipcRenderer } from 'electron'

contextBridge.exposeInMainWorld('shopilotBridge', {
  // Request/response: the main process resolves the current marketplace context
  getMarketplaceContext: () => ipcRenderer.invoke('ctx'),
  // Request/response: returns the JWT kept by CredentialsManager
  getAuthToken: () => ipcRenderer.invoke('auth:get-token'),
  // Fire-and-forget navigation request
  navigateTo: (url: string) => ipcRenderer.send('nav:go', url),
  // Forward only the payload; the IpcRendererEvent is never handed to the renderer
  onNotification: (cb: (payload: unknown) => void) => {
    ipcRenderer.on('notification:show', (_event, payload) => cb(payload))
  },
  // Starts the OAuth2 flow for the given marketplace (handled by OAuthPopupManager)
  startOAuth: (marketplace: string) => ipcRenderer.invoke('oauth:start', marketplace),
})
// NEVER expose ipcRenderer directly
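
The MarketplaceDetector patterns listed in the comments above can be sketched as an ordered rule table. This is one possible reading, not the shipped detector: `detectMarketplace`, `MarketplacePage`, and the regular expressions are assumptions, and `mercadolibre.com.*` is interpreted here as an optional country suffix such as `.ar` or `.mx`.

```typescript
type MarketplacePage =
  | 'meli_seller_center'
  | 'meli_product_page'
  | 'amazon_seller_central'
  | 'shopify_admin'
  | 'other';

// Hypothetical rule table; the first matching rule wins.
// Patterns are anchored at the start so a marketplace URL inside a query string does not match.
const RULES: Array<{ page: MarketplacePage; pattern: RegExp }> = [
  {
    page: 'meli_seller_center',
    pattern: /^https?:\/\/(?:[a-z0-9-]+\.)*mercadolibre\.com(?:\.[a-z]{2,3})?\/seller(?:\/|$)/i,
  },
  {
    page: 'meli_product_page',
    pattern: /^https?:\/\/(?:[a-z0-9-]+\.)*mercadolibre\.com(?:\.[a-z]{2,3})?\/p\//i,
  },
  { page: 'amazon_seller_central', pattern: /^https?:\/\/sellercentral\.amazon\.com(?:\/|$)/i },
  { page: 'shopify_admin', pattern: /^https?:\/\/admin\.shopify\.com(?:\/|$)/i },
];

// Returns 'other' when nothing matches, which corresponds to the "no injection" case.
function detectMarketplace(url: string): MarketplacePage {
  for (const rule of RULES) {
    if (rule.pattern.test(url)) {
      return rule.page;
    }
  }
  return 'other';
}
```

Since the component list mentions remote config for these patterns, a table of rules like this could be swapped for one loaded at startup without changing the lookup function.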
WebSocket Streaming Protocol (Shell ↔ Coach)
// === COACH → SHELL (server-to-client events) ===
interface TextChunkEvent     { type: 'text_chunk';             text: string }
interface ToolStartEvent     { type: 'tool_start';             toolId: string; toolName: string; category: 'READ'|'WRITE'|'ANALYSIS'|'SYSTEM' }
interface ToolProgressEvent  { type: 'tool_progress';          toolId: string; message: string }  // "Consultando metricas..."
interface ToolCompleteEvent  { type: 'tool_complete';          toolId: string; resultSummary: string }
interface ConfirmationReqEv  { type: 'confirmation_required';  confirmationId: string; action: string; before: object; after: object }
interface SuggestionEvent    { type: 'suggestion';             id: string; suggestionType: string; message: string; priority: number }
interface ErrorEvent         { type: 'error';                  code: string; message: string }
interface RoundEndEvent      { type: 'round_end';              roundNumber: number; creditsUsed: number }

// === SHELL → COACH (client-to-server events) ===
interface MessageEvent       { type: 'message';               text: string; marketplaceContext: MarketplaceContext }
interface ConfirmationResEv  { type: 'confirmation_result';   confirmationId: string; approved: boolean }
interface CancelEvent        { type: 'cancel' }
interface ContextUpdateEvent { type: 'context_update';        marketplace: string; page: string; productId?: string }

// Connection: wss://api.shopilot.ai/ws?token={JWT}
// Reconnect: exponential backoff (1s, 2s, 4s, 8s, max 30s)
// Heartbeat: ping/pong every 30s, timeout 90s
// Session restore: on reconnect, server replays last incomplete round
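
The reconnect schedule in the protocol notes (1s, 2s, 4s, 8s, capped at 30s) could be computed by a small pure function. A minimal sketch, assuming a zero-based attempt counter; `reconnectDelayMs` and the constant names are hypothetical, and jitter is left out because the protocol notes do not mention it.

```typescript
const BASE_DELAY_MS = 1000; // first retry after 1s
const MAX_DELAY_MS = 30000; // cap from the protocol notes

// Delay before reconnect attempt `attempt` (0-based): 1s, 2s, 4s, 8s, ...
// doubling each time and never exceeding the 30s cap.
function reconnectDelayMs(attempt: number): number {
  const exponent = Math.max(0, Math.floor(attempt));
  return Math.min(BASE_DELAY_MS * 2 ** exponent, MAX_DELAY_MS);
}
```

A caller would typically reset the attempt counter to 0 once a connection has been open long enough to receive a pong.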
React Sidebar: Hooks & Views
// === React Hooks (7 total, was 4) ===
// useShopilot()     — WebSocket lifecycle, send/receive, streaming text assembly, reconnect
// useMarketplace()  — IPC marketplace:detected events, current context
// useSuggestions()   — Receives suggestion events via WebSocket (was: REST polling)
// useCredentials()   — Get/refresh JWT via IPC
// useProfile()       — GET/PUT /users/:userId/profile, form state (NEW)
// useBilling()       — GET /billing/status, plan/credits/period (NEW)
// useEnrollment()    — GET /auth/marketplaces/:userId, connect/disconnect flows (NEW)

// === Sidebar Views (react-router, in-sidebar navigation) ===
// /chat              — Default: ChatPanel + SuggestionCards + ToolProgress + ConfirmationDialog
// /profile           — ProfileView: name, categories, goals, language, connected marketplaces (read-only)
// /billing           — BillingView: plan badge, credits bar, billing period, upgrade/pack/portal buttons
// /enrollment        — EnrollmentView: marketplace cards (MeLi/Amazon/Shopify), connect/disconnect, status
// /onboarding        — OnboardingWizard: 5-step guided setup (first run only)

// === Sidebar Navigation ===
// Header: Logo + marketplace indicator + credits badge + nav icons (chat | profile | billing | enrollment)
// Bottom nav or icon bar for view switching — chat is always the primary/default view

// === Keyboard Shortcuts ===
// Cmd+L -> Focus chat input
// Cmd+B -> Toggle sidebar
// Cmd+K -> Command palette
// Cmd+1..9 -> Switch marketplace tabs
// Escape -> Cancel current operation / dismiss confirmation
// Cmd+Enter -> Send message (Enter = newline)
// Cmd+Shift+D -> Toggle Dev Tools panel (dev mode only)

// Auto-Updater: electron-updater + GitHub Releases, check on startup + every 6h
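
The "streaming text assembly" that useShopilot() is responsible for could be modeled as a reducer over server events. This is a sketch under assumptions: it covers only five of the eight server event types, and `RoundState`, `emptyRound`, and `applyEvent` are hypothetical names rather than part of the spec.

```typescript
// Subset of the server-to-client events, mirroring the protocol fields.
type ServerEvent =
  | { type: "text_chunk"; text: string }
  | { type: "tool_start"; toolId: string; toolName: string }
  | { type: "tool_progress"; toolId: string; message: string }
  | { type: "tool_complete"; toolId: string; resultSummary: string }
  | { type: "round_end"; roundNumber: number; creditsUsed: number };

interface ToolState { name: string; status: string; done: boolean }

interface RoundState {
  text: string;                     // assistant text assembled so far
  tools: Record<string, ToolState>; // keyed by toolId
  finished: boolean;
  creditsUsed: number;
}

const emptyRound: RoundState = { text: "", tools: {}, finished: false, creditsUsed: 0 };

// Pure reducer: returns a new state for each event and never mutates the input.
function applyEvent(state: RoundState, ev: ServerEvent): RoundState {
  switch (ev.type) {
    case "text_chunk":
      return { ...state, text: state.text + ev.text };
    case "tool_start":
      return {
        ...state,
        tools: { ...state.tools, [ev.toolId]: { name: ev.toolName, status: "started", done: false } },
      };
    case "tool_progress": {
      const tool = state.tools[ev.toolId];
      if (!tool) return state; // progress for an unknown tool is ignored
      return { ...state, tools: { ...state.tools, [ev.toolId]: { ...tool, status: ev.message } } };
    }
    case "tool_complete": {
      const tool = state.tools[ev.toolId];
      if (!tool) return state;
      return {
        ...state,
        tools: { ...state.tools, [ev.toolId]: { ...tool, status: ev.resultSummary, done: true } },
      };
    }
    case "round_end":
      return { ...state, finished: true, creditsUsed: ev.creditsUsed };
  }
}
```

Keeping the reducer pure means the same function can drive a Zustand store, a React useReducer, or a replay of the last incomplete round after reconnect.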
Backend APIs Consumed (all from other projects)
// From #2 Orchestrator:
// WebSocket wss://api.shopilot.ai/ws  — streaming protocol defined above

// From #13 Billing:
// GET  /billing/status             — BillingStatus (plan, credits, period)
// POST /billing/checkout           — Stripe Checkout redirect (upgrade to Pro)
// POST /billing/packs/checkout     — Stripe Checkout redirect (buy Credit Pack)
// POST /billing/portal             — Stripe Customer Portal redirect

// From #12 Marketplace Provider:
// POST   /auth/connect/:marketplace           — Start OAuth2, returns redirect URL
// GET    /auth/callback/:marketplace          — OAuth2 callback (handled in popup)
// GET    /auth/marketplaces/:userId           — List connected marketplaces + status
// DELETE /auth/disconnect/:userId/:marketplace — Revoke tokens + cleanup

// From UserProfile API (DynamoDB):
// GET  /users/:userId/profile      — UserProfile (name, categories, goals, prefs)
// PUT  /users/:userId/profile      — Update UserProfile
Acceptance Criteria
  • [Shell] App opens, loads MeLi, sidebar works in <5s
  • [Shell] MarketplaceDetector identifies: seller center, product detail, search results
  • [Shell] App does not exceed 500MB RAM with 3 tabs open (target 400MB)
  • [Shell] .dmg installs on Mac without unsigned app errors
  • [Chat] WebSocket streaming: text_chunk events render <50ms after receipt (no visible lag)
  • [Chat] tool_start/tool_progress/tool_complete events show real-time progress indicator
  • [Chat] Confirmations show clear before/after with working Accept/Reject buttons
  • [Chat] WebSocket reconnects automatically with exponential backoff, session restores last incomplete round
  • [Profile] ProfileView loads user data, edits save to backend, changes reflect in Coach context within next message
  • [Billing] BillingView shows plan, credits remaining (bar), billing period. Upgrade button opens Stripe Checkout. Pack button opens pack Checkout
  • [Enrollment] "Connect MeLi" opens OAuth2 popup, completes flow, marketplace appears as connected. Same for Amazon and Shopify
  • [Enrollment] "Disconnect" revokes tokens and updates status immediately
  • [Onboarding] First-time user sees OnboardingWizard: profile setup → marketplace connect → first chat → tour. Completion persisted
  • [Onboarding] Returning user skips onboarding, goes directly to chat
  • [Shortcuts] Cmd+L focus chat, Cmd+B toggle sidebar, Cmd+K command palette
  • [Update] Auto-updater downloads and installs updates without crash

Window: 1200x800 · Split: 70/30 · RAM: <500MB (target 400MB) · WS heartbeat: 30s · WS reconnect: exp backoff max 30s · Auto-update: startup + 6h · Mac only MVP

How It Works

  +---------------------------------------------------------------+
  |                    ELECTRON MAIN PROCESS                       |
  |  +------------------+  +------------------+  +--------------+ |
  |  | WindowManager    |  | MarketplaceDetect|  | SessionMgr   | |
  |  | 1200x800 default |  | URL Patterns:    |  | Per-mktplace | |
  |  | 70/30 split      |  | meli.com/seller  |  | cookie       | |
  |  | WebContentsView  |  | sellercentral.*  |  | partitions   | |
  |  | minimize to tray |  | admin.shopify.*  |  | Persist      | |
  |  +------------------+  +--------+---------+  +--------------+ |
  |  +------------------+           |             +--------------+ |
  |  | TabManager       |           v             | OAuthPopup   | |
  |  | Cmd+1..9 switch  |  IPC: marketplace:det   | Dedicated    | |
  |  | Independent      |  IPC: oauth:start       | BrowserWindow| |
  |  | cookies per tab  |  IPC: auth:get-token     | for OAuth2   | |
  |  +------------------+  +------------------+  | redirects    | |
  |  +------------------+  | CredentialsMgr   |  +--------------+ |
  |  | IPC Handlers     |  | JWT safeStorage  |  +--------------+ |
  |  | 7 channels       |  | Auto-refresh     |  | AutoUpdater  | |
  |  +------------------+  +------------------+  +--------------+ |
  +---------------------------------------------------------------+
                |
                v (contextBridge - secure preload)
  +---------------------------------------------------------------+
  |                    RENDERER (React Sidebar)                    |
  |                                                                |
  |  +-----------------------------------------------------------+|
  |  | Header: Logo + Mktplace indicator + Credits + Nav icons   ||
  |  | [Chat] [Profile] [Billing] [Enrollment]                   ||
  |  +-----------------------------------------------------------+|
  |  |                                                            ||
  |  | /chat (default)          /profile         /billing         ||
  |  | +---------------------+  +--------------+ +--------------+ ||
  |  | | SuggestionCards     |  | Name, email  | | Plan: Pro    | ||
  |  | | ChatPanel           |  | Categories   | | Credits: 342 | ||
  |  | |  - Streaming text   |  | Goals        | | [==== ] 68%  | ||
  |  | |  - ToolProgress     |  | Language     | | Period: ends | ||
  |  | |  - Confirmations    |  | Connected    | | [Upgrade]    | ||
  |  | |  - Input + palette  |  | marketplaces | | [Buy Pack]   | ||
  |  | +---------------------+  +--------------+ +--------------+ ||
  |  |                                                            ||
  |  | /enrollment              /onboarding (first run)           ||
  |  | +---------------------+  +---------------------------------+|
  |  | | [MeLi]  Connected   |  | Step 1: Profile setup          ||
  |  | | [Amazon] Connect    |  | Step 2: Connect marketplace    ||
  |  | | [Shopify] Connect   |  | Step 3: First chat with Coach  ||
  |  | | [Disconnect]        |  | Step 4: Quick sidebar tour     ||
  |  | +---------------------+  +---------------------------------+|
  |  |                                                            ||
  |  | StatusBar: WS dot + view name + shortcut hint             ||
  |  +-----------------------------------------------------------+|
  |                                                                |
  |  Hooks: useShopilot | useMarketplace | useSuggestions          |
  |         useCredentials | useProfile | useBilling | useEnrollment|
  +---------------------------------------------------------------+

  === WEBSOCKET STREAMING (Shell <-> Coach) ===

  Shell                          Coach (#2 Orchestrator)
    |                                |
    |-- message {text, context} -->  |
    |                                |-- ReAct loop starts
    |  <-- text_chunk {text} -----  |   (LLM streaming)
    |  <-- text_chunk {text} -----  |
    |  <-- tool_start {name} -----  |   (before_tool hook)
    |  <-- tool_progress {msg} ---  |   ("Consultando metricas...")
    |  <-- tool_complete {res} ---  |   (after_tool hook)
    |  <-- confirmation_required -  |   (WRITE action)
    |-- confirmation_result ------>  |   (user approves/rejects)
    |  <-- text_chunk {text} -----  |   (Coach explains result)
    |  <-- round_end {credits} ---  |   (round complete)
    |                                |

  Reconnect: exponential backoff (1s, 2s, 4s... max 30s)
  Heartbeat: ping/pong 30s, timeout 90s
  Session restore: server replays last incomplete round on reconnect
            

The Native Shell is an Electron desktop app (~35% of total project effort) that combines a WebContentsView for marketplace browsing (70% of the window) with a React sidebar for the AI copilot (30%). The Main Process runs 7 managers: WindowManager, MarketplaceDetector, SessionManager, TabManager, CredentialsManager, OAuthPopupManager (a dedicated BrowserWindow for OAuth2 redirect flows that never uses the main WebContentsView), and AutoUpdater.

The Renderer is a full React app with 7 custom hooks: useShopilot (WebSocket streaming protocol), useMarketplace (IPC context), useSuggestions (WebSocket events), useCredentials (JWT via IPC), useProfile (read and update user settings), useBilling (plan/credits status from #13), and useEnrollment (marketplace connect/disconnect via #12). The sidebar uses react-router for in-sidebar view navigation: /chat (default), /profile, /billing, /enrollment, and /onboarding (first-run wizard).

Communication with the Coach backend uses a bidirectional WebSocket protocol with 8 server-to-client event types and 4 client-to-server event types, replacing the previous REST polling approach. The OnboardingWizard guides first-time users through profile setup, marketplace connection, and a first Coach interaction before showing the main chat view. The MVP is Mac-only (.dmg with code signing).

Implementation Plan

Phase 1: Electron Skeleton + WebContentsView (Week 1-3)

Set up an Electron 28+ project with TypeScript, React 18, Tailwind CSS, Zustand, and react-router. Implement WindowManager with a 1200x800 default window and a 70/30 split between the WebContentsView and the React sidebar. Build the WebContentsView that loads MeLi Seller Center. Implement the secure preload script with contextBridge (never expose ipcRenderer directly). Set up electron-builder for .dmg packaging. Scaffold the sidebar router (/chat, /profile, /billing, /enrollment, /onboarding as placeholder routes). Validation: a deployable .dmg that loads MeLi with a sidebar placeholder and working route navigation, which confirms the Electron shell, the WebContentsView, and the build pipeline end to end.

Phase 2: MarketplaceDetector + Tabs + Enrollment (Week 3-5)

Implement MarketplaceDetector with URL pattern matching for MeLi, Amazon, and Shopify. Build the IPC bridge that sends marketplace:detected events to the sidebar. Implement TabManager with independent cookie partitions per tab via SessionManager, a tab bar UI, and Cmd+1-9 quick switching. Build OAuthPopupManager: a dedicated BrowserWindow that opens the redirect URL returned by POST /auth/connect/:marketplace, intercepts the callback URL, and closes the popup on success. Build EnrollmentView: marketplace cards with Connect/Disconnect buttons and status indicators, backed by the useEnrollment() hook consuming the #12 APIs. Validation: navigate across marketplaces, connect MeLi via the OAuth2 popup, see it as "Connected" in EnrollmentView, and confirm that tabs maintain independent sessions.

Phase 3: Chat + WebSocket Streaming + Profile + Billing (Week 5-8)

Build the ChatPanel: a message list with react-markdown rendering, backed by the useShopilot hook, which implements the WebSocket streaming protocol (connect, text_chunk assembly, tool_start/progress/complete indicators, confirmation_required/result flow, reconnect with exponential backoff, session restore). Build ConfirmationDialog with a before/after preview and approve/reject buttons. Build the input area with an auto-resizing textarea and a "/" command palette. Build ProfileView: useProfile() hook for GET/PUT UserProfile and a react-hook-form form with name, categories, goals, and language preference. Build BillingView: useBilling() hook for GET /billing/status, plan badge, credits progress bar, billing period display, and Upgrade/Buy Pack/Manage buttons that redirect to Stripe Checkout or the Customer Portal. Build the Dev Tools panel (toggle via Cmd+Shift+D). Validation: full end-to-end flow covering login, chat with streaming, confirming write actions, editing the profile, viewing billing status, upgrading via Stripe, and viewing traces in Dev Tools.

Phase 4: Onboarding + Polish + Signing + Auto-Update (Week 8-11)

Build OnboardingWizard: 5-step guided flow for first-time users. Step 1: profile setup (name, categories, goals). Step 2: connect first marketplace (reuses the EnrollmentView OAuth2 flow). Step 3: first guided Coach interaction ("Ask Shopilot about your metrics"). Step 4: quick sidebar tour (highlight chat, suggestions, shortcuts). Detect first run via the UserProfile.onboardingCompleted flag. Implement all keyboard shortcuts. Add minimize-to-tray for proactive notifications. Set up AutoUpdater with electron-updater + GitHub Releases. Obtain an Apple Developer Certificate for code signing + notarization. Performance optimization: <5s startup, <500MB RAM with 3 tabs (target 400MB), matching the acceptance criteria. Add Sentry for crash reporting. Final QA and .dmg distribution.
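
First-run detection via the UserProfile.onboardingCompleted flag could reduce to one routing decision at sidebar startup. A minimal sketch: `initialRoute` and `ProfileFlags` are hypothetical names, and treating a missing flag as a first run is an assumption.

```typescript
// Only the field this decision needs; the full UserProfile lives in the backend.
interface ProfileFlags {
  onboardingCompleted?: boolean;
}

type StartRoute = "/onboarding" | "/chat";

// Returning users land on /chat; anyone without a completed flag sees the wizard.
function initialRoute(profile: ProfileFlags): StartRoute {
  return profile.onboardingCompleted === true ? "/chat" : "/onboarding";
}
```

The result would feed the router's initial entry, so the wizard and the chat view never both render on launch.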

Risk Analysis

Marketplace Session/Cookie Breakage

Impact: High

Mitigation: Each marketplace gets an isolated session partition (Electron partitions such as persist:mercadolibre and persist:amazon). Sessions persist across app restarts. If a session expires, detect the login redirect and prompt the user to re-authenticate. Never inject JavaScript into marketplace pages, to avoid triggering anti-bot protections.

Electron Memory Bloat

Impact: High

Mitigation: Hard limit of 3 tabs at MVP. Lazy-load tab content and suspend background tabs after 5 min of inactivity. Monitor memory via Electron's process.getProcessMemoryInfo(). Alert and force a GC if the total exceeds the 400MB target. Profile the Chromium renderer per tab to identify leaks. Ceiling: 500MB with 3 tabs open, matching the acceptance criteria.
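
The memory check described above could be a pure threshold classification over per-process samples. A minimal sketch, assuming the samples arrive in kilobytes (the unit Electron's memory APIs report) and that thresholds are passed in rather than hard-coded. `classifyMemory`, `totalKb`, and the verdict names are hypothetical.

```typescript
type MemoryVerdict = "ok" | "over-target" | "over-limit";

// Sum per-process memory samples (kilobytes) across main and renderer processes.
function totalKb(samplesKb: number[]): number {
  return samplesKb.reduce((sum, kb) => sum + kb, 0);
}

// Compare the total against a soft target and a hard limit, both in megabytes.
function classifyMemory(totalKilobytes: number, targetMb: number, limitMb: number): MemoryVerdict {
  const totalMb = totalKilobytes / 1024;
  if (totalMb > limitMb) return "over-limit";   // above the hard limit
  if (totalMb > targetMb) return "over-target"; // alert and attempt cleanup
  return "ok";
}
```

With the acceptance criteria's figures, a caller would use `classifyMemory(totalKb(samples), 400, 500)` and raise the alert on any verdict other than "ok".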

macOS Code Signing + Notarization Delays

Impact: Medium

Mitigation: Apply for the Apple Developer Program in Week 1 (approval takes 24-48h). Test code signing and notarization in the CI pipeline early (Week 3). Without proper signing, macOS Gatekeeper blocks the app by default. Keep the electron-builder config version-controlled for reproducible builds.

OAuth2 Popup Flow Blocked by OS/Marketplace

Impact: High

Mitigation: Use a dedicated Electron BrowserWindow (not a system browser) for OAuth2 redirects. This avoids popup blockers and gives us control over redirect interception. Intercept the redirect to the OAuth2 callback URL (GET /auth/callback/:marketplace) via webContents.on('will-redirect'). If MeLi/Amazon/Shopify change their OAuth2 flow or block Electron user-agents, fall back to the system browser with a localhost callback server (loopback redirect). Test OAuth2 flows against all 3 marketplaces in Phase 2.
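
Whatever handler intercepts the redirect needs a way to recognize the callback URL and extract the result. A minimal sketch of that check as a pure function: `parseOAuthCallback` is a hypothetical name, the callback origin is passed in because the spec does not state the REST host, and `code`/`error` are the standard OAuth2 authorization-response parameters.

```typescript
interface OAuthCallback {
  marketplace: string;  // path segment after /auth/callback/
  code: string | null;  // authorization code on success
  error: string | null; // OAuth2 error code on failure
}

// Returns the parsed callback, or null if the URL is not the OAuth2 callback.
function parseOAuthCallback(rawUrl: string, callbackOrigin: string): OAuthCallback | null {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return null; // not a parseable URL
  }
  if (url.origin !== callbackOrigin) return null; // some other site in the flow
  const match = /^\/auth\/callback\/([a-z]+)$/.exec(url.pathname);
  if (!match) return null;
  const marketplace = match[1];
  if (marketplace === undefined) return null;
  return {
    marketplace,
    code: url.searchParams.get("code"),
    error: url.searchParams.get("error"),
  };
}
```

In the popup, the redirect handler would call this on each target URL and close the window once it returns a non-null result.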

WebSocket Connection Instability

Impact: High

Mitigation: Exponential backoff reconnection (1s, 2s, 4s, 8s, max 30s). Ping/pong heartbeat every 30s with a 90s timeout to detect dead connections. On reconnect, the server replays the last incomplete round to prevent lost context. The StatusBar shows a real-time connection indicator (green/yellow/red). If the WebSocket fails persistently (>3 retries), fall back to REST long-polling as a degraded mode: chat still works, but without streaming progress indicators.
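
The heartbeat timeout and the degraded-mode fallback described above can be combined into one decision function. A minimal sketch under assumptions: the mode names, `nextMode`, and `isHeartbeatExpired` are hypothetical, and how each mode maps onto the green/yellow/red indicator is left open because the spec does not define it.

```typescript
type ConnectionMode = "streaming" | "reconnecting" | "degraded-polling";

const HEARTBEAT_TIMEOUT_MS = 90000; // 90s without a pong means the link is dead
const MAX_WS_RETRIES = 3;           // more than 3 failed retries triggers fallback

function isHeartbeatExpired(lastPongAtMs: number, nowMs: number): boolean {
  return nowMs - lastPongAtMs > HEARTBEAT_TIMEOUT_MS;
}

// Decide what the client should be doing right now.
function nextMode(
  socketOpen: boolean,
  lastPongAtMs: number,
  nowMs: number,
  failedRetries: number,
): ConnectionMode {
  if (failedRetries > MAX_WS_RETRIES) return "degraded-polling";
  if (!socketOpen || isHeartbeatExpired(lastPongAtMs, nowMs)) return "reconnecting";
  return "streaming";
}
```

Evaluating this on a timer as well as on socket events lets a silently dead connection be detected even when no close event fires.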

Key Decisions

D1.

Electron over browser extension or web app. A browser extension cannot control tab sessions, isolate cookie partitions, or provide a persistent sidebar across navigations. A web app cannot detect which marketplace page the user is viewing. Electron gives us full control: an embedded browser with session isolation, native OS integrations (tray, keychain, notifications), and auto-updates. The ~35% effort premium is justified by the resulting UX.

D2.

Zustand over Redux for state management. The sidebar state is relatively simple: current marketplace context, chat messages, suggestions list, auth token. Redux's boilerplate (actions, reducers, selectors) is overkill. Zustand provides the same reactivity with an estimated 80% less code, and the entire store fits in a single file. If state complexity grows in Scope Full, migration to Redux Toolkit is straightforward.

D3.

Mac-only MVP, defer Windows and Linux. 70%+ of LatAm e-commerce sellers use macOS or can install a .dmg. Building for Windows (code signing with EV certificate, Windows Defender SmartScreen) and Linux (AppImage, Snap, deb) adds 2-3 weeks of CI/CD complexity. Ship Mac first, validate product-market fit, then expand platform coverage based on user demand.

D4.

WebSocket over SSE for Coach communication. SSE is server-to-client only. The Shell needs to send confirmation_result (approve/reject) and context_update (marketplace changed) to the Coach, which requires bidirectional communication. WebSocket provides full-duplex on a single connection. The IEventEmitter abstraction in #2 already supports swapping in WebSocketEventEmitter in Phase 4. Production uses API Gateway WebSocket (AWS); local dev uses the ws library.

D5.

All views inside the sidebar, not separate windows. Profile, Billing, and Enrollment could each be separate Electron windows, but that fragments the UX. Keeping everything in the sidebar with react-router maintains the "copilot panel" mental model: the sidebar is always visible alongside the marketplace. Navigation between views is instant (no window creation overhead). The chat remains the primary view; others are secondary settings screens.

D6.

Onboarding as skippable wizard, not mandatory gate. The wizard guides first-time users through profile + enrollment + first chat, but can be dismissed and resumed later from ProfileView. This avoids blocking power users who already know what they're doing, while still reducing friction for newcomers. Completion state is persisted in UserProfile.onboardingCompleted (DynamoDB).

MVP Scope

[v4] MeLi + Amazon + Shopify detection, dark mode, Mac .dmg. RAM: 500MB max (target 400MB). Breadcrumb nav (not URL bar). Remote config for URL patterns. Credit balance in sidebar + upgrade/buy-pack CTAs. WebSocket streaming protocol (bidirectional). ProfileView + BillingView + EnrollmentView (OAuth2 popup). OnboardingWizard (5-step first-run). 7 React hooks, 5 sidebar views via react-router.

Inspired by

New (~35% effort). Cursor Electron pattern.

Source: New (~35% effort) | Depends on: #2 (Orchestrator: WebSocket), #13 (Billing: BillingStatus API), #12 (Marketplace Provider: OAuth2 enrollment)
📝 Project Changelog
v4 Mar 2, 2026
!Absorbs former #6 Playground as Dev Tools panel (inspector, trace viewer, tool debugger)
+4 new views: ProfileView, BillingView, EnrollmentView, OnboardingWizard (5-step)
+WebSocket streaming protocol: 8 server events + 4 client events, heartbeat 30s, reconnect with backoff
+OAuthPopupManager: dedicated BrowserWindow for OAuth2 redirects
+react-router for sidebar navigation (5 routes: /chat, /profile, /billing, /enrollment, /onboarding)
~Components 8 → 12, React hooks 4 → 7
~Risks 3 → 5 (+ OAuth2 popup blocked, WebSocket instability)
~Acceptance criteria 8 → 16 (Shell, Chat, Profile, Billing, Enrollment, Onboarding, Shortcuts, Update)
v3 Feb 27-28, 2026
+Full deep spec card: Electron + WebContentsView, sidebar chat, 70/30 split, marketplace detection
v2.1 Feb 27, 2026
+Credit balance in sidebar header + contextual banners (upgrade / buy credits)
v2 Feb 27, 2026
~BrowserView → WebContentsView (Electron 30+, BrowserView deprecated)
~Navigation: URL bar → contextual breadcrumb
~RAM limit: <300MB → <500MB (400MB target)
+MarketplaceDetector URL patterns updatable via remote config
v1 Feb 26, 2026
+Initial: “Native Shell” (v1 #4), NEW status, Electron + BrowserView
#18

Design System

Brand & Components — UX/UI (executes Figma) · Pablo (approves) · Sergio (consumes → React Mockups)

NEW

Repository of specifications and context (no executable code) that bridges design in Figma with React implementation in core-product-desktop-client (#1). 44 components catalogued following Atomic Design (13 atoms, 6 AI-native atoms, 10 molecules, 16 organisms, 5 screens), 9 pending brand decisions (D1-D9), and a design-to-code pipeline via Figma MCP. The external UX/UI team delivers the brand book and Figma files (T0.BB–T4.BB each sprint); Pablo approves each delivery; Sergio consumes the components and creates integration Mockups in React. Claude reads the Figma via MCP (get_file, get_node, get_variable_defs, get_metadata) to generate matching React components; no components are created outside of what is defined in the Figma.

Includes a UX Writing guide (copies, tone, number formatting), 8 data visualization patterns for sellers, AI-native interaction patterns (streaming, thinking, tool stages, confirmation), and desktop Electron patterns (title bar, split pane, tab bar, status bar, keyboard shortcuts). Backed by competitive brand analysis of 16 reference brands (Cursor, Linear, Shopify/Polaris, Vercel/Geist, Anthropic, HubSpot/Canvas, Brex, Mercury, and 8 more).

4 mandatory agent rules: (1) No inventing components outside Figma: if one is missing, stop and report. (2) No hardcoded values: all colors, sizes, radii, and spacing come from Figma variables with codeSyntax. (3) Figma is the source of truth: no prior knowledge, no other projects, no “what looks good”. (4) Verify before implementing: the agent must execute get_variable_defs + get_metadata via MCP before writing any React code.
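
Rule (2) lends itself to an automated check during code review. The following is a rough sketch of such a check, assuming that hardcoded values show up as hex colors or raw pixel literals in generated component source; the function name `findHardcodedValues` and both regular expressions are hypothetical, and the heuristic would produce false positives (for example on CSS id selectors such as `#abc`).

```typescript
interface HardcodedValue {
  kind: "hex-color" | "px-literal";
  value: string;
  index: number; // character offset in the scanned source
}

// Hex colors such as #F97316 or #fff, and raw pixel literals such as 12px.
const HEX_COLOR = /#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b/;
const PX_LITERAL = /\b\d+(?:\.\d+)?px\b/;

// Collect every match of one pattern, tagging it with its kind.
function scan(source: string, pattern: RegExp, kind: HardcodedValue["kind"]): HardcodedValue[] {
  const re = new RegExp(pattern.source, "g");
  const out: HardcodedValue[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) {
    out.push({ kind, value: m[0], index: m.index });
  }
  return out;
}

// Returns all suspicious literals, ordered by position in the source.
function findHardcodedValues(source: string): HardcodedValue[] {
  return [...scan(source, HEX_COLOR, "hex-color"), ...scan(source, PX_LITERAL, "px-literal")]
    .sort((a, b) => a.index - b.index);
}
```

A component that only references custom properties, such as `var(--sp-spacing-4)`, yields an empty list, while any literal color or size is reported with its offset.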

Brand Identity
9 brand decisions (D1-D9), logo, palette, typography
Design Tokens
3 collections: Primitives → Semantic → Component
44 Components
13 atoms + 6 AI-native + 10 molecules + 16 organisms + 5 screens
Figma MCP Bridge
get_file, get_node, get_variable_defs, get_metadata
UX Writing
Copy patterns, tone, number formatting, terminology
Data Viz Patterns
8 patterns: KPI cards, tables, gauges, sparklines
AI-native Patterns
Streaming, thinking, tool stages, confirmation, errors
Brand Intelligence
16 brand analyses: Cursor, Linear, Polaris, Vercel…

Tech Stack

Figma · Figma MCP (mcp__figma) · Pencil Dev MCP (mcp__pencil) · Atomic Design · Style Dictionary 4 · Tailwind CSS · W3C DTCG Tokens
Component Catalogue, Figma Architecture, Token Schema, Brand Decisions, Pipeline & Acceptance Criteria
Component Catalogue: 44 Total
// === ATOMS (13) — base components, single responsibility ===
Button, Input, Badge, Icon, StatusDot, Spinner, Toggle, Divider,
AvatarInitials, CreditBadge, ProgressBar, Tooltip, KbdShortcut

// === AI-NATIVE ATOMS (6) — specific to conversational copilot ===
StreamingCursor (█), ThinkingPulse (•••), ToolBadge,
AgentStatusBar, RiskBadge, TTLCountdown

// === MOLECULES (10) — atom combinations ===
SearchBar, TabBar, Select, Toggle (labeled), Tooltip (rich),
ProgressBar (labeled), Dropdown, KbdShortcut (combo), InputField, CreditDisplay

// === ORGANISMS (16) — complex UI blocks ===
ChatInputBar, MessageBubble, ContextBar, ToolAccordion, ConfirmDialog,
ReActStream, ProactiveCard, DataTable, AuditLog, RollbackPanel,
FraudAlert, MarketplaceKPI, CreditEconomy, OnboardingStep,
EnrollmentCard, ErrorRecovery

// === SCREENS (5) — full views at 360px sidebar width ===
ChatView, Dashboard, Settings, Billing, Enrollment
9 Brand Decisions: Blocking (Pablo)
D1: Brand emotion      PENDING  A: "Warm Precision" (rec) / B: "Data Intelligence" / C: "Growth Engine"
D2: Primary color      PENDING  Orange #F97316 / Indigo #6366F1 / Sky #0EA5E9 / Emerald #10B981
D3: Background mode    PENDING  Dark-first (rec) / Light-first / Both
D4: Typography         PENDING  Inter + JetBrains Mono / Geist + Geist Mono / IBM Plex
D5: Logo               PENDING  Wordmark / Icon + Wordmark (rec) / Abstract mark
D6: Border radius      PENDING  Sharp (2-4px) / Standard (6-8px) / Rounded (12-20px)
D7: Shadow policy      PENDING  None / Minimal / Soft
D8: Semantic palette   PENDING  Green/Amber/Red/Blue standard (blocked by D3)
D9: UI voice           PENDING  Direct & human / Technical & precise / Empowering & confident

// Until D1-D9 are decided, cannot generate design-tokens.json nor implement components
Figma Variable Architecture: 3 Collections
// 00 Primitives (raw values — NEVER applied directly to designs)
color/blue/50..900, gray/50..900, red/50..900, green/50..900
spacing/0, 1(4px), 2(8px), 3(12px), 4(16px), 6(24px), 8(32px)
radius/none(0), sm(4), md(8), lg(12), xl(16), full(9999)
font-size/xs(12), sm(14), base(16), lg(18), xl(20), 2xl(24)

// 01 Semantic (with Light/Dark modes — what components use)
color/bg/primary       → white (light) | gray/900 (dark)
color/bg/secondary     → gray/50 (light) | gray/800 (dark)
color/text/primary     → gray/900 (light) | white (dark)
color/border/default   → gray/200 (light) | gray/700 (dark)
color/interactive/primary → blue/600
spacing/component/padding-sm → spacing/2
radius/component/button → radius/md

// 02 Component (optional — unique overrides only)
button/bg/primary      → color/interactive/primary
input/border           → color/border/default
input/border-focus     → color/border/focus

// Code Syntax mapping: variable → var(--color-bg-primary)
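
To make the alias chain concrete, here is a small sketch of how a Component variable resolves through Semantic (per mode) down to a Primitive, plus the Code Syntax mapping from a variable path to a CSS custom property. The data shape, the `resolve` and `toCssVar` helpers, and the hex values are assumptions for illustration; the real values depend on brand decisions D1-D9.

```typescript
type Mode = "light" | "dark";

// A value is either a raw literal (Primitives) or an alias to another variable.
type VariableValue = { raw: string } | { alias: string };

// Semantic variables may differ per mode; other variables hold a single value.
type VariableDef = VariableValue | { light: VariableValue; dark: VariableValue };

const variables: Record<string, VariableDef> = {
  // 00 Primitives (placeholder hex values, not brand decisions)
  "color/white": { raw: "#ffffff" },
  "color/gray/900": { raw: "#111827" },
  "color/blue/600": { raw: "#2563eb" },
  // 01 Semantic
  "color/bg/primary": { light: { alias: "color/white" }, dark: { alias: "color/gray/900" } },
  "color/interactive/primary": { alias: "color/blue/600" },
  // 02 Component
  "button/bg/primary": { alias: "color/interactive/primary" },
};

// Follow aliases until a raw value is reached; fail loudly on cycles or typos.
function resolve(path: string, mode: Mode, seen: string[] = []): string {
  if (seen.indexOf(path) !== -1) {
    throw new Error(`Alias cycle: ${[...seen, path].join(" -> ")}`);
  }
  const def = variables[path];
  if (def === undefined) throw new Error(`Unknown variable: ${path}`);
  const value: VariableValue = "light" in def ? def[mode] : def;
  return "raw" in value ? value.raw : resolve(value.alias, mode, [...seen, path]);
}

// Code Syntax mapping: "color/bg/primary" -> "var(--color-bg-primary)"
function toCssVar(path: string): string {
  return `var(--${path.replace(/\//g, "-")})`;
}
```

Because components reference only Semantic or Component variables, switching between Light and Dark changes the resolved value without touching component code.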
Figma Multi-File Structure
Figma Team: Shopilot Design
├── [LIB] Foundations & Tokens     ← variables, colors, typography, spacing
├── [LIB] Iconography              ← icons (changes frequently, many assets)
├── [LIB] Core Components          ← atoms + molecules
├── [LIB] Pattern Components       ← organisms + templates
├── Documentation & Playground     ← usage guides, do/don't (not published)
└── Changelog & Governance         ← change history (not published)

// Publish order: Tokens → Icons → Core Components → Patterns
// Each component file has: Cover, Getting Started, Changelog, Atoms, Molecules, Organisms, ._Base, Archive
MCP Integration: Figma + Pencil
// .claude/settings.json — MCP permissions
mcp__figma__get_file            // read full Figma file
mcp__figma__get_node            // read specific component node
mcp__figma__get_design_context  // extract design context
mcp__figma__get_variable_defs   // read token variables (MUST run before implementing)
mcp__figma__get_metadata        // read component metadata (MUST run before implementing)
mcp__pencil__design             // complementary: rapid prototyping without designer

// Workflow:
// 1. Agent runs get_variable_defs + get_metadata
// 2. Reads component structure from Figma
// 3. Implements matching React component in core-product-desktop-client
// 4. Code review verifies fidelity to Figma spec
Brand Book Requirements: External Team Deliverable
// 5 mandatory deliverables from external design team:
1. Logo system: icon+wordmark, reduced icon, monochrome (white+black), SVG+PNG @1x-3x
2. Color palette: primary, secondary, semantic (success/warning/error/info), neutrals, dark mode mapping, WCAG AA ratios
3. Typography: display + body + monospace fonts, scale (10-24px), weights, files (.woff2 + .ttf)
4. Voice & tone: 3-5 brand personality attributes, context variations, "sounds like" vs "doesn't sound like"
5. 44 components with brand identity applied: all states (default, hover, active, focus, disabled), dark mode, WCAG AA

// NOT in scope: user flows, social media guidelines, corporate stationery, motion specs, code implementation
Design-to-Code Pipeline
Brand decisions (Pablo D1-D9)
    ↓
External design team creates Figma with variables + codeSyntax
    ↓
Agent runs get_variable_defs + get_metadata via Figma MCP
    ↓
Generates design-tokens.json (W3C DTCG format)
    ↓
Style Dictionary transforms → CSS :root + Tailwind config
    ↓
Agent implements React component in core-product-desktop-client (#1)
    ↓
Code review verifies fidelity: tokens + a11y + responsive + states
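
The token step of this pipeline (design-tokens.json in, CSS :root out) can be sketched by hand to show what the transformation does. This is not Style Dictionary's API; it is a simplified stand-in that assumes string-valued tokens in the DTCG shape (`$value`, `$type`, aliases written as `{group.name}`) and the `sp` prefix seen in the pipeline's custom property names. All function names are hypothetical.

```typescript
// Minimal shape of a DTCG token file: nested groups whose leaves carry $value.
interface DtcgToken { $value: string; $type?: string }
interface DtcgGroup { [name: string]: DtcgToken | DtcgGroup }

function isToken(node: DtcgToken | DtcgGroup): node is DtcgToken {
  return typeof (node as DtcgToken).$value === "string";
}

// Flatten nested groups into [path, value] pairs, e.g. [["color","blue","600"], "#2563eb"].
function flatten(group: DtcgGroup, prefix: string[] = []): Array<[string[], string]> {
  const out: Array<[string[], string]> = [];
  for (const name of Object.keys(group)) {
    const node = group[name];
    if (isToken(node)) out.push([[...prefix, name], node.$value]);
    else out.push(...flatten(node, [...prefix, name]));
  }
  return out;
}

// "{color.blue.600}" is an alias; emit it as a reference to another custom property.
function toCssValue(value: string, cssPrefix: string): string {
  const alias = /^\{(.+)\}$/.exec(value);
  return alias ? `var(--${cssPrefix}-${alias[1].replace(/\./g, "-")})` : value;
}

// Render every token as a custom property inside a single :root rule.
function toCssRoot(tokens: DtcgGroup, cssPrefix: string = "sp"): string {
  const lines = flatten(tokens).map(
    ([path, value]) => `  --${cssPrefix}-${path.join("-")}: ${toCssValue(value, cssPrefix)};`
  );
  return `:root {\n${lines.join("\n")}\n}`;
}
```

Emitting aliases as `var(...)` references rather than resolved values keeps the Primitives → Semantic layering visible in the generated CSS, which is one possible design choice among several.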
Current Status & Blockers
Status:  specs → ✓ | Figma → ✗ | design-tokens.json → ✗ | React components → ✗

Blockers:
1. 9 brand decisions (D1-D9) not yet taken — blocks entire pipeline
2. No brand assets — no logo, no licensed fonts, no brand book in repo
3. No Figma file — no source of truth for agent (requires multi-file libraries, 3 variable collections, Code Syntax, Auto Layout, Light/Dark modes)
4. No brand book — external design team has not delivered yet
Repo Structure
core-product-design-system/
├── README
├── CLAUDE.md                              ← agent instructions
├── brand/                                 ← brand assets (PENDING)
│   ├── logo/                              ← logo variants (SVG, PNG)
│   ├── fonts/                             ← licensed fonts
│   └── brand-book.md                      ← brand guidelines
├── tokens/                                ← pipeline output (PENDING)
│   └── design-tokens.json                 ← generated from Figma variables
└── .claude/
    ├── settings.json                      ← MCP permissions (figma + pencil)
    ├── memory/MEMORY.md                   ← state, decisions, catalogue
    └── specs/ (8 docs, 1930 lines)        ← architecture, contracts, dev plan, testing, figma specs
Acceptance Criteria
AC-18.1:  All 9 brand decisions (D1-D9) resolved and documented
AC-18.2:  Brand book delivered: logo system + color palette + typography + voice/tone + 44 components with brand applied
AC-18.3:  Figma multi-file structure: Foundations, Icons, Core Components, Patterns (not monolithic)
AC-18.4:  3 variable collections (Primitives → Semantic → Component) with Code Syntax + Light/Dark modes
AC-18.5:  All 44 components in Figma: Auto Layout, all states, Component Properties, slash naming
AC-18.6:  design-tokens.json generated → Style Dictionary → CSS :root + Tailwind config in #1
AC-18.7:  Claude reads and implements Figma components in #1 via Figma MCP (get_variable_defs + get_metadata)
AC-18.8:  WCAG 2.2 AA compliance: contrast ≥4.5:1 text, ≥3:1 UI, :focus-visible, touch target ≥44px
AC-18.9:  No React components in #1 exist outside of what is defined in the Figma
AC-18.10: Zero Figma anti-patterns: no detached instances, no hardcoded hex, no generic names, no absolute positioning
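
The contrast thresholds in AC-18.8 can be verified programmatically. The sketch below implements the WCAG 2.x relative luminance and contrast ratio formulas for `#RRGGBB` colors; the helper names and the `"text" | "ui"` switch are assumptions chosen to mirror the 4.5:1 and 3:1 thresholds in the criterion.

```typescript
// Parse "#RRGGBB" into [r, g, b] channels in the 0..255 range.
function parseHex(hex: string): [number, number, number] {
  const m = /^#([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!m) throw new Error(`Expected #RRGGBB, got ${hex}`);
  return [parseInt(m[1], 16), parseInt(m[2], 16), parseInt(m[3], 16)];
}

// WCAG 2.x relative luminance of an sRGB color.
function relativeLuminance(hex: string): number {
  const [r, g, b] = parseHex(hex).map((channel) => {
    const c = channel / 255;
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// AC-18.8 thresholds: 4.5:1 for text, 3:1 for UI elements.
function meetsAA(foreground: string, background: string, kind: "text" | "ui"): boolean {
  return contrastRatio(foreground, background) >= (kind === "text" ? 4.5 : 3);
}
```

Once D2 and D3 are decided, each semantic color pair could be run through `meetsAA` for both Light and Dark modes before the tokens are accepted.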

How It Works

  +---------------------------------------------------------------+
  |              BRAND DECISIONS (Pablo D1-D9)                     |
  |  D1 Emotion · D2 Color · D3 Mode · D4 Type · D5 Logo         |
  |  D6 Radius · D7 Shadows · D8 Semantic · D9 Voice              |
  +-------------------------------+-------------------------------+
                                  |
                                  v
  +---------------------------------------------------------------+
  |              EXTERNAL DESIGN TEAM                              |
  |  Brand book: logo + palette + typography + voice + 44 comps    |
  |  Figma: multi-file libraries, 3 variable collections           |
  |  Variables: Primitives → Semantic (L/D modes) → Component     |
  |  All components: Auto Layout, states, Code Syntax              |
  +-------------------------------+-------------------------------+
                                  |
                                  v
  +---------------------------------------------------------------+
  |              FIGMA MCP BRIDGE                                  |
  |  mcp__figma__get_variable_defs  → token extraction             |
  |  mcp__figma__get_metadata       → component metadata            |
  |  mcp__figma__get_node           → component structure            |
  |  mcp__pencil__design            → rapid prototyping              |
  +-------------------------------+-------------------------------+
                                  |
                                  v
  +---------------------------------------------------------------+
  |              DESIGN TOKEN PIPELINE                             |
  |  Figma Variables → design-tokens.json (W3C DTCG)               |
  |  Style Dictionary 4 → CSS :root + tailwind.config.ts            |
  |  --sp-color-brand-primary, --sp-spacing-4, --sp-radius-md      |
  +-------------------------------+-------------------------------+
                                  |
                                  v
  +---------------------------------------------------------------+
  |              NATIVE SHELL (#1) — IMPLEMENTATION                |
  |  Claude reads Figma → generates React + Tailwind components    |
  |  44 components implemented following Figma spec exactly         |
  |  Code review verifies: tokens + a11y + responsive + states     |
  +---------------------------------------------------------------+
            
Source: New (Sec. 14 + Sec. 15 + 8 spec docs) | Depends on: Pablo D1-D9, External design team, Figma MCP | Consumed by: #1 Shell (tokens+components), #2 Orchestrator (visual states), #13 Billing (credit UI), #3 Tool Registry (tool badges), #7 Guardrails (error A/B/C)
📋 Project Changelog
v3 Mar 9, 2026
+Major expansion from docs 72+73: 44-component catalogue (13 atoms, 6 AI-native, 10 molecules, 16 organisms, 5 screens). 9 brand decisions (D1-D9) with options. 4 mandatory agent rules. MCP integration (Figma + Pencil). Figma multi-file architecture with 3 variable collections (Primitives → Semantic → Component). Brand book requirements (5 deliverables). Design-to-code pipeline diagram. UX Writing, data viz, AI-native patterns. Repo structure. Current blockers. Ownership: UX/UI team (executes Figma T0.BB–T4.BB) + Pablo (approves) + Sergio (consumes → React Mockups).
v2 Mar 9, 2026
Removed Storybook, Chromatic, React, NPM package; the Design System lives entirely in Figma.
+Added Atomic Design methodology. Figma MCP as consumption mechanism.
v1 Mar 9, 2026
+Initial: placeholder card, external design team ownership.
🧠

Layer 2: INTELLIGENCE

The Coach lives here

#2

ReAct Orchestrator

Intelligence — Mateo

ADAPT+NEW

THE HEART OF SHOPILOT. Replaces the existing one-shot RAG pipeline (ConversationFlowOrchestrator) with a stateful ReAct loop: reason, decide, act, observe, iterating until the LLM responds with text or hits MAX_ROUNDS=10. Today the Coach is one-shot: one question, one LLM call, one answer, with no conversation history sent, no tools, and no chaining. The orchestrator fixes all three: multi-round reasoning, tool execution with ConfirmationFlow for writes, and full conversation history in every turn. Built on TypeScript/AWS Lambda (NOT Python).

The orchestrator is a port-based architecture: it doesn't know the LLM model, the transport protocol, or the billing system. IEventEmitter decouples the loop from REST (Phase 0.3) and WebSocket (Phase 4). Proactive mode (Phase 4) is a separate entry point that shares the loop but has its own context construction. Four additional capabilities strengthen the loop: (1) resilient error handling: tool errors become observations for the LLM to reason about and retry, up to 3 consecutive failures; (2) post-generation verification: the Coach checks cited data against retrieved data before responding; (3) extended thinking: a selective reasoning budget for complex multi-variable analysis; (4) SubtaskRunner: parallel execution of independent sub-objectives for multi-entity queries.
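
The loop described above can be reduced to a short skeleton. This is a simplified sketch, not the ReActOrchestrator itself: `generate` and `runTool` are hypothetical synchronous stand-ins for ILLMClient.generate and tool execution (the real calls would be asynchronous), the message shape is flattened to plain strings, failures are counted across all tools rather than per tool, and the fallback answers are placeholder text.

```typescript
const MAX_ROUNDS = 10;
const MAX_CONSECUTIVE_TOOL_FAILURES = 3;

type LlmReply =
  | { kind: "text"; text: string }
  | { kind: "tool_use"; toolName: string; args: unknown };

// Flattened message shape for the sketch; the blueprint uses content blocks.
type Message = { role: "user" | "assistant" | "tool"; content: string };

// Hypothetical stand-ins for the LLM port and the tool executor.
type Generate = (messages: Message[]) => LlmReply;
type RunTool = (toolName: string, args: unknown) => string;

function runLoop(userText: string, generate: Generate, runTool: RunTool): { answer: string; rounds: number } {
  const messages: Message[] = [{ role: "user", content: userText }];
  let consecutiveFailures = 0;

  for (let round = 1; round <= MAX_ROUNDS; round++) {
    const reply = generate(messages); // REASON
    if (reply.kind === "text") return { answer: reply.text, rounds: round }; // RESPOND

    messages.push({ role: "assistant", content: `tool_use ${reply.toolName}` });
    try {
      const result = runTool(reply.toolName, reply.args); // ACT
      messages.push({ role: "tool", content: result }); // OBSERVE
      consecutiveFailures = 0;
    } catch (err) {
      // A tool error becomes an observation the model can reason about and retry.
      consecutiveFailures++;
      const reason = err instanceof Error ? err.message : String(err);
      messages.push({ role: "tool", content: `error: ${reason}` });
      if (consecutiveFailures >= MAX_CONSECUTIVE_TOOL_FAILURES) {
        return { answer: "Answering with the information available.", rounds: round };
      }
    }
  }
  // Round limit reached: force a final response instead of looping further.
  return { answer: "Answering with the information available.", rounds: MAX_ROUNDS };
}
```

Passing the model and the tool runner in as functions mirrors the port-based design: the loop itself stays unaware of which LLM or transport sits behind them.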

Beautonomous governance: the Orchestrator IS Core's execution engine. Every ReAct loop iteration enforces Core's governance rules (Section 7.5). Every WRITE action passes through ConfirmationFlow (PENDING → CONFIRMED/REJECTED/EXPIRED). Every tool call is gated by Core's permission matrix before execution.
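
The ConfirmationFlow states can be sketched as a small transition function. The four statuses come from the paragraph above and the 30-minute expiry from the ConfirmationFlow entry in this card; the `Confirmation` shape and the function names are assumptions made for illustration.

```typescript
type ConfirmationStatus = "PENDING" | "CONFIRMED" | "REJECTED" | "EXPIRED";

const CONFIRMATION_TTL_MS = 30 * 60 * 1000; // 30 minutes

interface Confirmation {
  toolName: string;
  status: ConfirmationStatus;
  expiresAt: number; // epoch milliseconds
}

// A WRITE tool call starts life as a PENDING confirmation.
function requestConfirmation(toolName: string, now: number): Confirmation {
  return { toolName, status: "PENDING", expiresAt: now + CONFIRMATION_TTL_MS };
}

// Only PENDING may transition; a decision arriving at or after expiresAt yields EXPIRED.
function resolveConfirmation(c: Confirmation, confirmed: boolean, now: number): Confirmation {
  if (c.status !== "PENDING") return c;
  if (now >= c.expiresAt) return { ...c, status: "EXPIRED" };
  return { ...c, status: confirmed ? "CONFIRMED" : "REJECTED" };
}

// The WRITE tool may run only after an explicit, in-time confirmation.
function mayExecute(c: Confirmation): boolean {
  return c.status === "CONFIRMED";
}
```

Treating the terminal states as final means a late or repeated click on "confirm" cannot re-trigger a write, which is the property the governance rules depend on.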

Current State (What Exists Today)

Reusable As-Is

  • ILLMClient + LLMClientFactory
  • ConversationTrackingOrchestrator
  • RagOrchestrator (KB provider)
  • BrandHealthContextService
  • ConversationManagementService
  • MessageRepository

Needs Improvement

  • ConversationFlowOrchestrator: no history sent to LLM
  • ILLMClient.generateAnswer(): RAG-specific signature; the new generate(messages, tools) method is a shared deliverable with #3
  • COACH_SYSTEM_PROMPT: hardcoded constant

Missing (To Build)

  • ReActOrchestrator
  • OrchestrationSession
  • IContextWindowManager
  • IOrchestrationRepository
  • IEventEmitter + adapters
  • ConfirmationFlow (Phase 1)
  • ISystemPromptComposer (Phase 1)
  • ToolErrorRecovery (Phase 1)
  • ResponseVerifier (Phase 2)
  • ExtendedThinking (Phase 2)
  • SubtaskRunner (Phase 4)
ReActOrchestrator
Reason/decide/act/observe (MAX_ROUNDS=10)
OrchestrationSession
ULID, stateful domain model
IEventEmitter
REST (0.3) / WebSocket (Ph4)
ContextWindowManager
Compaction at 92% + truncation
ConfirmationFlow
DynamoDB persist, 30min TTL (Ph1)
SystemPromptComposer
3-layer + cache_control (Ph1)
ToolErrorRecovery
Error → observation, 3 retry limit (Ph1)
ResponseVerifier
Hallucination detection post-LLM (Ph2)
ExtendedThinking
Selective reasoning budget (Ph2)
SubtaskRunner
Parallel sub-objectives (Ph4)
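
For the "3-layer + cache_control" SystemPromptComposer, one possible shape is a list of system text blocks in the format the Anthropic Messages API accepts, with a cache breakpoint after the stable layers. What each of the three layers contains is not specified in this card, so the `base` / `marketplace` / `session` split below is an assumption, as is the function name.

```typescript
// A system text block as accepted by the Anthropic Messages API `system` array.
interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Assumed layering; the blueprint only states that there are three layers.
interface PromptLayers {
  base: string;        // layer 1: stable Coach instructions
  marketplace: string; // layer 2: per-marketplace rules
  session: string;     // layer 3: per-user, per-turn context
}

function composeSystemPrompt(layers: PromptLayers): SystemBlock[] {
  return [
    { type: "text", text: layers.base },
    // Breakpoint after layer 2: everything up to and including this block
    // forms the cacheable prefix.
    { type: "text", text: layers.marketplace, cache_control: { type: "ephemeral" } },
    // Layer 3 changes every turn, so it sits after the breakpoint, uncached.
    { type: "text", text: layers.session },
  ];
}
```

Ordering the layers from most stable to most volatile is what lets the cached prefix be reused across turns; any change inside the prefix would invalidate it.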

Tech Stack (TypeScript / AWS Lambda)

TypeScript · AWS Lambda (60s timeout) · API Gateway v2 · DynamoDB (session TTL) · Anthropic Messages API · ULID
Data Models, Interfaces & Acceptance Criteria
Data Models (TypeScript)
// OrchestrationSession — stateful domain model
interface OrchestrationSession {
  sessionId: string             // ULID
  conversationId: string        // DynamoDB conversation ID
  userId: string                // Memberstack ID
  marketplace: Marketplace

  status: 'idle' | 'running' | 'awaiting_confirmation' | 'done' | 'error'
  currentRound: number          // 0..MAX_ROUNDS

  messages: ConversationMessage[]
  // { role: 'user', content: string }
  // { role: 'assistant', content: ContentBlock[] }  // may include tool_use
  // { role: 'user', content: ToolResultBlock[] }

  pendingConfirmation: ConfirmationRequest | null
  // { toolName, toolArgs, riskLevel: 'reversible'|'irreversible',
  //   preview: { before, after }, expiresAt: Date (30min) }

  costAccumulator: { totalTokens: number, toolCallCount: number }
}

// RoundDecision — output of evaluating each LLM response
type RoundDecision =
  | 'respond'         // text only -> emit response, end loop
  | 'execute_tool'    // tool_use blocks -> execute and observe
  | 'await_confirm'   // write tool -> pause loop, persist session
  | 'max_rounds'      // round 10 -> force final response
  | 'cost_guard'      // >50K tokens -> force final response
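
A sketch of how a RoundDecision might be derived from one LLM response follows. The decision values and both limits come from the model above; the `RoundInput` shape, the `isWrite` flag, and the order in which the guards are checked are assumptions.

```typescript
const MAX_ROUNDS = 10;
const COST_GUARD_TOKENS = 50000;

type RoundDecision = "respond" | "execute_tool" | "await_confirm" | "max_rounds" | "cost_guard";

interface ToolUse {
  toolName: string;
  isWrite: boolean; // assumed flag; in practice this would come from the tool registry
}

interface RoundInput {
  currentRound: number;
  totalTokens: number; // costAccumulator.totalTokens so far
  toolUses: ToolUse[]; // empty when the LLM answered with text only
}

function decideRound(input: RoundInput): RoundDecision {
  // Text-only replies always end the loop.
  if (input.toolUses.length === 0) return "respond";
  // Guards are checked before any further tool execution (assumed precedence).
  if (input.totalTokens > COST_GUARD_TOKENS) return "cost_guard";
  if (input.currentRound >= MAX_ROUNDS) return "max_rounds";
  // Any write in the batch pauses the loop for seller confirmation.
  if (input.toolUses.some((t) => t.isWrite)) return "await_confirm";
  return "execute_tool";
}
```

Keeping this as a pure function of the session counters and the response makes each branch of the state machine testable without an LLM or DynamoDB.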
Core Interfaces (Ports)
// IReActOrchestrator — conversational entry point
interface IReActOrchestrator {
  handleMessage(params: {
    userId: string; conversationId: string;
    content: string; marketplace: Marketplace;
  }): Promise<OrchestrationResult>

  resumeAfterConfirmation(params: {
    sessionId: string; confirmed: boolean;
  }): Promise<OrchestrationResult>
}

// IEventEmitter — decouples loop from transport
type OrchestrationEvent =
  | { type: 'text_chunk'; content: string }
  | { type: 'tool_start'; toolName: string; args: unknown }
  | { type: 'tool_result'; toolName: string; result: unknown; latencyMs: number }
  | { type: 'confirmation_required'; preview: ConfirmationPreview }
  | { type: 'end_turn'; summary: OrchestrationSummary }
  | { type: 'error'; message: string }
// Phase 0.3: RestResponseEventEmitter (accumulate, return at end)
// Phase 4:   WebSocketEventEmitter (real-time streaming)

// OrchestrationSummary — included in end_turn event
interface OrchestrationSummary {
  rounds: number
  tokensUsed: number
  actionsExecuted: ActionSummary[]   // WRITE tools executed this turn
  confirmationsRequested: number
  latencyMs: number
}

// ActionSummary — one WRITE action performed by the Coach
interface ActionSummary {
  toolName: string                   // e.g. 'update_product_content'
  sku: string
  fieldChanged: string               // e.g. 'title'
  previousValue: unknown             // from snapshot_product
  newValue: unknown
  success: boolean
}
// Enables: Shell shows seller what changed, Feedback Loop (#15) measures impact

// IContextWindowManager — compaction + truncation
interface IContextWindowManager {
  guard(messages: ConversationMessage[], modelTokenLimit: number): Promise<ConversationMessage[]>
  truncateToolResult(result: unknown, maxTokens: number): unknown
}
// Triggers at 92% of model limit. Preserves last 10 messages + active round tool_results.

// IOrchestrationRepository — session persistence for ConfirmationFlow
// DynamoDB with 35min TTL. Only used during PAUSE/RESUME.

// IOrchestrationTracer — fire-and-forget observability
// Reuses ConversationTrackingOrchestrator. If PostgreSQL fails, loop continues.
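
The two adapters named for IEventEmitter differ only in when events leave the process. Below is a minimal sketch under stated assumptions: the `emit` method, the `toResponseBody` shape, and the callback-based streaming adapter are hypothetical, and the event union is trimmed to four of the six event types to keep the example short.

```typescript
// Trimmed event union; confirmation_required and end_turn are omitted here.
type OrchestrationEvent =
  | { type: "text_chunk"; content: string }
  | { type: "tool_start"; toolName: string; args: unknown }
  | { type: "tool_result"; toolName: string; result: unknown; latencyMs: number }
  | { type: "error"; message: string };

// Assumed port shape: the loop only ever calls emit().
interface IEventEmitter {
  emit(event: OrchestrationEvent): void;
}

// Phase 0.3 style: buffer events during the loop, build one REST body at the end.
class RestResponseEventEmitter implements IEventEmitter {
  private events: OrchestrationEvent[] = [];

  emit(event: OrchestrationEvent): void {
    this.events.push(event);
  }

  toResponseBody(): { text: string; toolsUsed: string[]; errors: string[] } {
    let text = "";
    const toolsUsed: string[] = [];
    const errors: string[] = [];
    for (const e of this.events) {
      switch (e.type) {
        case "text_chunk": text += e.content; break;
        case "tool_start": toolsUsed.push(e.toolName); break;
        case "error": errors.push(e.message); break;
        case "tool_result": break; // results feed the LLM, not the response body
      }
    }
    return { text, toolsUsed, errors };
  }
}

// Phase 4 style: forward each event immediately, e.g. to a WebSocket send function.
class StreamingEventEmitter implements IEventEmitter {
  private send: (json: string) => void;

  constructor(send: (json: string) => void) {
    this.send = send;
  }

  emit(event: OrchestrationEvent): void {
    this.send(JSON.stringify(event));
  }
}
```

Because the loop depends only on `emit`, moving from REST to WebSocket means swapping the adapter, with no change to the orchestration code.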
Acceptance Criteria
  • [Ph 0.3] Conversation history sent to LLM in every turn (fix: data exists in DynamoDB, just load and pass)
  • [Ph 0.3] ReAct loop active with MAX_ROUNDS=10 guard + cost guard (50K tokens)
  • [Ph 0.3] Context compaction triggers at 92% of model limit, preserves last 10 messages
  • [Ph 0.3] OrchestrationSession persisted in DynamoDB with 35min TTL
  • [Ph 0.3] REST response via RestResponseEventEmitter (accumulate + return)
  • [Ph 1] ConfirmationFlow: pause/serialize/confirm/reject/timeout works end-to-end
  • [Ph 1] Tool loop detection: same tool+args in 2 consecutive rounds is blocked
  • [Ph 1] Prompt caching with cache_control reduces input token cost 60-80%
  • [Ph 4] WebSocket streaming with text_chunk events in real-time
  • [Ph 1] Tool error recovery: errors become observations, LLM retries with corrected args. Max 3 consecutive failures on same tool → respond with available info
  • [Ph 2] ResponseVerifier: post-generation check contrasts cited data (fees, metrics, prices) against retrieved data. Discrepancy → clarification or regeneration
  • [Ph 2] Extended thinking: selective activation for complex multi-variable analysis (e.g., sales diagnosis across price, positioning, competitors, seasonality). Not active for simple queries
  • [Ph 4] SubtaskRunner: independent sub-objectives executed in parallel (e.g., "analyze my top 3 products" → 3 parallel analyses → merged response)
  • [Ph 1] end_turn event includes actionsExecuted[] enumerating every WRITE tool result from the turn (toolName, SKU, fieldChanged, previousValue, newValue, success)

MAX_ROUNDS: 10 · Cost guard: 50K tokens · Compaction: 92% · Truncation: 4K tokens · Confirmation timeout: 30min · Session TTL: 35min · Cache hit L1: >95%

How It Works: State Machine

(user message arrives via REST / WebSocket)
        |
        v
PREPARE
  +-- SystemPromptComposer.compose(marketplace, userProfile, tools, health)
  +-- ContextWindowManager.guard(messages)   <- compaction if >92%
  +-- messages.push({ role: 'user', content })
        |
        v (currentRound++)
REASON  <- ILLMClient.generate(messages, toolDefinitions)
        |
        |-- text only? ----------------------------------------> RESPOND -> DONE
        |
        |-- tool_use blocks? ----------------------------------------
        |                                                           |
        |  ToolPolicyFilter.check(toolName, userId, plan)           |
        |   +-- denied    -> append tool_result(error) -> next round|
        |   +-- read_only -> EXECUTE -> append result -> next round |
        |   +-- write     -> AWAIT_CONFIRMATION                     |
        |                        |                                  |
        |             serialize OrchestrationSession to DynamoDB    |
        |             IEventEmitter.emit(confirmation_required)     |
        |             (Lambda terminates, awaits reconnection)      |
        |                        |                                  |
        |              confirmed | rejected | timeout(30min)        |
        |                   |         |          |                  |
        |                EXECUTE  append msg    DONE                |
        |                   |         |                             |
        |            append result  next round                      |
        |                   |                                       |
        +-------------------+---------------------------------------
        |
        |-- currentRound == MAX_ROUNDS? ----------------> force RESPOND
        |-- costAccumulator.totalTokens > 50K? ---------> force RESPOND
        |
        v
RESPOND
  +-- IEventEmitter.emit({ type: 'end_turn', ... })
  +-- IOrchestrationTracer.completeExecution()
  +-- MessageRepository.save(assistantMessage)
  +-- ConversationRepository.update()
        |
        v
DONE

The ReAct Orchestrator replaces the one-shot ConversationFlowOrchestrator. Today, HandleConversationQueryUseCase -> ConversationFlowOrchestrator runs a fixed pipeline (embedding -> vector search -> brand health -> single LLM call -> save): no loop, no tools, and no history sent to the LLM. The new orchestrator is a state machine: PREPARE builds context, REASON calls the LLM, DECIDE evaluates the output (text-only, tool_use, max_rounds, cost_guard), ACT executes tools or pauses for confirmation, and OBSERVE appends results and loops. Each dependency is a port: ILLMClient (exists), IToolExecutor (Phase 1), ISystemPromptComposer (Phase 1), IContextWindowManager (new), IOrchestrationRepository (new), IEventEmitter (new), ICreditsGate (Phase 1), IOrchestrationTracer (reused). The orchestrator does not know which LLM model it uses, how events reach the client, or how credits are calculated.
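The loop and its two guards can be sketched as follows. This is a minimal illustration, not the actual ReActOrchestrator: every type and function name is hypothetical, the LLM call is synchronous for brevity, and policy checks and confirmation are left out. It also assumes that round 10 itself is the forced text-only round, which the diagram leaves open.

```typescript
// Illustrative sketch: every name below is hypothetical, and the LLM call is
// synchronous here only to keep the example short.
type Message = { role: 'user' | 'assistant' | 'tool'; content: string };
type ToolCall = { name: string; args: Record<string, unknown> };
type LlmReply = { text: string; toolCalls: ToolCall[]; tokensUsed: number };

interface LlmClient {
  // allowTools = false models the forced RESPOND call made without toolDefinitions.
  generate(messages: Message[], allowTools: boolean): LlmReply;
}

const MAX_ROUNDS = 10;
const COST_GUARD_TOKENS = 50000;

function runTurn(
  llm: LlmClient,
  runTool: (call: ToolCall) => string,
  history: Message[],
  userText: string,
): { text: string; rounds: number; totalTokens: number; forced: boolean } {
  // PREPARE: history plus the new user message.
  const messages: Message[] = [...history, { role: 'user', content: userText }];
  let totalTokens = 0;

  for (let round = 1; ; round++) {
    // Guards: the last allowed round, or an exhausted budget, forces a text-only answer.
    const forced = round >= MAX_ROUNDS || totalTokens > COST_GUARD_TOKENS;
    const reply = llm.generate(messages, !forced); // REASON
    totalTokens += reply.tokensUsed;
    messages.push({ role: 'assistant', content: reply.text });

    // DECIDE: text-only output (or a forced round) ends the turn.
    if (forced || reply.toolCalls.length === 0) {
      return { text: reply.text, rounds: round, totalTokens, forced };
    }
    // ACT + OBSERVE: each tool result becomes a message for the next round.
    for (const call of reply.toolCalls) {
      messages.push({ role: 'tool', content: runTool(call) });
    }
  }
}
```

Because the guards are checked before each REASON call, a model that keeps requesting tools still ends the turn with a text answer instead of an error.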

Scope Boundaries: What This Layer Does NOT Own

ToolRegistry / ToolDispatcher

Owned by Tool Registry (#3). Orchestrator calls IToolExecutor.execute() only.

Credit Verification

Owned by Billing & Credit Economy (#13). Orchestrator calls ICreditsGate.canProceed().

Traceability

Owned by Observability (#8). Orchestrator notifies IOrchestrationTracer (fire-and-forget).

WebSocket / SSE

Transport adapters via IEventEmitter. Changing transport doesn't touch the loop.

Context Aggregator

Owned by #5. Orchestrator calls IContextAssembler for user prompt context (KB + Brand Health RAG).

Prompt Content

Layer 1 content (personality, tone) is config, not orchestrator logic.

Implementation Plan (Phased)

Phase 0.3: ReAct Loop WITHOUT Tools (Immediate Priority)

ReActOrchestrator replaces ConversationFlowOrchestrator as the main orchestrator. OrchestrationSession is the domain model, identified by ULID. Conversation history is loaded from DynamoDB and sent to the LLM on every turn (critical fix: the data already exists, it only needs to be loaded and passed). IContextWindowManager compacts at 92% of the model limit and preserves the last 10 messages plus the active round's tool_results. IOrchestrationRepository ships with a DynamoDB adapter (35min TTL). RestResponseEventEmitter accumulates events and returns the complete response at the end via REST. MAX_ROUNDS guard + cost guard (50K tokens). Even without tools, the LLM can reason in multiple rounds over the available context.
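The compaction rule can be illustrated with a small sketch. It assumes a rough 4-characters-per-token estimate and a caller-supplied summarizer, both of which are placeholders; the names are hypothetical, and preservation of the active round's tool_results is omitted for brevity.

```typescript
// Illustrative sketch of the 92% compaction guard. All names are hypothetical.
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

const COMPACTION_THRESHOLD = 0.92;
const PRESERVED_MESSAGES = 10;

// Assumption: about 4 characters per token. A real tokenizer would replace this.
function estimateTokens(messages: ChatMessage[]): number {
  return messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
}

function guard(
  messages: ChatMessage[],
  modelLimitTokens: number,
  summarize: (older: ChatMessage[]) => string,
): ChatMessage[] {
  // Below the threshold: pass the history through untouched.
  if (estimateTokens(messages) <= modelLimitTokens * COMPACTION_THRESHOLD) return messages;
  // Nothing older than the preserved tail: there is nothing to compact.
  if (messages.length <= PRESERVED_MESSAGES) return messages;

  const older = messages.slice(0, -PRESERVED_MESSAGES);
  const recent = messages.slice(-PRESERVED_MESSAGES);
  // Older messages collapse into one summary; the last 10 survive verbatim.
  return [{ role: 'system', content: summarize(older) }, ...recent];
}
```

Running the guard before every LLM call keeps the decision about what to drop in the orchestrator's hands rather than leaving it to silent truncation by the model.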

Phase 1: ConfirmationFlow + Tool Integration

ConfirmationFlow state machine: PENDING -> CONFIRMED / REJECTED / EXPIRED. The session is serialized to DynamoDB before the pause and restored on confirm. ISystemPromptComposer has 3 layers: L1 BaseCoachBlock (~500 tok, >95% cache hit), L2 SessionBlock (~150-300 tok, ~70% cache hit), L3 ExecutionBlock (~100-150 tok, ~90% cache hit). Integration with IToolExecutor from ToolRegistry (#3). Prompt caching with Anthropic cache_control headers. ICreditsGate: verify quota before each tool call. Tool loop detection: same tool+args in 2 consecutive rounds -> block and notify. ToolErrorRecovery: when a tool returns an error, it becomes an observation in the ReAct loop. The LLM sees the error, reasons about what went wrong, and can retry with corrected arguments or call a different tool. Max 3 consecutive failures on the same tool → Coach responds with available information. InputGuard integration: IGuardService.validateInput() runs before the loop; IGuardService.validateOutput() runs after (#7).
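The ToolErrorRecovery rule (errors become observations, and 3 consecutive failures on the same tool end the retries) might look like the sketch below. The class and type names are hypothetical, and "consecutive" is interpreted here as uninterrupted failures of the same tool, which is an assumption.

```typescript
// Illustrative sketch of ToolErrorRecovery bookkeeping. All names are hypothetical.
const MAX_CONSECUTIVE_FAILURES = 3;

type ToolOutcome = { ok: true; content: string } | { ok: false; error: string };
type Observation = { toolName: string; isError: boolean; content: string };
type RecoveryDecision = { observation: Observation; giveUp: boolean };

class ToolFailureTracker {
  private lastFailedTool: string | null = null;
  private consecutiveFailures = 0;

  record(toolName: string, outcome: ToolOutcome): RecoveryDecision {
    if (outcome.ok) {
      // Any success resets the streak.
      this.lastFailedTool = null;
      this.consecutiveFailures = 0;
      return { observation: { toolName, isError: false, content: outcome.content }, giveUp: false };
    }
    // A failure on a different tool starts a new streak at 1.
    this.consecutiveFailures = this.lastFailedTool === toolName ? this.consecutiveFailures + 1 : 1;
    this.lastFailedTool = toolName;
    return {
      // The error is handed back to the LLM as an observation, not thrown.
      observation: { toolName, isError: true, content: `Tool failed: ${outcome.error}` },
      giveUp: this.consecutiveFailures >= MAX_CONSECUTIVE_FAILURES,
    };
  }
}
```

When giveUp is true, the orchestrator would stop offering that tool and ask the LLM to respond with the information it already has.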

Phase 2: Verification + Extended Thinking

ResponseVerifier: a post-generation step that contrasts cited data (fees, metrics, prices) against the data retrieved from tools and the KB. If a discrepancy is detected → it adds a clarification or regenerates with an instruction to use the correct data. The user only sees the final verified response. Extended thinking: for complex multi-variable analysis, the orchestrator assigns an extended reasoning budget to the LLM. Activation is selective: simple queries ("how much stock?") don't trigger it, while complex diagnostics (sales analysis across price, positioning, competitors, seasonality) do. The thinking process is not visible to the user.
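One possible shape for the comparison step is sketched below. It assumes the cited figures have already been extracted from the response into field/value pairs (the extraction itself is not shown), and it uses a hypothetical 0.5% relative tolerance; neither detail comes from the blueprint.

```typescript
// Illustrative sketch of the ResponseVerifier comparison. All names are hypothetical.
type CitedFact = { field: string; value: number };
type Discrepancy = { field: string; cited: number; retrieved: number };

function findDiscrepancies(
  cited: CitedFact[],
  retrieved: Map<string, number>,
  relativeTolerance = 0.005, // assumption: 0.5% slack for rounding
): Discrepancy[] {
  const discrepancies: Discrepancy[] = [];
  for (const fact of cited) {
    const actual = retrieved.get(fact.field);
    // Facts with no retrieved counterpart cannot be contrasted, so they are skipped.
    if (actual === undefined) continue;
    const scale = Math.max(Math.abs(actual), 1e-9);
    if (Math.abs(fact.value - actual) / scale > relativeTolerance) {
      discrepancies.push({ field: fact.field, cited: fact.value, retrieved: actual });
    }
  }
  return discrepancies;
}
```

An empty result lets the response through; a non-empty one would drive the clarification or regeneration path.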

Phase 4: Streaming + Proactive Mode + SubtaskRunner

WebSocketEventEmitter replaces RestResponseEventEmitter. Real-time streaming: text_chunk events are emitted while the LLM generates. The session is restored on WebSocket reconnection. ProactiveMode: a separate entry point via EventBridge Scheduler, with different context construction (no conversation history) and output via push notification. It shares the ReAct loop and ILLMClient but has its own context builder and completion condition. SubtaskRunner: when a user query has multiple independent parts ("analyze my top 3 products"), the orchestrator decomposes it into parallel sub-tasks. Each sub-task has its own reasoning loop and tools, and the results are merged into a unified response. It only activates when the sub-objectives are truly independent (the analysis of product A doesn't affect the analysis of product B). The user gets the same consolidated response, only faster.
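The fan-out and merge step of SubtaskRunner can be sketched with Promise.all. The names below are hypothetical, and wrapping each sub-task so that one failure does not reject the whole batch is a design assumption, not something the blueprint specifies.

```typescript
// Illustrative sketch of parallel sub-task execution. All names are hypothetical.
type Subtask<T> = { id: string; run: () => Promise<T> };
type SubtaskResult<T> =
  | { id: string; ok: true; value: T }
  | { id: string; ok: false; error: string };

async function runSubtasks<T>(subtasks: Subtask<T>[]): Promise<SubtaskResult<T>[]> {
  // Promise.all starts every sub-task at once and preserves input order in the output.
  return Promise.all(
    subtasks.map(async ({ id, run }): Promise<SubtaskResult<T>> => {
      try {
        return { id, ok: true, value: await run() };
      } catch (e) {
        // A failed sub-task is reported in place instead of failing the batch.
        return { id, ok: false, error: e instanceof Error ? e.message : String(e) };
      }
    }),
  );
}
```

The merged response can then be composed from the successful entries, with failed ones mentioned explicitly rather than dropped.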

Risk Analysis

LLM Infinite Loop

Impact: High

Mitigation: MAX_ROUNDS=10 + a system message forcing a text-only response without toolDefinitions. Cost guard at 50K tokens. Tool loop detection: same tool+args in consecutive rounds -> block with a "no new information available" message. D4: if tasks regularly need 10 rounds, the problem is tool design, not the limit.
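Tool loop detection needs a stable way to decide that two calls are "the same tool+args". A minimal sketch, with hypothetical names, that canonicalizes arguments by sorting object keys and compares each round against the previous one:

```typescript
// Illustrative sketch of tool loop detection. All names are hypothetical.
type ToolInvocation = { name: string; args: Record<string, unknown> };

// Serializes a value with object keys sorted, so {a:1,b:2} and {b:2,a:1} match.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(',')}]`;
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(obj[key])}`);
    return `{${parts.join(',')}}`;
  }
  return String(JSON.stringify(value));
}

class ToolLoopDetector {
  private previousRound = new Set<string>();

  // Returns the calls that exactly repeat the previous round, then remembers this round.
  check(round: ToolInvocation[]): ToolInvocation[] {
    const keyed = round.map((call) => ({ call, key: `${call.name}:${canonicalize(call.args)}` }));
    const repeated = keyed.filter((k) => this.previousRound.has(k.key)).map((k) => k.call);
    this.previousRound = new Set(keyed.map((k) => k.key));
    return repeated;
  }
}
```

Calls returned by check would be answered with the "no new information available" message instead of being executed again.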

ConfirmationFlow State Corruption

Impact: High

Mitigation: OrchestrationSession is fully serialized to DynamoDB before pausing. The 35min TTL auto-expires stale sessions. On resume, consistency is verified before the tool executes. Resilient to hot deploys during the confirmation wait.
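The ConfirmationFlow transitions can be written as a pure function over the persisted state, which is what makes a resume after a Lambda restart safe. The sketch below uses hypothetical names and assumes the 30min confirmation timeout shown in the state machine diagram, with expiry taking precedence over a late confirm.

```typescript
// Illustrative sketch of ConfirmationFlow transitions. All names are hypothetical.
type ConfirmationStatus = 'PENDING' | 'CONFIRMED' | 'REJECTED' | 'EXPIRED';
type UserDecision = 'confirm' | 'reject' | 'none';

const CONFIRMATION_TIMEOUT_MS = 30 * 60 * 1000;

interface StoredConfirmation {
  status: ConfirmationStatus;
  requestedAtMs: number; // epoch milliseconds, written when the session was serialized
}

function nextStatus(stored: StoredConfirmation, decision: UserDecision, nowMs: number): ConfirmationStatus {
  // Terminal states never change, which makes duplicate deliveries harmless.
  if (stored.status !== 'PENDING') return stored.status;
  // Assumption: a decision arriving after the timeout is treated as expired.
  if (nowMs - stored.requestedAtMs >= CONFIRMATION_TIMEOUT_MS) return 'EXPIRED';
  if (decision === 'confirm') return 'CONFIRMED';
  if (decision === 'reject') return 'REJECTED';
  return 'PENDING';
}
```

Only a transition to CONFIRMED would lead to EXECUTE; REJECTED appends a message and continues the loop, and EXPIRED ends the turn.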

Context Window Overflow

Impact: Medium

Mitigation: ContextWindowManager.guard() triggers compaction at 92% before the LLM call. Tool results over 4000 tokens are truncated to a summary. Prompt caching on L1-L2 reduces the cost of the static layers. Silent LLM truncation is the real danger, and the guard prevents it.

Lambda Timeout in Long Loops

Impact: Medium

Mitigation: Lambda is configured at 60s. Each tool has an internal 10s timeout. A 5-round loop with marketplace reads can exceed 60s. In Phase 4, WebSocket eliminates the Lambda timeout as a UX limiter. Orphan sessions (status='running', startedAt more than 65s ago) are detectable, and the client retries using the saved history.
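The per-tool 10s timeout can be enforced with Promise.race. This is a sketch with hypothetical names; note that it stops waiting for the handler but does not cancel the underlying request, which would need something like an AbortSignal passed to the marketplace client.

```typescript
// Illustrative sketch of a per-tool timeout. All names are hypothetical.
const TOOL_TIMEOUT_MS = 10000;

function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; the timer is cleared so it cannot keep the process alive.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}

// Usage: wrap every handler call with the shared limit.
function runToolWithLimit<T>(toolName: string, handler: () => Promise<T>): Promise<T> {
  return withTimeout(handler(), TOOL_TIMEOUT_MS, toolName);
}
```

A timeout surfaces as an ordinary tool error, so it can flow into the same error-as-observation path as any other failure.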

Key Decisions

D1.

Stateful orchestrator, not a procedural pipeline: The current pipeline is a function with fixed input, fixed steps, and fixed output. The ReAct orchestrator is a state machine in which the LLM decides what to do next based on history and available tools. This cannot be patched on top of ConversationFlowOrchestrator; it requires redesigning the orchestration entry point.

D2.

IEventEmitter separates orchestration from transport: The orchestrator emits events without knowing how they reach the client. Phase 0.3: RestResponseEventEmitter accumulates events and responds at the end. Phase 4: WebSocketEventEmitter streams in real time. Changing transport doesn't touch the loop.
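D2 can be illustrated with one port and two adapters. The event names (text_chunk, confirmation_required, end_turn) come from this document; the payload shapes, class names, and the send callback are assumptions made for the sketch.

```typescript
// Illustrative sketch of the event port and two transports. Payload shapes are assumptions.
type OrchestratorEvent =
  | { type: 'text_chunk'; text: string }
  | { type: 'confirmation_required'; toolName: string }
  | { type: 'end_turn'; actionsExecuted: string[] };

interface EventEmitterPort {
  emit(event: OrchestratorEvent): void;
}

// Phase 0.3 style: accumulate everything and return one response at the end.
class AccumulatingEmitter implements EventEmitterPort {
  private readonly events: OrchestratorEvent[] = [];

  emit(event: OrchestratorEvent): void {
    this.events.push(event);
  }

  toResponse(): { text: string; events: OrchestratorEvent[] } {
    const text = this.events.map((e) => (e.type === 'text_chunk' ? e.text : '')).join('');
    return { text, events: [...this.events] };
  }
}

// Phase 4 style: forward each event immediately through a transport callback.
class StreamingEmitter implements EventEmitterPort {
  private readonly send: (frame: string) => void;

  constructor(send: (frame: string) => void) {
    this.send = send;
  }

  emit(event: OrchestratorEvent): void {
    this.send(JSON.stringify(event));
  }
}

// The loop only sees the port, so the same code drives either transport.
function respond(emitter: EventEmitterPort, chunks: string[]): void {
  for (const text of chunks) emitter.emit({ type: 'text_chunk', text });
  emitter.emit({ type: 'end_turn', actionsExecuted: [] });
}
```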

D3.

ConfirmationFlow persists in DynamoDB, not in memory: A user confirmation can take minutes, and Lambda cannot stay alive waiting. The session is serialized before the pause and restored on confirm. This also makes the flow resilient to hot deploys during the wait.

D4.

MAX_ROUNDS=10 is a fail-safe, not a design limit: The system prompt and tools should be designed so the Coach resolves tasks in 3-5 rounds at most. If a task regularly needs 10 rounds, the problem is tool design, not the limit.

D5.

RagOrchestrator survives as KB context provider: The existing RAG pipeline doesn't disappear. It becomes a service invoked by the orchestrator to enrich context before the first LLM call. BrandHealthContextService remains as the seller health provider. KB and brand health go from fixed pipeline steps to inputs for context composition.

Dependency Map (Ports)

ReActOrchestrator
+-- ILLMClient                  <- LLMClientFactory (exists)
+-- IToolExecutor               <- ToolRegistry (Phase 1 — #3)
+-- ISystemPromptComposer       <- new (Phase 1 — requires UserProfile)
+-- IContextWindowManager       <- new (Phase 0.3)
+-- IOrchestrationRepository    <- new + DynamoDB adapter (Phase 0.3)
+-- IEventEmitter               <- RestResponseEventEmitter (Phase 0.3)
|                                  WebSocketEventEmitter (Phase 4)
+-- ICreditsGate                <- new interface over billing (Phase 1 — #13)
+-- IOrchestrationTracer        <- ConversationTrackingOrchestrator (exists)
+-- IContextAssembler           <- RagOrchestrator (Phase 0.2 — #5)
+-- IGuardService               <- GuardService (Phase 1 — #7)

MVP Scope

Phase 0.3: ReAct loop with conversation history, context compaction, MAX_ROUNDS + cost guard, REST response (no streaming). Phase 1: ConfirmationFlow + tools + prompt caching. Proactive mode deferred to Phase 4.

Inspired by

ConversationFlowOrchestrator (existing pipeline). HandleConversationQueryUseCase (entry point). Claude Code port-based architecture pattern.

Source: ConversationFlowOrchestrator + HandleConversationQueryUseCase | Depends on: #3 (IToolExecutor), #4 (L1 prompt), #13 (ICreditsGate), #5 (RAG context), #8 (IOrchestrationTracer), #7 (IGuardService)
📝 Project Changelog
v4 Mar 2, 2026
~Renamed “Autonomous Agent Orchestrator” → “ReAct Orchestrator”
~Stack migrated Python/asyncio → TypeScript/AWS Lambda
~SystemPromptComposer 4-layer → 3-layer (L1 Base, L2 Session, L3 Execution)
+Port-based architecture with 7 interfaces (IToolExecutor, ILLMClient, IContextProvider, etc.)
+Current State section: Reusable / Needs Improvement / Missing
+4 new concepts: ToolErrorRecovery (3 retries), ResponseVerifier, ExtendedThinking, SubtaskRunner
+OrchestrationSummary + ActionSummary: actionsExecuted[] per turn
~Data models: Python dataclasses → TypeScript interfaces
+Dependency on #7 (IGuardService) added
v3 Feb 27-28, 2026
+Full deep spec card as “Autonomous Agent Orchestrator”: Python/asyncio, 4-layer SystemPromptComposer
+ReAct loop, confirmation flow, tool orchestration documented
v2 Feb 27, 2026
~Confirmation timeout 5min → 30min (configurable), reminder at 5min
~MAX_ROUNDS 15 → 10
Credit verification removed from main loop (tracking only)
v1 Feb 26, 2026
+Initial: “Autonomous Agent Orchestrator”, ADAPT status
#3

Tool Registry & Policy Engine

Tools — Mateo

NEW

The agent's hands. 36 primitive tools that the autonomous agent composes into open-ended workflows at runtime via the ReAct loop. Each tool is a ToolDefinition (name, inputSchema, riskLevel, marketplace, category, phase, estimatedTokens) registered in the ToolRegistry at Lambda startup. The orchestrator only sees IToolExecutor; it never touches the registry, policies, or handlers directly. ToolPolicyFilter gates access by marketplace and risk level. HookLifecycle (before_tool → execute → after_tool) captures observability, triggers proactive suggestions, and logs WRITE actions for impact measurement (#15). SubtaskRunner executes multiple read_only tools in parallel via Promise.all. SessionResultCache prevents redundant API calls within a session: if the Coach calls the same tool with the same arguments twice in the same loop, the second call returns the cached result instead of hitting the external API. This only applies to READ and ANALYSIS tools. WRITE tools are never cached because their effects may change system state. There is no separate "Skills" layer; the LLM composes tools directly.
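The SessionResultCache rule can be sketched as a per-session map keyed by tool name and arguments. All names below are hypothetical. Caching the in-flight promise (so parallel duplicates also share one call) and evicting failed calls are assumptions, and only top-level argument keys are order-normalized here.

```typescript
// Illustrative sketch of a per-session result cache. All names are hypothetical.
type ToolCategory = 'READ' | 'ANALYSIS' | 'SYSTEM' | 'WRITE';
const CACHEABLE: ToolCategory[] = ['READ', 'ANALYSIS'];

class SessionResultCacheSketch {
  private readonly entries = new Map<string, Promise<unknown>>();

  // Sorts top-level keys so argument order does not produce distinct cache keys.
  private keyFor(toolName: string, args: Record<string, unknown>): string {
    const sorted = Object.keys(args).sort().map((key) => [key, args[key]]);
    return `${toolName}:${JSON.stringify(sorted)}`;
  }

  getOrRun<T>(
    category: ToolCategory,
    toolName: string,
    args: Record<string, unknown>,
    run: () => Promise<T>,
  ): Promise<T> {
    // WRITE and SYSTEM tools always execute: their effects may change state.
    if (CACHEABLE.indexOf(category) === -1) return run();

    const key = this.keyFor(toolName, args);
    const cached = this.entries.get(key);
    if (cached !== undefined) return cached as Promise<T>;

    const pending = run();
    this.entries.set(key, pending);
    // Failed calls are evicted so a retry can reach the external API again.
    pending.catch(() => this.entries.delete(key));
    return pending;
  }
}
```

One cache instance would live for the duration of a single session, so results never leak across users or conversations.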

Beautonomous governance: ToolPolicyFilter implements Core's risk taxonomy and permission matrix. WRITE tools require role confirmation via ConfirmationFlow before execution. HookLifecycle is the enforcement point where Core's permission gates are invoked on every tool call.

Current State

Reusable

ILLMClient + LLMClientFactory (extensible for tool_use), AnthropicClient (operational, needs tool_use support), ConversationTrackingOrchestrator (reusable as IOrchestrationTracer for ObservabilityHook), BrandHealthContextService (automatic context, unaffected by tools layer)

Needs Improvement

ILLMClient.generateAnswer(query, chunks) has a RAG-specific signature that is incompatible with tool_use. It needs a new generate(messages, tools) → ContentBlock[] method. AnthropicClient.chat() doesn't send tools[] or parse tool_use blocks.

To Build

ToolDefinition domain types, IToolExecutor, IToolRegistry + ToolRegistry, IToolPolicyFilter + ToolPolicyFilter, IToolHook + ObservabilityHook, ToolExecutor with HookLifecycle + SubtaskRunner, SessionResultCache (per-session dedup for READ/ANALYSIS), all 36 tool handlers across 5 phases.

READ (8)
Tool | Description | Risk
search_market_products | Search similar products in the marketplace. The fields param controls result tokens | read_only
get_competitor_product | Full listing of a competitor product by ID or URL | read_only
get_market_pricing | Market price range for a product or category | read_only
get_keyword_data | Search volume and competition for keywords | read_only
analyze_product_image | Evaluates a product image against marketplace standards | read_only
enhance_product_image | Enhances an existing product photo; does not generate from scratch | read_only
analyze_product_video | Technical checklist: duration, quality, guideline compliance | read_only
get_product_fee_estimate | Fee and cost estimation for a given product and price | read_only
WRITE (13)
update_product_content, publish_product, update_product_images, update_product_video, update_price, update_stock, pause_product, activate_product, close_product, answer_question, hide_question, send_buyer_message, request_review
SYSTEM (1) + ADS (4)
update_user_profile | create_campaign, update_campaign, pause_campaign, activate_campaign
IToolExecutor
Single port for orchestrator
ToolPolicyFilter
Marketplace + risk gates
HookLifecycle
before → execute → after
SubtaskRunner
Promise.all for read_only
SessionResultCache
Same tool+args → cached (READ/ANALYSIS only)
FeedbackCaptureHook
WRITE only: snapshot before + log after → #15

Scope Boundaries: What this layer does NOT do

Marketplace HTTP clients: Handlers call marketplace adapters (#12), but the API clients are infrastructure, not part of ToolExecutor.
Credit verification/debit: Orchestrator (#2) calls ICreditsGate. ToolPolicyFilter only knows if the plan has access, not the credit balance.
ConfirmationFlow: When requiresConfirmation: true, the orchestrator (#2) handles pause/resume. ToolExecutor returns PolicyDecision only.
Automatic contexts ≠ tools: KB and Brand Health are user prompt inputs via RAG (Context Aggregator #5). UserProfile and critical alerts are system prompt L2 inputs (Personality Engine #4). None of these are tools the LLM invokes.

Tech Stack (TypeScript, AWS Lambda)

TypeScript 5+ · AWS Lambda · JSON Schema · Anthropic tool_use · DynamoDB · Promise.all (parallel)
Data Models, API Signatures & Acceptance Criteria
Data Models
interface ToolDefinition {
  name: string              // Name the LLM uses to invoke
  description: string       // description_for_llm — optimized for correct selection
  inputSchema: JSONSchema    // Parameters (JSON Schema format)
  riskLevel: ToolRiskLevel  // 'read_only' | 'reversible' | 'irreversible'
  marketplace: Marketplace[] // Compatible marketplaces. [] = all
  category: ToolCategory    // 'READ' | 'ANALYSIS' | 'SYSTEM' | 'WRITE'
  phase: Phase              // Implementation phase (1-5)
  estimatedTokens?: number  // Result cost estimate (for cost guard)
}

interface ToolContext {
  userId: string
  marketplace: Marketplace
  plan: 'free' | 'pro'
  conversationId: string
  executionId?: string      // From IOrchestrationTracer
}

interface ToolResult {
  toolName: string
  content: unknown          // Raw result
  truncated: boolean        // true if truncated by ContextWindowManager
  latencyMs: number
  tokensEstimate: number
}

type PolicyDecision =
  | { allowed: true;  requiresConfirmation: false }
  | { allowed: true;  requiresConfirmation: true; preview: ConfirmationPreview }
  | { allowed: false; reason: string }
Interfaces (Ports)
// The ONLY port the orchestrator sees
interface IToolExecutor {
  getToolDefinitions(context: ToolContext): ToolDefinition[]
  execute(toolName: string, toolArgs: Record<string, unknown>,
          context: ToolContext): Promise<ToolResult>
}

interface IToolRegistry {
  register(def: ToolDefinition, handler: IToolHandler): void
  registerRemote(def: ToolDefinition, dispatcher: IRemoteDispatcher): void
  getDefinitions(context: ToolContext): ToolDefinition[]
  getHandler(toolName: string): IToolHandler | IRemoteDispatcher
}

interface IToolPolicyFilter {
  check(toolName: string, context: ToolContext): PolicyDecision
  // Gates: 1. marketplace gate  2. risk gate
}

interface IToolHook {
  beforeTool?(toolName: string, args: unknown, ctx: ToolContext): Promise<void>
  afterTool?(toolName: string, result: ToolResult, ctx: ToolContext): Promise<void>
}
// Hooks: ObservabilityHook (after), ProactiveSuggestionHook (after),
//        FeedbackCaptureHook (before+after, WRITE only → #15)
// Hooks never block execution — fire-and-forget
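
The two gates named in IToolPolicyFilter could look roughly like the sketch below. It is a minimal illustration, not the project's implementation: the Marketplace values, the ConfirmationPreview shape, the in-memory lookup map, and the class name SketchToolPolicyFilter are all assumptions, and plan gating is left out.

```typescript
// Illustrative sketch only. Marketplace values, ConfirmationPreview shape and
// the in-memory lookup map are assumptions, not part of the blueprint.
type Marketplace = 'meli' | 'amazon'
type ToolRiskLevel = 'read_only' | 'reversible' | 'irreversible'

interface ConfirmationPreview { toolName: string; riskLevel: ToolRiskLevel }

type PolicyDecision =
  | { allowed: true; requiresConfirmation: false }
  | { allowed: true; requiresConfirmation: true; preview: ConfirmationPreview }
  | { allowed: false; reason: string }

interface PolicyToolInfo { name: string; riskLevel: ToolRiskLevel; marketplace: Marketplace[] }
interface PolicyContext { marketplace: Marketplace }

class SketchToolPolicyFilter {
  private readonly tools: Map<string, PolicyToolInfo>

  constructor(tools: Map<string, PolicyToolInfo>) {
    this.tools = tools
  }

  check(toolName: string, context: PolicyContext): PolicyDecision {
    const def = this.tools.get(toolName)
    if (!def) return { allowed: false, reason: `unknown tool: ${toolName}` }

    // Gate 1: marketplace. An empty list means "compatible with all".
    if (def.marketplace.length > 0 && !def.marketplace.includes(context.marketplace)) {
      return { allowed: false, reason: `${toolName} is not available on ${context.marketplace}` }
    }

    // Gate 2: risk. Anything other than read_only must be confirmed first.
    if (def.riskLevel !== 'read_only') {
      return {
        allowed: true,
        requiresConfirmation: true,
        preview: { toolName, riskLevel: def.riskLevel },
      }
    }
    return { allowed: true, requiresConfirmation: false }
  }
}
```

Because the filter only returns a decision and never calls a handler, the "no bypass path" criterion depends on the executor refusing to dispatch while requiresConfirmation is true.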
Acceptance Criteria
  • ToolRegistry loads all tool definitions at Lambda startup with inputSchema validation
  • IToolExecutor.getToolDefinitions() returns only tools available for user's marketplace and plan
  • LLM selects correct tool >90% of the time via description_for_llm
  • ToolPolicyFilter blocks all WRITE tools without confirmation — no bypass path
  • Multiple read_only tools in same round execute via Promise.all (parallel)
  • ObservabilityHook fires after every tool execution (fire-and-forget)
  • Tool handler timeout of 10s prevents blocking the ReAct loop
  • [Ph 1] SessionResultCache: same tool+args within a session returns cached result for READ/ANALYSIS tools. WRITE tools never cached. Cache scoped to session lifetime
  • [Ph 2] FeedbackCaptureHook: beforeTool captures product metrics snapshot via Data Sync (#10) for WRITE tools; afterTool logs executed action to Feedback Loop (#15). Fire-and-forget — never blocks execution

36 tools · 4 categories (READ/ANALYSIS/SYSTEM/WRITE) · 5 phases · TypeScript · Internal use only (no REST)
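
The SessionResultCache criterion (same tool+args returns the cached result, WRITE never cached) can be sketched as below. This is only an illustration under stated assumptions: the key format, the stableStringify helper, and the one-instance-per-session lifetime are hypothetical choices, not details from the spec.

```typescript
// Illustrative sketch. Key format and helper names are assumptions.
type ToolCategory = 'READ' | 'ANALYSIS' | 'SYSTEM' | 'WRITE'

// Serializes args with sorted object keys so {a:1,b:2} and {b:2,a:1} share a key.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(',')}]`
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`)
    return `{${body.join(',')}}`
  }
  return String(JSON.stringify(value))
}

// One instance per session: dropping the instance drops the cache.
class SessionResultCache<T> {
  private readonly entries = new Map<string, T>()

  private static cacheable(category: ToolCategory): boolean {
    return category === 'READ' || category === 'ANALYSIS'
  }

  get(category: ToolCategory, toolName: string, args: unknown): T | undefined {
    if (!SessionResultCache.cacheable(category)) return undefined
    return this.entries.get(`${toolName}:${stableStringify(args)}`)
  }

  set(category: ToolCategory, toolName: string, args: unknown, result: T): void {
    if (!SessionResultCache.cacheable(category)) return // WRITE and SYSTEM are never stored
    this.entries.set(`${toolName}:${stableStringify(args)}`, result)
  }
}
```

Sorting keys before building the cache key matters because the LLM may emit the same arguments in a different order across rounds.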

How It Works

ReActOrchestrator receives tool_use block from LLM
        |
        v
IToolExecutor.execute(toolName, toolArgs, context)
        |
        v
+-------------------------------------------------------+
|  ToolPolicyFilter.check(toolName, context)            |
|  ├── marketplace gate: compatible?                    |
|  └── risk gate: read_only or requires confirmation?   |
+-------------------------------------------------------+
        |
        |── denied ──────────────> error to orchestrator
        |
        |── requiresConfirmation ─> PolicyDecision to orchestrator
        |                           (ConfirmationFlow takes control)
        |
        |── allowed, read_only ─────────────────────────
        |                                               |
        v                                               |
+-----------------------------+                         |
|  IToolHook.beforeTool()     |                         |
+-----------------------------+                         |
        |                                               |
        v                                               |
+-----------------------------+                         |
|  IToolHandler.execute()     |  multiple read_only     |
|  (tool handler)             |  → Promise.all          |
+-----------------------------+                         |
        |                                               |
        v                                               |
+-----------------------------+                         |
|  IToolHook.afterTool()      |  ← ObservabilityHook   |
|                             |  ← ProactiveSuggHook   |
+-----------------------------+                         |
        |                                               |
        └───────────────────────────────────────────────
        |
        v
ToolResult → append as tool_result → next round
            

The orchestrator calls IToolExecutor.execute(); it never touches ToolRegistry, ToolPolicyFilter, or handlers directly. ToolPolicyFilter checks marketplace compatibility and risk level. Denied tools return an error. Tools requiring confirmation return a PolicyDecision, and the ConfirmationFlow (orchestrator #2) takes control. Allowed read_only tools pass through the HookLifecycle: beforeTool → handler execution → afterTool. When the LLM requests multiple read_only tools in the same round, SubtaskRunner executes them in parallel via Promise.all. Write tools are always sequential with mandatory confirmation. ObservabilityHook logs every execution to IOrchestrationTracer (fire-and-forget). ProactiveSuggestionHook evaluates Next Best Action after tool results.
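
The round-level behaviour described above (parallel read_only calls, sequential writes, non-blocking afterTool hooks) might be sketched as follows. ToolCall, runRound, and fireAndForget are hypothetical names, and the sketch assumes any non-read_only call reaching it has already been confirmed upstream.

```typescript
// Illustrative sketch. ToolCall, runRound and fireAndForget are hypothetical names.
type ToolRiskLevel = 'read_only' | 'reversible' | 'irreversible'
interface ToolCall { toolName: string; riskLevel: ToolRiskLevel; args: Record<string, unknown> }
type Handler = (call: ToolCall) => Promise<unknown>
type AfterHook = (call: ToolCall, result: unknown) => Promise<void>

// Hooks never block execution: start them, swallow failures, do not await.
function fireAndForget(hooks: AfterHook[], call: ToolCall, result: unknown): void {
  for (const hook of hooks) {
    void Promise.resolve()
      .then(() => hook(call, result))
      .catch(() => { /* a failing hook must not break the loop */ })
  }
}

async function runRound(
  calls: ToolCall[],
  handler: Handler,
  hooks: AfterHook[] = [],
): Promise<unknown[]> {
  const results: unknown[] = new Array(calls.length)

  const run = async (call: ToolCall, index: number): Promise<void> => {
    const result = await handler(call)
    fireAndForget(hooks, call, result)
    results[index] = result // keep results aligned with the request order
  }

  // read_only calls in the same round start together.
  await Promise.all(
    calls.map((call, i) => (call.riskLevel === 'read_only' ? run(call, i) : Promise.resolve())),
  )

  // Everything else runs one at a time, in the order the LLM requested.
  for (let i = 0; i < calls.length; i++) {
    if (calls[i].riskLevel !== 'read_only') await run(calls[i], i)
  }
  return results
}
```

Writing results by index keeps each tool_result paired with its originating tool_use block even when the parallel calls finish out of order.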

Implementation Plan (5 Phases)

Phase 1: Infrastructure (prerequisite for all tools)

ToolDefinition, ToolRiskLevel, ToolCategory domain types. IToolExecutor interface. IToolRegistry + ToolRegistry (register + lookup). IToolPolicyFilter + ToolPolicyFilter (marketplace + risk gates). IToolHook + ObservabilityHook. ToolExecutor implementation with HookLifecycle and SubtaskRunner (Promise.all for read_only). SessionResultCache: per-session deduplication for READ/ANALYSIS tools (same tool+args returns the cached result, WRITE tools are never cached, cache scoped to session lifetime). Integrate IToolExecutor into ReActOrchestrator. New ILLMClient method: generate(messages, tools) → ContentBlock[], a shared deliverable with #2, consumed by ReActOrchestrator. Update AnthropicClient to send tools[] and parse tool_use blocks.

Phase 2: 10 READ tools + 1 SYSTEM

get_product, get_product_metrics, get_orders, get_buyer_questions, get_product_reviews, get_category_requirements, get_campaigns, get_campaign_metrics, get_store, get_store_metrics: definitions + handlers connected to marketplace adapters (#12). update_user_profile (SYSTEM): DynamoDB handler. ProactiveSuggestionHook with LLM evaluation via IProactiveSuggestionService (#6) (max 2 suggestions per turn).

Phase 3: 8 ANALYSIS tools + first WRITE

search_market_products, get_competitor_product, get_market_pricing, get_keyword_data, analyze_product_image, enhance_product_image, analyze_product_video, get_product_fee_estimate. 7 of 8 ANALYSIS tools are routed via the Enrichment Layer (#11). update_product_content is the first reversible tool with a real ConfirmationFlow: the agent proposes a title change, the loop pauses, the seller accepts or rejects, and the loop resumes.

Phase 4: 12 WRITE tools + streaming

publish_product, update_product_images, update_product_video, update_price, update_stock, pause_product, activate_product, close_product (irreversible), answer_question (irreversible), hide_question, send_buyer_message (irreversible), request_review. ProactiveSuggestionHook Phase 4: parallel LLM inference, deduplication via UserProfile.

Phase 5: 4 Advertising tools

create_campaign, update_campaign, pause_campaign, activate_campaign. Advertising has its own lifecycle and complexity, so it is implemented in a separate phase.

Risk Analysis

LLM Selects Wrong Tool

Impact: Medium

Mitigation: description_for_llm with trigger phrases, explicit exclusions ("does not include..."), and differentiation from similar tools. Evaluation suite of test prompts before production for each new tool.

WRITE Tool Executed Without Confirmation

Impact: High

Mitigation: ToolPolicyFilter forces requiresConfirmation: true for all riskLevel != read_only. There is no bypass path: the handler is never invoked until the orchestrator receives explicit user confirmation. This is an architectural constraint.

External Tool Blocks ReAct Loop

Impact: Medium

Mitigation: Each handler has a 10s internal timeout. read_only tools execute in parallel (Promise.all). In Phase 4, WebSocket streaming eliminates the Lambda timeout as a UX bottleneck.
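
One common way to enforce a per-handler timeout is to race the handler against a timer. The sketch below assumes that approach; withTimeout and ToolTimeoutError are hypothetical names, and the 10000 ms default mirrors the 10s figure above.

```typescript
// Illustrative sketch. withTimeout and ToolTimeoutError are hypothetical names.
class ToolTimeoutError extends Error {
  constructor(toolName: string, ms: number) {
    super(`${toolName} timed out after ${ms}ms`)
    this.name = 'ToolTimeoutError'
  }
}

function withTimeout<T>(toolName: string, work: Promise<T>, ms: number = 10000): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new ToolTimeoutError(toolName, ms)), ms)
  })
  // Whichever settles first wins; the timer is cleared either way.
  // Note: the race does not cancel the underlying work, it only stops waiting for it.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer))
}
```

Since Promise.race only stops waiting, a handler that talks to an external API would still need its own cancellation (for example an abort signal) to release resources.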

Tool Proliferation Degrades Selection

Impact: Medium

Mitigation: ToolPolicyFilter reduces the visible catalog per user/phase, so the LLM only sees tools available for their plan and marketplace. Avoid redundant tools: unify with optional parameters instead of separate entries.

Key Decisions

D1. Primitive tools, not composed skills: 36 primitive tools, each doing one concrete thing. The LLM composes them at runtime in the ReAct loop. There is no separate "Skills" layer (YAML business capabilities) on top; the composition is the LLM's job.

D2. IToolExecutor as the only port for the orchestrator: ReActOrchestrator never calls ToolRegistry or ToolPolicyFilter directly. Policy check, hook lifecycle, and handler dispatch are encapsulated in ToolExecutor. Adding a new tool never touches the orchestrator.

D3. HookLifecycle as cross-cutting extension point: observability and post-tool logic (Next Best Action) live in hooks, not in handlers or the orchestrator. Adding logging or proactive suggestions is a hook addition, not a handler change.

D4. READ/ANALYSIS always local; WRITE candidates for a separate Lambda: in Phases 1-3, all handlers run in this Lambda. In Phase 4+, WRITE handlers may move to a separate project if read/write SLAs diverge. IRemoteDispatcher in ToolRegistry prepares this transition.

D5. Automatic contexts are NOT tools: KB and Brand Health are user prompt inputs via RAG (#5); UserProfile and critical alerts are system prompt L2 inputs (#4). None are tool results. Keeping them separate means the LLM doesn't decide "when to query the KB", because the KB is always available. This simplifies the visible tool catalog.

MVP Scope

Phase 1 infrastructure + Phase 2 (10 READ + 1 SYSTEM). 36 tools total across 5 phases. Internal use only: no REST endpoints exposed. ToolPolicyFilter gates by marketplace and plan.

Inspired by

Claude Code primitive tools pattern: small, composable, open-ended. Anthropic tool_use API.

Source: New (no tool infrastructure exists) | Depends on: #12 (marketplace adapters for handlers), #8 (IOrchestrationTracer for ObservabilityHook), #10 (Data Sync: FeedbackCaptureHook reads pre-action metrics), #15 (Feedback Loop: FeedbackCaptureHook writes action log), #6 (IProactiveSuggestionService: Phase 2 LLM evaluation via after_tool hook)
📝 Project Changelog
v4 Mar 2, 2026
+36 primitive tools explicitly defined across 5 categories (WRITE, READ, ANALYSIS, SYSTEM, META)
+ToolPolicyFilter: enforcement of rules by plan + risk level
+HookLifecycle: 3 hooks (ObservabilityHook, ProactiveSuggestionHook, FeedbackCaptureHook)
+SessionResultCache: same tool+args within session returns cached result (READ/ANALYSIS only)
+FeedbackCaptureHook: snapshot before + log after for WRITE tools → #15
~Renamed “Skills Engine” → “Tool Registry & Policy Engine” throughout document
~listing tools renamed → product (get_product_detail, audit_product, etc.)
~7 of 8 ANALYSIS tools delegated to #11 Enrichment Layer
v3 Feb 27-28, 2026
+Full deep spec card as “Skills Engine”: ~25-26 tools, tools as Anthropic tool_use
v2.1 Feb 27, 2026
+Plan gating: Free = 5 read tools, Pro = 8 read+write tools
v2 Feb 27, 2026
~Architecture: SkillRegistry + SkillResolver + SkillExecutor (3 layers) → tools as Anthropic tool_use + middleware
+search_competitors decision: MeLi Search API as data source
v1 Feb 26, 2026
+Initial: “Tool Registry & Policy Engine” (v1 #6), NEW status
#4

Personality Engine

Intelligence — Mateo

NEW

The Coach's identity in every invocation. ISystemPromptComposer assembles the system prompt from 3 layers: L1, base identity (~500 tokens, static, always cached); L2, session context (UserProfile + critical Critique alerts, ~150-300 tokens, cacheable); and L3, execution mode (write confirmation guardrails, ~100-150 tokens, conditional on write_capable). It returns SystemPromptBlock[], not a string, so AnthropicClient can apply cache_control per block. Marketplace terminology lives in the KB semantic search, NOT in the system prompt. Today, COACH_SYSTEM_PROMPT is a static string literal (~500 tokens). The composer doesn't exist yet; it is introduced when UserProfile needs dynamic injection (Phase 0.2).

Beautonomous governance: the L3 execution mode block embeds Core's role-based governance rules at the LLM reasoning level. When write_capable=true, the system prompt includes the confirmation guardrails that enforce Core's ConfirmationFlow behavior before the LLM can propose any WRITE action.

Current State

Existing (L1 content)

COACH_SYSTEM_PROMPT in CoachPrompts.ts: static string literal (~500 tokens). Covers: identity/role, general rules (neutral Spanish, direct tone, no fabricated data), KB context usage, Brand Health vocabulary (Critique/Delicate/Good/Optimal), prompt injection guard, response format. FlashCoachPrompts.ts and FlashAnalysisPrompts.ts are separate flows with their own prompts and are not managed by this system.

Needs Refactor

COACH_SYSTEM_PROMPT as string → BASE_COACH_BLOCK with cache_control (Phase 1.9). ChatOptions.systemPrompt?: string → accept string | SystemPromptBlock[] for backward compatibility (Phase 1.9).

To Build

ISystemPromptComposer interface + types (Phase 0.2). SystemPromptComposer with L1 + L2 UserProfile (Phase 0.2). L2 critical Critique alerts from getHealthSummary() (Phase 1.8). Prompt caching in AnthropicClient (Phase 1.9). L3 write confirmation guardrails (Phase 3).

L1 Base Identity: ~500 tok · Static · Always cached
L2 Session Context: ~150-300 tok · UserProfile + Critique alerts
L3 Execution Mode: ~100-150 tok · Write confirmation guardrails
ISystemPromptComposer: Domain port
SystemPromptBlock[]: Array for Anthropic API
cache_control: L1 always, L2 if stable
Token budget: ~750-950 typical · 1200 hard cap

Scope Boundaries: What this layer does NOT do

Marketplace knowledge: terminology, policies, and metrics of each marketplace live in the KB semantic search (#9). The RAG retrieves them when the query makes them relevant. The system prompt does NOT duplicate this; doing so would be redundant and expensive in tokens.
Conversational context: message history, KB chunks from RAG, and tool call results all go in the user prompt, not the system prompt. ContextWindowManager (#5) handles this. Exception: critical Critique alerts go in L2 because they're session-permanent urgencies, not query responses.
Plan authorization: what capabilities a user has per subscription plan is NOT this layer's responsibility. ToolPolicyFilter (#3) resolves it via allowedTools. The composer only reflects capabilities already resolved externally.
Flash Metrics prompts: FlashCoachPrompts.ts and FlashAnalysisPrompts.ts are distinct flows (real-time metric analysis). Each Flash use case passes its own system prompt directly to ILLMClient. Not part of this system.

Tech Stack (TypeScript, AWS Lambda)

TypeScript 5+ · Anthropic prompt caching · SystemPromptBlock[] · cache_control: ephemeral
Data Models, Interfaces & Acceptance Criteria
Data Models
interface SystemPromptContext {
  userProfile?: UserProfileSummary    // undefined until Phase 0.2
  criticalAlerts?: HealthAlert[]      // undefined until Phase 1.8
  writeCapable?: boolean              // false/undefined until Phase 3
}

interface UserProfileSummary {
  marketplaces: Marketplace[]
  categories: string[]
  declaredGoals: string[]
}

interface HealthAlert {
  domain: 'inventory' | 'advertising' | 'organic' | 'financial'
  level: 'Critique' | 'Delicate'
  product?: string
  metric: string
}

interface ComposedSystemPrompt {
  blocks: SystemPromptBlock[]
  estimatedTokens: number
  cachedBlockCount: number
}

interface SystemPromptBlock {
  text: string
  cache_control?: { type: 'ephemeral' }
}

// Token Budget (system prompt only):
// L1 Base: ~500 tokens (always)
// L2 Session: ~150-300 tokens (from Phase 0.2/1.8)
// L3 Execution: ~100-150 tokens (from Phase 3, write_capable only)
// Typical total: ~750-950 tokens | Hard cap: 1200 tokens
// Truncation priority: L3 first, then L2.alerts
Interfaces (Ports)
// domain/common/services/ISystemPromptComposer.ts
interface ISystemPromptComposer {
  compose(context: SystemPromptContext): ComposedSystemPrompt
}

// Returns blocks[] (not string) so AnthropicClient can apply
// cache_control per block. Other clients (Vertex, OpenRouter)
// concatenate blocks as plain string — no functional change.

// ChatOptions extension (Phase 1.9, backward compatible):
interface ChatOptions {
  systemPrompt?: string | SystemPromptBlock[]
}

// Dependencies:
// SystemPromptComposer
// ├── IUserProfileRepository      (reads UserProfile, Phase 0.2)
// └── IBrandHealthContextService  (reads Critique alerts, Phase 1.8)
// The composer does NOT call the LLM. Does NOT know conversation
// history. Only composes text blocks from already-resolved data.
Acceptance Criteria
  • System prompt assembled from blocks[] with per-block cache_control support
  • L1 (base identity) always present at position 0, always with cache_control: ephemeral
  • L2 injects UserProfile (marketplaces, categories, goals) when available
  • L2 injects critical Critique alerts when active — Delicate-only sessions omit alert block
  • L3 activates write confirmation guardrails only when writeCapable === true
  • Total system prompt stays under 1200 tokens hard cap (typical ~750-950)
  • AnthropicClient serializes blocks with cache_control; other clients concatenate to string

3 layers · L1 ~500t cached · L2 ~150-300t · L3 ~100-150t · 1200 hard cap · blocks[] for Anthropic prompt caching

How It Works: 3-Layer Composition

AgentLoopOrchestrator
        |
        v
SystemPromptComposer.compose(context)
        |
        +-- [0] L1 BaseCoachBlock     (always, cache_control: ephemeral)
        |       "You are Coach, expert in digital commerce..."
        |       ~500 tokens — current COACH_SYSTEM_PROMPT content
        |
        +-- [1] L2 SessionBlock       (if userProfile or criticalAlerts)
        |       "SELLER PROFILE:
        |         Active marketplaces: MercadoLibre Argentina
        |         Categories: Electronics, Accessories
        |         Declared goals: scale to 500 sales/mo
        |       CRITICAL ALERTS:
        |         Critique — Inventory: 'BT Pro Headphones' out of stock 3d"
        |       cache_control: ephemeral if UserProfile unchanged
        |
        +-- [2] L3 ExecutionBlock     (if writeCapable === true)
                "EXECUTION CAPABILITIES:
                 You can propose and execute marketplace changes.
                 CONFIRMATION RULES (non-negotiable):
                 - Never execute without explicit seller confirmation
                 - Show current ('before') and proposed ('after')
                 - Irreversible actions require full action text"
                No cache_control — varies per session
        |
        v
ComposedSystemPrompt { blocks[], estimatedTokens, cachedBlockCount }
        |
        v
AnthropicClient → system: [
  { type: "text", text: "...", cache_control: { type: "ephemeral" } },  // L1
  { type: "text", text: "...", cache_control: { type: "ephemeral" } },  // L2
  { type: "text", text: "..." }                                         // L3
]
            

The AgentLoopOrchestrator calls SystemPromptComposer.compose() before each LLM invocation. The composer returns SystemPromptBlock[], an array of typed objects rather than a concatenated string, because the Anthropic API accepts the system field as an array where each block can carry cache_control individually. L1 (position 0) is always cached: it's the largest block (~500 tokens) and never varies between users. L2 is cacheable while UserProfile is unchanged, which is the usual case because the profile only changes when update_user_profile executes. L3 is never cached, since it depends on whether write tools are available for this session. Other LLM clients (Vertex, OpenRouter) don't support prompt caching, so they concatenate blocks to a plain string internally. Marketplace terminology (MeLi "publicacion" vs Amazon "listing") lives in the KB semantic search, not in the system prompt; the RAG brings it in when relevant.
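
A compact sketch of the three-layer composition, including the truncation priority from the token budget (L3 first, then L2 alerts), is shown below. It is illustrative only: the block wording, the four-characters-per-token estimate, the l2Stable flag, and the helper names are assumptions; only the interface shapes follow the spec above.

```typescript
// Illustrative sketch. Token estimate, l2Stable flag and helper names are assumptions.
interface SystemPromptBlock { text: string; cache_control?: { type: 'ephemeral' } }
interface UserProfileSummary { marketplaces: string[]; categories: string[]; declaredGoals: string[] }
interface HealthAlert { domain: string; level: 'Critique' | 'Delicate'; product?: string; metric: string }
interface SystemPromptContext {
  userProfile?: UserProfileSummary
  criticalAlerts?: HealthAlert[]
  writeCapable?: boolean
}
interface ComposedSystemPrompt { blocks: SystemPromptBlock[]; estimatedTokens: number; cachedBlockCount: number }

const HARD_CAP_TOKENS = 1200
const estimateTokens = (text: string): number => Math.ceil(text.length / 4) // rough heuristic

function buildSessionText(ctx: SystemPromptContext, includeAlerts: boolean): string {
  const lines: string[] = []
  if (ctx.userProfile) {
    lines.push(
      'SELLER PROFILE:',
      `  Active marketplaces: ${ctx.userProfile.marketplaces.join(', ')}`,
      `  Categories: ${ctx.userProfile.categories.join(', ')}`,
      `  Declared goals: ${ctx.userProfile.declaredGoals.join(', ')}`,
    )
  }
  // Only Critique alerts are injected; Delicate-only sessions get no alert block.
  const critique = (ctx.criticalAlerts ?? []).filter((a) => a.level === 'Critique')
  if (includeAlerts && critique.length > 0) {
    lines.push('CRITICAL ALERTS:')
    for (const a of critique) lines.push(`  Critique - ${a.domain}: ${a.product ?? 'store'} ${a.metric}`)
  }
  return lines.join('\n')
}

function compose(
  ctx: SystemPromptContext,
  baseText: string,
  executionText: string,
  l2Stable: boolean = false,
): ComposedSystemPrompt {
  const total = (bs: SystemPromptBlock[]): number => bs.reduce((n, b) => n + estimateTokens(b.text), 0)

  const build = (withL3: boolean, withAlerts: boolean): SystemPromptBlock[] => {
    // L1 is always first and always cacheable.
    const blocks: SystemPromptBlock[] = [{ text: baseText, cache_control: { type: 'ephemeral' } }]
    const session = buildSessionText(ctx, withAlerts)
    if (session) {
      blocks.push(l2Stable ? { text: session, cache_control: { type: 'ephemeral' } } : { text: session })
    }
    if (withL3 && ctx.writeCapable === true) blocks.push({ text: executionText }) // L3 never cached
    return blocks
  }

  // Truncation priority from the token budget: drop L3 first, then L2 alerts.
  let blocks = build(true, true)
  if (total(blocks) > HARD_CAP_TOKENS) blocks = build(false, true)
  if (total(blocks) > HARD_CAP_TOKENS) blocks = build(false, false)

  return {
    blocks,
    estimatedTokens: total(blocks),
    cachedBlockCount: blocks.filter((b) => b.cache_control !== undefined).length,
  }
}
```

The composer stays a pure function of already-resolved data, which matches the constraint that it neither calls the LLM nor knows the conversation history.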

Implementation Plan (phased with orchestrator)

Phase 0.2: Introduce Composer + L2 UserProfile

Create the ISystemPromptComposer interface and the SystemPromptContext, ComposedSystemPrompt, and SystemPromptBlock types in the domain layer. Implement SystemPromptComposer with L1 (COACH_SYSTEM_PROMPT refactored as BASE_COACH_BLOCK) plus the L2 UserProfile sub-block (marketplaces, categories, goals). No L3 yet, because the tools don't exist. Wire the composer into the orchestrator.

Phase 1.8: L2 Critical Alerts

Add criticalAlerts?: HealthAlert[] to SystemPromptContext. L2 includes a Critique alerts sub-block when such alerts are active. Delicate-only states are omitted; the Coach finds them via RAG. The alert block exists only for urgent issues that need visibility regardless of the user's query.
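
As a rough sketch of the filtering rule above: the HealthAlert fields and the rendered text format are assumptions, since the blueprint only names the type and the Critique and Delicate states.

```typescript
// Assumed shape: only the type name HealthAlert appears in the blueprint.
type AlertSeverity = "Critique" | "Delicate";

interface HealthAlert {
  severity: AlertSeverity;
  message: string;
}

// Returns the L2 alerts sub-block, or undefined when there is nothing urgent to show.
function renderCriticalAlerts(alerts: HealthAlert[] = []): string | undefined {
  // Delicate-only states are left out on purpose; the Coach reaches them via RAG.
  const critical = alerts.filter((alert) => alert.severity === "Critique");
  if (critical.length === 0) return undefined;
  return ["Critical alerts:", ...critical.map((alert) => `- ${alert.message}`)].join("\n");
}
```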

Phase 1.9: Prompt Caching

Extend ChatOptions.systemPrompt to accept string | SystemPromptBlock[] (backward compatible). AnthropicClient serializes blocks with cache_control: L1 is always cached, and L2 is cached if UserProfile is unchanged. VertexLLMClient and OpenRouterLLMClient concatenate blocks into a string (no functional change, just no caching). Cache reads are billed at about 10% of the normal input price, so caching the ~500-token L1 saves roughly 450 tokens' worth of input cost per turn.

Phase 3: L3 Execution Mode

Add a writeCapable flag to SystemPromptContext. AgentLoopOrchestrator determines it from ToolPolicyFilter results: if all tools are read_only, writeCapable is not set and L3 is omitted. L3 adds non-negotiable confirmation rules: never execute without explicit confirmation, show before/after, and require the full action text in the confirmation for irreversible actions.
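
One possible way to derive the flag, shown only as a sketch: the ToolDescriptor shape and the "write" access value are assumptions, and only read_only comes from the text above.

```typescript
// Assumed shape for the tools that survive ToolPolicyFilter.
interface ToolDescriptor {
  name: string;
  access: "read_only" | "write";
}

// Returns true when at least one allowed tool can write. Otherwise the flag is
// left unset (undefined), which tells the composer to omit L3.
function deriveWriteCapable(allowedTools: ToolDescriptor[]): true | undefined {
  return allowedTools.some((tool) => tool.access !== "read_only") ? true : undefined;
}
```

Leaving the flag unset instead of passing false keeps the composer's rule simple: L3 is added only on an explicit writeCapable: true.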

Risk Analysis

L3 Not Activated When Write Tools Available

Impact: High. The LLM may execute without asking for confirmation.

Mitigation: writeCapable is determined by AgentLoopOrchestrator from ToolPolicyFilter results, not by inference. Responsibility is cleanly separated: the composer adds L3 if and only if it explicitly receives writeCapable: true.

Stale Critique Alerts in L2

Impact: Medium. The Coach unnecessarily prioritizes an issue that is already resolved.

Mitigation: L2 regenerates when UserProfile changes. For now, alerts are queried at session start; if drift occurs during the session, RAG corrects the context on the next turn. Future: BrandHealthContextService.getHealthSummary() with a short per-session TTL.

System Prompt Exceeds Budget

Impact: Low. L2 and L3 are conditional, and the maximum case (~950 tokens) leaves margin.

Mitigation: Hard cap of 1200 tokens in the composer. If it is exceeded (rare), the truncation priority is L3 first, then L2.alerts. L1 is never truncated.
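
The truncation priority could look roughly like this. It is a simplified sketch under stated assumptions: the PromptLayers shape is hypothetical, whole sub-blocks are dropped rather than partially truncated, and token counts use a crude 4-characters-per-token estimate where a real tokenizer would be needed.

```typescript
// Hypothetical container for the rendered layers.
interface PromptLayers {
  l1: string;
  l2Profile?: string;
  l2Alerts?: string;
  l3?: string;
}

const HARD_CAP_TOKENS = 1200;

// Assumption: about 4 characters per token. Replace with a real tokenizer in practice.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function totalTokens(layers: PromptLayers): number {
  return [layers.l1, layers.l2Profile, layers.l2Alerts, layers.l3].reduce(
    (sum, part) => sum + (part ? estimateTokens(part) : 0),
    0,
  );
}

function enforceCap(layers: PromptLayers, cap: number = HARD_CAP_TOKENS): PromptLayers {
  const result: PromptLayers = { ...layers };
  if (totalTokens(result) > cap) delete result.l3;       // first priority: L3
  if (totalTokens(result) > cap) delete result.l2Alerts; // second priority: L2 alerts
  return result;                                         // L1 is never touched
}
```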

Key Decisions

D1.

Marketplace terminology in the KB, not in the system prompt. The KB semantic search holds the terminology dictionary, and RAG retrieves it when the query makes it relevant. Duplicating it in the system prompt would consume fixed tokens on every invocation for information the LLM can obtain contextually.

D2.

ISystemPromptComposer in domain, implementation in application. The orchestrator depends on the interface, so changing the layers, their order, or the caching logic doesn't touch the orchestrator.

D3.

blocks[] as return type, not string. AnthropicClient needs separate blocks to apply cache_control individually; a plain string would lose caching granularity. Clients that don't support caching concatenate the blocks internally.

D4.

L3 is a capability declaration, not a tone change. The Coach is always direct and technical. What changes when write tools are available are the capabilities and their guardrails. L3 declares that; it doesn't adjust the tone.

MVP Scope

Phase 0.2: ISystemPromptComposer + L1 (refactored COACH_SYSTEM_PROMPT) + L2 UserProfile. Phase 1.8: L2 Critique alerts. Phase 1.9: Prompt caching (blocks[] + cache_control). Phase 3: L3 write guardrails. System prompt ~750-950 tokens typical, 1200 hard cap.

Inspired by

Anthropic prompt caching API, Claude system prompt patterns. Existing COACH_SYSTEM_PROMPT as L1 base content.

Source: Partial (COACH_SYSTEM_PROMPT exists as L1 content) | Depends on: None (consumed by #2 Orchestrator)
📝 Project Changelog
v4 Mar 2, 2026
~ISystemPromptComposer rewritten: 4-layer Python → 3-layer TypeScript (L1 BaseCoachBlock, L2 SessionBlock, L3 ExecutionBlock)
~Token budget: ~1500/<2000 → ~750-950 typical, 1200 hard cap
+Returns blocks[] with cache_control for Anthropic prompt caching
Marketplace terminology moved to KB (#9), removed from system prompt
~Owner: Mateo → Pablo
~listing → product rename in all terminology references
v3 Feb 27-28, 2026
+Full deep spec card: 4-layer composition model, Python, ~1500 token target / <2000 hard cap
v2.1 Feb 27, 2026
+Plan-aware system prompt: user plan injected for upgrade/pack prompts
v2 Feb 27, 2026
+Created as new project (split from v1 #9 Conversation Memory; inherits system prompt composition)
+Token budget: 2,500-3,000 tokens with Anthropic prompt caching
#5

Context Aggregator

Intelligence — Mateo

ADAPT

The Context Aggregator assembles RAG context from 2 automatic sources before every LLM call: KBContextProvider (top-K semantic chunks from BigQuery kb_embeddings via Cerebro #9) and BrandHealthContextService (intent-aware chunks from brand_health_embeddings via Data Sync #10). A single Vertex AI embedding call is shared across both searches, which run in parallel via Promise.allSettled. The IContextAssembler interface formalizes the existing RagOrchestrator pattern.
IContextWindowManager manages a dynamic token budget over the 200K context window, calculating the space available after the system prompt, history, tool definitions, and expected response. HyDE (Hypothetical Document Embeddings): when the user's query is short or ambiguous, the system can first generate a hypothetical answer internally, then use that answer as the search vector instead of the question itself. This finds better results because the KB contains answers and explanations, not questions, so a vector built from a hypothetical answer is semantically closer to the stored documents. The hypothetical answer is never shown to the user; it is used only to improve search, and it is activated selectively, not on every query. Live seller data (metrics, inventory, products) is NOT pre-fetched; the LLM requests it on demand via READ tools when needed.

Beautonomous governance: IContextWindowManager is ConfirmationFlow-aware. When a WRITE action is pending confirmation, context assembly includes the pending action details so the LLM presents them accurately to the seller. The seller's consent is informed, not blind.

RagOrchestrator
IContextAssembler: assembles KB + Brand Health
KBContextProvider
BigQuery kb_embeddings, semantic similarity
BrandHealthContextService
Intent-aware, brand_health_embeddings
ContextWindowManager
IContextWindowManager: dynamic token budget
EmbeddingClient
Vertex AI text-embedding-004, 1 call per query
HyDEGenerator
Hypothetical answer for search (selective)

Current State

Operational

KB RAG pipeline (embedding + BigQuery vector search + prompt injection). Brand Health intent-aware context (advertising, inventory, organic, and financial intents). Parallel execution with a shared embedding. Graceful degradation: if KB or Brand Health fails, the Coach responds with the remaining context.

Needs Formalization

IContextAssembler interface: RagOrchestrator already implements the pattern but lacks a formal interface contract (Phase 0.2). Explicit ContextSource type for error tracking and sourcesFailed reporting.

To Build

IContextWindowManager interface + ContextWindowManager implementation (Phase 0.3). Token budget calculation: available = 200K - system - history - tools - 4000. toolDefinitionsTokens accounting (~1450 tokens for 36 tools, Phase 1). Dynamic toolResultsTokens in the ReAct loop (Phase 2+). HyDE (Hypothetical Document Embeddings): selective generation of hypothetical answers for search when the query is short or ambiguous, which requires an additional LLM call before the search (Phase 2+).

Scope Boundaries: What this layer does NOT do

UserProfile + critical alerts: These belong to SystemPromptComposer (#4) Layer 2, not this layer. The context aggregator only handles KB and Brand Health RAG chunks in the user prompt.
Live seller data: Metrics, inventory, products, and other live data are NOT pre-fetched into context. The LLM requests them on demand via READ tools (#3) when needed. This avoids wasting tokens on data the user didn't ask about.
Conversation history: Managed by ConversationRepository + AgentLoopOrchestrator (#2). ContextWindowManager only accounts for history tokens in its budget calculation; it does not store or retrieve messages.
Proactive suggestions: Triggered by the after_tool hook in HookLifecycle (#3), not by pre-loaded context. Suggestions are event-driven, not aggregated into the prompt.

Tech Stack (TypeScript, BigQuery + Vertex AI)

TypeScript 5+, BigQuery (vector search), Vertex AI (text-embedding-004), Promise.allSettled
Data Models, API Signatures & Acceptance Criteria
Data Models
interface IContextAssembler {
  assemble(request: ContextAssemblyRequest): Promise<AssembledContext>
}
interface ContextAssemblyRequest {
  userId: string; marketplace: Marketplace; query: string; conversationId: string
}
interface AssembledContext {
  kbChunks: KBChunkModel[]; brandHealthChunks: BrandHealthChunkType[]
  estimatedTokens: number; sourcesFailed: ContextSource[]
}
type ContextSource = 'kb' | 'brand_health'

interface IContextWindowManager {
  allocate(budget: TokenBudget): ContextAllocation
  trim(context: AssembledContext, allocation: ContextAllocation): TrimmedContext
}
interface TokenBudget {
  modelContextWindow: number      // 200K
  systemPromptTokens: number; historyTokens: number
  toolDefinitionsTokens: number; expectedResponseTokens: number  // 4000
}
interface ContextAllocation {
  kbChunksTokens: number; brandHealthTokens: number; toolResultsTokens: number
}
interface TrimmedContext {
  kbChunks: KBChunkModel[]; brandHealthChunks: BrandHealthChunkType[]; truncated: boolean
}
API Signatures
// Internal invocation by AgentLoopOrchestrator — NO REST endpoint

// IContextAssembler (RagOrchestrator implements)
assemble(request: ContextAssemblyRequest): Promise<AssembledContext>

// IContextWindowManager (ContextWindowManager implements)
allocate(budget: TokenBudget): ContextAllocation
trim(context: AssembledContext, allocation: ContextAllocation): TrimmedContext

// Flow:
// 1. AgentLoopOrchestrator calls assemble() at start of each turn
// 2. assemble() embeds query once, runs KB + BrandHealth searches in parallel
// 3. allocate() calculates: available = 200K - system - history - tools - 4000
// 4. trim() fits assembled context within allocation, truncating if needed
Acceptance Criteria
  • assemble() retrieves top-K KB chunks by semantic similarity from BigQuery kb_embeddings
  • Brand health chunks filtered by detected intent (advertising, inventory, organic, financial)
  • Single embedding call, two parallel searches via Promise.allSettled
  • If KB or brand health fails, Coach responds with remaining context (graceful degradation invariant)
  • ContextWindowManager calculates: available = 200K - system - history - tools - 4000
  • Trimming priority: brand health first, then KB, then old tool results
  • [Ph 2+] HyDE: when query is short/ambiguous, system generates hypothetical answer and uses it as search vector instead of the raw question. Hypothetical never shown to user. Activated selectively based on query characteristics

How It Works

AgentLoopOrchestrator — start of turn
        |
        +-- SystemPromptComposer.compose()    → system prompt blocks
        |
        v
ContextAssembler.assemble(query, userId, marketplace)
        |
        +-- EmbeddingClient.embed(query)      ← 1 call, shared result
        |           |
        |     +-----+-----+
        |     |           |
        |  KB search   BrandHealth search    ← parallel (Promise.allSettled)
        |     |           |
        |     +-----------+
        |
        v
ContextWindowManager.allocate(budget)
        |
        v
ContextWindowManager.trim(assembledContext, allocation)
        |
        v
TrimmedContext { kbChunks[], brandHealthChunks[], truncated }
        |
        v
buildUserPrompt(query, kbChunks, brandHealthChunks)  → string for LLM
            

At the start of each turn, AgentLoopOrchestrator calls ContextAssembler.assemble(). The assembler embeds the user's query once via EmbeddingClient (Vertex AI text-embedding-004), then runs two parallel searches: KBContextProvider against BigQuery kb_embeddings and BrandHealthContextService against brand_health_embeddings (filtered by detected intent). Both searches use Promise.allSettled so a failure in one does not block the other. ContextWindowManager then calculates the available token budget (200K minus system prompt, history, tool definitions, and 4000 reserved for the response) and trims the assembled context to fit. The final kbChunks and brandHealthChunks are injected into the user prompt by buildUserPrompt().
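
The shared-embedding, parallel-search step might be sketched as below. The AssembleDeps functions and the simplified Chunk type are stand-ins invented for the example; marking both sources as failed when the embedding call itself fails is also an assumption.

```typescript
type ContextSource = "kb" | "brand_health";

// Simplified stand-in for KBChunkModel / BrandHealthChunkType.
interface Chunk {
  text: string;
  similarity: number;
}

// Hypothetical dependencies, injected so the flow can be exercised without real services.
interface AssembleDeps {
  embed: (query: string) => Promise<number[]>;
  searchKb: (vector: number[]) => Promise<Chunk[]>;
  searchBrandHealth: (vector: number[]) => Promise<Chunk[]>;
}

interface AssembledSketch {
  kbChunks: Chunk[];
  brandHealthChunks: Chunk[];
  sourcesFailed: ContextSource[];
}

async function assemble(query: string, deps: AssembleDeps): Promise<AssembledSketch> {
  let vector: number[];
  try {
    vector = await deps.embed(query); // one embedding call, shared by both searches
  } catch {
    // Assumption: without a vector neither source can be searched, so both are reported.
    return { kbChunks: [], brandHealthChunks: [], sourcesFailed: ["kb", "brand_health"] };
  }
  // allSettled never rejects, so one failing search cannot block the other.
  const [kb, brand] = await Promise.allSettled([
    deps.searchKb(vector),
    deps.searchBrandHealth(vector),
  ]);
  const sourcesFailed: ContextSource[] = [];
  if (kb.status === "rejected") sourcesFailed.push("kb");
  if (brand.status === "rejected") sourcesFailed.push("brand_health");
  return {
    kbChunks: kb.status === "fulfilled" ? kb.value : [],
    brandHealthChunks: brand.status === "fulfilled" ? brand.value : [],
    sourcesFailed,
  };
}
```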

Implementation Plan

Phase 0.2: Formalize IContextAssembler (Immediate)

Extract the IContextAssembler interface from the existing RagOrchestrator. Define the ContextAssemblyRequest and AssembledContext types. RagOrchestrator implements IContextAssembler with no functional change, only a formalization of the existing pattern. Add the ContextSource type and sourcesFailed tracking.

Phase 0.3: ContextWindowManager (Basic Budget)

Define the IContextWindowManager interface. Implement ContextWindowManager with allocate() and trim(). Budget calculation: available = 200K - systemPromptTokens - historyTokens - expectedResponseTokens (4000). Trimming priority: brand health chunks first, then KB chunks (by lowest similarity score). Integrate with AgentLoopOrchestrator so it passes the token counts.
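
A minimal sketch of the budget arithmetic and trimming order follows. It simplifies the allocate()/trim() signatures into two free functions, the ScoredChunk fields are assumptions, and toolDefinitionsTokens is simply passed as 0 to reproduce the Phase 0.3 formula.

```typescript
interface TokenBudget {
  modelContextWindow: number;     // 200K
  systemPromptTokens: number;
  historyTokens: number;
  toolDefinitionsTokens: number;  // 0 until Phase 1 adds tool accounting
  expectedResponseTokens: number; // 4000
}

// Assumed fields: per-chunk token count and similarity score.
interface ScoredChunk {
  text: string;
  tokens: number;
  similarity: number;
}

function availableTokens(budget: TokenBudget): number {
  const reserved =
    budget.systemPromptTokens +
    budget.historyTokens +
    budget.toolDefinitionsTokens +
    budget.expectedResponseTokens;
  return Math.max(0, budget.modelContextWindow - reserved);
}

function trimToBudget(kb: ScoredChunk[], brandHealth: ScoredChunk[], available: number) {
  const byScoreDesc = (a: ScoredChunk, b: ScoredChunk) => b.similarity - a.similarity;
  const keptKb = [...kb].sort(byScoreDesc);
  const keptBrand = [...brandHealth].sort(byScoreDesc);
  const total = () => [...keptKb, ...keptBrand].reduce((sum, chunk) => sum + chunk.tokens, 0);
  let truncated = false;
  while (total() > available && (keptBrand.length > 0 || keptKb.length > 0)) {
    // Brand health goes first; within a list, the lowest-similarity chunk is dropped.
    (keptBrand.length > 0 ? keptBrand : keptKb).pop();
    truncated = true;
  }
  return { kbChunks: keptKb, brandHealthChunks: keptBrand, truncated };
}
```

With a 900-token system prompt and 5000 tokens of history, availableTokens() leaves 190,100 tokens for RAG context, so trimming should rarely trigger at this phase.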

Phase 1: Tool Definitions in Budget (~1450 tokens for 36 tools)

Add toolDefinitionsTokens to TokenBudget. Calculate the token cost of the tool schemas from the Tool Registry (#3): ~40 tokens per tool on average, ~1450 total for 36 tools. Include it in the allocate() budget subtraction. Validate that the RAG context still fits comfortably once tool definitions are accounted for.

Phase 2+: Dynamic Tool Results in ReAct Loop

Add toolResultsTokens to ContextAllocation to track the tool_result tokens accumulated across ReAct rounds. ContextWindowManager recalculates the available budget each round as tool results accumulate. Trim strategy: brand health first, then KB, then the oldest tool results, so the most recent tool results are preserved. A MAX_ROUNDS guard (10 steps) prevents unbounded context growth. HyDE (Hypothetical Document Embeddings): when the user's query is short or ambiguous, the system generates a hypothetical answer via a lightweight LLM call and uses that answer as the search vector instead of the raw question. The KB contains answers and explanations, not questions, so a vector built from a hypothetical answer is semantically closer to the stored documents and finds better results. The hypothetical answer is never shown to the user. HyDE is activated selectively, not on every query, and adds the latency of one LLM call when used.
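
Selective HyDE activation could be sketched like this. The word-count heuristic, its threshold, and the generateHypotheticalAnswer helper are all assumptions; the blueprint does not specify how "short or ambiguous" is detected, and this sketch covers only the "short" case.

```typescript
// Hypothetical dependencies: a lightweight LLM call and the shared embedding client.
interface HydeDeps {
  generateHypotheticalAnswer: (query: string) => Promise<string>;
  embed: (text: string) => Promise<number[]>;
}

const SHORT_QUERY_MAX_WORDS = 4; // assumed threshold

function shouldUseHyde(query: string): boolean {
  const words = query.trim().split(/\s+/).filter(Boolean);
  return words.length > 0 && words.length <= SHORT_QUERY_MAX_WORDS;
}

// Returns the vector used for search. The hypothetical answer is internal only.
async function embedForSearch(query: string, deps: HydeDeps): Promise<number[]> {
  let textToEmbed = query;
  if (shouldUseHyde(query)) {
    try {
      textToEmbed = await deps.generateHypotheticalAnswer(query);
    } catch {
      // If generation fails, fall back to embedding the raw query.
    }
  }
  return deps.embed(textToEmbed);
}
```

Long, specific queries skip the extra LLM call entirely, which keeps the added latency confined to the cases where HyDE is expected to help.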

Risk Analysis

Tool results fill context window in long sessions

Impact: Medium

Mitigation: ContextWindowManager reserves a dynamic budget per round and trims brand health first, then KB, then the oldest tool results. A MAX_ROUNDS guard (10 steps) caps the ReAct loop length. expectedResponseTokens (4000) is always reserved.

KB / Brand Health return irrelevant results

Impact: Medium

Mitigation: A namespace filter (Phase 0.4) scopes the BigQuery search when the intent is clear. Intent detection selects the relevant brand health subset. Trim removes the lowest-similarity chunks first. Irrelevant context wastes tokens but does not cause errors.

Embedding call takes longer than expected

Impact: Low-Medium

Mitigation: Configurable timeout on EmbeddingClient. If the embedding fails, the Coach responds without RAG context. The graceful degradation invariant ensures the user always gets a response, even without KB or brand health chunks.
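
One way to express the timeout with graceful degradation, as a sketch: withTimeout and embedOrSkip are hypothetical helpers, not part of EmbeddingClient, and returning null to signal "continue without RAG" is an assumption.

```typescript
// Rejects if the wrapped promise does not settle within the given time.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Returns the embedding vector, or null when the call fails or times out.
// A null result means the caller proceeds without RAG context instead of erroring.
async function embedOrSkip(
  embed: (query: string) => Promise<number[]>,
  query: string,
  timeoutMs: number,
): Promise<number[] | null> {
  try {
    return await withTimeout(embed(query), timeoutMs);
  } catch {
    return null;
  }
}
```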

Key Decisions

D1.

Live data via tools, not pre-fetch. Seller metrics, inventory, and products are requested on demand by the LLM via READ tools. Pre-fetching wastes tokens on data the user didn't ask about. The LLM decides what data it needs based on the user's query.

D2.

Single embedding, two parallel searches. One Vertex AI embedding call per query is shared across the KB and Brand Health searches. Promise.allSettled ensures both searches run in parallel and that one failure doesn't block the other. This reduces latency and cost compared with separate embedding calls.

D3.

ContextWindowManager as a separate service. Token budget management is decoupled from the orchestrator: the orchestrator passes token counts, and ContextWindowManager calculates the allocation and trims. This separation lets budget strategies evolve independently without touching the orchestration logic.

D4.

Graceful degradation as an invariant. Never return an error to the user because of a context assembly failure. If the embedding fails, respond without RAG. If KB fails, respond with Brand Health only (and vice versa). If both fail, respond with no RAG context. The user always gets a response.

File Structure

src/domain/common/services/
    IContextAssembler.ts
    IContextWindowManager.ts
src/application/common/services/
    RagOrchestrator.ts          (implements IContextAssembler)
    ContextWindowManager.ts     (implements IContextWindowManager)

MVP Scope

KB RAG + Brand Health intent-aware + parallel execution + graceful degradation. IContextAssembler formalized (Phase 0.2), ContextWindowManager basic (Phase 0.3).

Source

RagOrchestrator + BrandHealthContextService

Source: RagOrchestrator + BrandHealthContextService | Depends on: #10 (brand health embeddings), #9 (KB embeddings)
📝 Project Changelog
v4 Mar 2, 2026
~Architecture: 5 providers → 2 automatic sources (KB + Brand Health)
~Stack: Python/FastAPI → TypeScript/BigQuery/Vertex AI
+IContextAssembler + IContextWindowManager interfaces
~Context budget: fixed 2000 tokens → dynamic 200K model context window
+HyDE (Hypothetical Document Embeddings) for short/ambiguous queries
+New sections: Current State, Scope Boundaries, File Structure
~Dependencies: #10,#3,#9,#6 → #10,#9
v3 Feb 27-28, 2026
+Full deep spec card: 5 providers, Python/FastAPI, REST endpoint, fixed 2000-token limit
v2 Feb 27, 2026
+Stale definition: last sync > 2 hours triggers 1 real-time call
v1 Feb 26, 2026
+Initial: “Context Aggregator” (v1 #10), ADAPT status
#6

Proactive Suggestions Engine

Intelligence — Mateo

NEW

Next Best Action suggestions via LLM evaluation, not hardcoded rules. After every tool call, the after_tool hook in HookLifecycle (#3) triggers IProactiveSuggestionService.afterTool(). The LLM receives the tool result, brand health summary, and conversation context, then evaluates whether there is something actionable the seller should consider. There are no thresholds in code: the LLM reasons contextually about what matters. Suggestions are ephemeral text appended to the Coach response as conversational questions (not sidebar cards, not a separate UI). Deduplication is cross-session via UserProfile.recentSuggestions[] with a 7-day window per (suggestionType, productId). The service is generic per tool, so adding a new tool to the catalog requires zero changes. [v2.1] Pro plan only: Free users don't receive proactive suggestions (gated at API Gateway).

Beautonomous governance: proactive suggestions respect Core's permission matrix. Suggestions implying WRITE actions are only surfaced to roles with write authorization. Pro-plan gating at API Gateway aligns with Core's resource allocation tiers. No suggestion can bypass Core's ConfirmationFlow if the seller acts on it.

IProactiveSuggestionService
Domain port: afterTool() + afterToolWithContext()
ProactiveSuggestionService
LLM evaluation + deduplication
SuggestionInput
Tool result + brand health + context
isDuplicate()
7-day window per (type, productId)
UserProfile.recentSuggestions
Cross-session persistence (DynamoDB)

Current State

Nothing Exists

ProactiveSuggestionService does not exist in the codebase. The Coach responds to user questions and terminates: no post-tool evaluation, no suggestions appended to responses.

Blocked By Prerequisites

ReAct Loop (Phase 0.3): without the loop there are no tool calls. HookLifecycle (Phase 1): the after_tool hook is the only activation point for this service. Both must be operational before #6 can start.

To Build

IProactiveSuggestionService interface + types (Phase 2). ProactiveSuggestionService with LLM evaluation + dedup (Phase 2). UserProfile.recentSuggestions[] extension (Phase 2). afterToolWithContext() with full session context + parallel streaming (Phase 4).

Scope Boundaries: What this layer does NOT do

Define business thresholds: The Coach does NOT hardcode "health_score < 70 is bad" or "a 20% sales drop is significant." That criterion lives in the LLM's contextual reasoning, not in code. If business thresholds change or vary by marketplace, there's no code to touch.
Autonomous background analysis: No cron analyzes all users in the background. Suggestions are only generated in active conversations, after the LLM has just retrieved real seller data via a tool call. No data pull without a conversation.
Suggestion lifecycle management: No table of suggestions with pending/seen/acted/dismissed states. No REST API for managing suggestions. No sidebar cards. Suggestions are text in the Coach response and are ephemeral by nature.
Plan gating decision: If Free users should not receive suggestions, that is decided by the API Gateway or auth middleware before the request reaches the Lambda. It is not this service's responsibility.

Tech Stack (TypeScript, LLM Structured Output)

TypeScript 5+ | Anthropic Claude (structured output) | DynamoDB (UserProfile) | HookLifecycle (after_tool)
Data Models, API Signatures & Acceptance Criteria
Data Models
interface IProactiveSuggestionService {
  afterTool(input: SuggestionInput): Promise<Suggestion[]>           // Phase 2
  afterToolWithContext(input: SuggestionInput): Promise<Suggestion[]> // Phase 4
}

interface SuggestionInput {
  toolName: string
  result: ToolResult
  userId: string
  marketplace: Marketplace
  brandHealthSummary?: HealthSummary
  conversationContext: Message[]        // last N messages
  recentSuggestions: RecentSuggestion[] // for deduplication
}

interface Suggestion {
  message: string                       // conversational question, inviting tone
  priority: 'high' | 'normal'
  suggestionType: string                // dedup key: "stock", "pricing", "health", "reviews", etc.
  productId?: string                    // if applies to a specific product
}

interface RecentSuggestion {
  suggestionType: string
  productId?: string
  suggestedAt: string                   // ISO8601 — for 7-day window
}
// Stored in UserProfile.recentSuggestions[] (DynamoDB, last 30 entries)
API Signatures
// Internal invocation via HookLifecycle after_tool — NO REST endpoint

// Phase 2: lightweight LLM evaluation post-tool
afterTool(input: SuggestionInput): Promise<Suggestion[]>

// Phase 4: full session context + parallel to streaming
afterToolWithContext(input: SuggestionInput): Promise<Suggestion[]>

// Flow:
// 1. ToolExecutor completes tool → HookLifecycle.afterTool fires
// 2. ProactiveSuggestionService.afterTool() sends tool result to LLM
// 3. LLM returns structured JSON: { hasSuggestion, message, suggestionType, priority, productId? }
// 4. isDuplicate(suggestionType, productId, recentSuggestions) → skip if duplicate
// 5. Max 2 suggestions per turn, high priority first
// 6. Appended to Coach response as conversational questions
Acceptance Criteria
  • afterTool() evaluates every tool result via LLM, with no hardcoded rules or thresholds
  • LLM returns structured JSON with hasSuggestion, message, suggestionType, priority, and optional productId
  • Deduplication: same (suggestionType, productId) within 7 days = skip
  • Max 2 suggestions per turn, high priority first
  • If LLM output parse fails → silence (no error, Coach responds normally)
  • Suggestions are conversational questions, not prescriptive alerts
  • Adding a new tool to the catalog requires zero changes to this service

How It Works

LLM executes tool (any: get_product, get_orders, get_market_pricing, ...)
        |
        v
HookLifecycle.after_tool
        |
        v
ProactiveSuggestionService.afterTool(toolName, result, brandHealth, context)
        |
        +-- LLM evaluates: "Is there something actionable in this result?"
        |   → { hasSuggestion, message, suggestionType, priority, productId }
        |
        +-- isDuplicate(suggestionType, productId, recentSuggestions)?
        |   Yes → skip
        |   No  → keep
        |
        +-- Sort by priority (high first)
        |
        +-- Take max 2
        |
        v
Suggestions[] → context.pendingSuggestions
        |
        v
AgentLoopOrchestrator builds final response:
  [main LLM response]
  [suggestions as conversational questions, if any]
        |
        v
User receives response + 0-2 questions at the end
            

After every tool execution, the after_tool hook in HookLifecycle (#3) fires ProactiveSuggestionService.afterTool(). The service sends the tool result, brand health summary, and recent conversation context to the LLM with a structured output prompt: "Is there something actionable the seller should consider?" The LLM returns JSON with hasSuggestion, message, suggestionType, priority, and optional productId. The service then checks isDuplicate() against UserProfile.recentSuggestions[]: if the same (suggestionType, productId) was suggested within the last 7 days, the suggestion is silently skipped. Remaining suggestions are sorted by priority (high first), capped at 2 per turn, and pushed to context.pendingSuggestions. The AgentLoopOrchestrator appends them as conversational questions at the end of the Coach response. If the LLM returns hasSuggestion: false or the parse fails, the service stays silent and no suggestion is forced. The service never blocks the main response.
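The dedup, sort, and cap steps of this flow can be sketched as two small pure functions. This is an illustration under assumptions, not the final implementation. The selectSuggestions name, the injectable now parameter, and the constants are introduced here for the example; the interfaces mirror the data models above.

```typescript
interface Suggestion {
  message: string;
  priority: "high" | "normal";
  suggestionType: string;
  productId?: string;
}

interface RecentSuggestion {
  suggestionType: string;
  productId?: string;
  suggestedAt: string; // ISO8601
}

const DEDUP_WINDOW_MS = 7 * 24 * 60 * 60 * 1000; // 7-day window
const MAX_PER_TURN = 2;

// True when the same (suggestionType, productId) pair was suggested within the window.
// `now` is injectable only to make the function testable (an assumption of this sketch).
function isDuplicate(
  suggestionType: string,
  productId: string | undefined,
  recent: RecentSuggestion[],
  now: Date = new Date(),
): boolean {
  return recent.some(
    (r) =>
      r.suggestionType === suggestionType &&
      r.productId === productId &&
      now.getTime() - Date.parse(r.suggestedAt) < DEDUP_WINDOW_MS,
  );
}

// Hypothetical helper: drop duplicates, put high priority first, cap at 2 per turn.
function selectSuggestions(
  candidates: Suggestion[],
  recent: RecentSuggestion[],
  now: Date = new Date(),
): Suggestion[] {
  const rank = (s: Suggestion) => (s.priority === "high" ? 0 : 1);
  return candidates
    .filter((s) => !isDuplicate(s.suggestionType, s.productId, recent, now))
    .sort((a, b) => rank(a) - rank(b))
    .slice(0, MAX_PER_TURN);
}
```

Keying on suggestionType rather than toolName means a pricing suggestion for a product is still allowed a day after a stock suggestion for the same product.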

Implementation Plan

Prerequisites: ReAct Loop (Phase 0.3) + HookLifecycle (Phase 1)

The after_tool hook does not exist without the ReAct loop and HookLifecycle. Both must be operational before this service can be activated. Without tool calls, there's nothing to evaluate.

Phase 2: Lightweight LLM Evaluation Post-Tool

Define the IProactiveSuggestionService interface in the domain layer. Implement ProactiveSuggestionService with an LLM evaluation prompt (structured JSON output: hasSuggestion, message, suggestionType, priority, productId). Implement isDuplicate() with a 7-day window per (suggestionType, productId). Extend UserProfile to include recentSuggestions: RecentSuggestion[] (last 30 entries, DynamoDB). Wire into HookLifecycle.afterTool() as fire-and-forget. Max 2 suggestions per turn, high priority first.

Phase 4: Full Context + Parallel Streaming

Implement afterToolWithContext(): the LLM sees all tool results from the session, the full conversation history, and suggestion states (acted/ignored). It runs in parallel to response streaming, so the user already sees the response while the LLM evaluates whether there is more to flag. Cross-tool pattern detection connects results from multiple tools in the same session. The relevance gate is calibrated so that fewer than 40% of turns emit a suggestion. Signal, not noise.

Risk Analysis

LLM over-generates suggestions

Impact: High

Mitigation: 7-day deduplication prevents repetition. Max 2 per turn limits volume. The Phase 4 relevance gate calibrates the emission rate. Acceptance metric: >30% of suggestions acted upon in Phase 2; below that, the model is generating low-quality suggestions.

LLM fails to produce structured output

Impact: Medium

Mitigation: If the JSON parse fails, the service stays silent (no error). The Coach still responds to the main question normally. ProactiveSuggestionService never blocks the response; it is an optional addition. Parse failures are logged for monitoring.
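The "silence on parse failure" mitigation might look like the sketch below. It assumes the LLM output arrives as a raw JSON string. The parseSuggestion name and the onParseFailure callback are hypothetical; the field names come from the structured output described in this card.

```typescript
interface Suggestion {
  message: string;
  priority: "high" | "normal";
  suggestionType: string;
  productId?: string;
}

// Returns a Suggestion, or null for "stay silent". Never throws.
function parseSuggestion(
  raw: string,
  onParseFailure: (raw: string) => void = () => {}, // hypothetical logging hook
): Suggestion | null {
  try {
    const data: unknown = JSON.parse(raw);
    if (typeof data !== "object" || data === null) return null;
    const d = data as Record<string, unknown>;

    if (d.hasSuggestion !== true) return null; // the LLM chose silence
    if (typeof d.message !== "string" || typeof d.suggestionType !== "string") {
      return null; // malformed payload: treat as silence
    }
    return {
      message: d.message,
      suggestionType: d.suggestionType,
      priority: d.priority === "high" ? "high" : "normal", // default to normal
      productId: typeof d.productId === "string" ? d.productId : undefined,
    };
  } catch {
    onParseFailure(raw); // logged for monitoring, never surfaced to the user
    return null;
  }
}
```

Returning null for every failure mode keeps the caller simple: it appends a suggestion when one exists and otherwise does nothing.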

Stale deduplication in long sessions

Impact: Low

Mitigation: Dedup is by (suggestionType, productId). A different product is evaluated normally. The case of the same product with a problem that was resolved and then recurs within 7 days is an acceptable edge case. The last 30 entries are kept; older ones are discarded on write.
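The "last 30 entries, older discarded on write" rule can be sketched as a pure function applied before persisting UserProfile. The recordSuggestion name is hypothetical, and the sketch assumes entries are appended in chronological order so the oldest ones sit at the front of the array.

```typescript
interface RecentSuggestion {
  suggestionType: string;
  productId?: string;
  suggestedAt: string; // ISO8601
}

const MAX_RECENT = 30;

// Appends the new entry and keeps only the newest MAX_RECENT entries.
// Returns a new array; the input is not mutated.
function recordSuggestion(
  recent: RecentSuggestion[],
  entry: RecentSuggestion,
): RecentSuggestion[] {
  return [...recent, entry].slice(-MAX_RECENT);
}
```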

Key Decisions

D1.

Actionability criteria live in the LLM, not in code: Business thresholds vary by context (marketplace, category, seller volume, session context). The LLM reasons over the full picture; a hardcoded constant such as < 70 cannot. This makes the service domain-agnostic: it works the same for MeLi, Amazon, or any future marketplace.

D2.

Generic per tool, extensible by design: afterTool() receives the result of any tool. There is no tool-specific logic. Adding get_shipping_costs or get_advertising_metrics to the catalog requires zero changes to this service. Extensibility is in the generic contract, not in a list of cases.

D3.

suggestionType as dedup key, not toolName: (toolName, productId) would block any suggestion about that product for 7 days, even if the LLM detects a different problem in the next session. (suggestionType, productId) deduplicates by problem type, which is more precise.

D4.

Questions that invite, not alerts that prescribe: "Your stock is low and you should restock" is an alert. "This product has 3 units and sold 8 this week. Want to restock before running out?" is an invitation. The Coach proposes, never prescribes. This distinction determines whether users perceive the service as useful or as noise.

D5.

Silence if nothing actionable: The LLM can and should return hasSuggestion: false. No forced suggestions. The goal is to keep the signal-to-noise ratio high: one suggestion the user acts on is worth more than three they ignore.

File Structure

src/domain/coach/services/
    IProactiveSuggestionService.ts     (interface + types)
src/application/coach/services/
    ProactiveSuggestionService.ts      (LLM evaluation + deduplication)

MVP Scope

IProactiveSuggestionService + LLM evaluation post-tool + dedup via UserProfile (Phase 2). afterToolWithContext() with full session context deferred to Phase 4. Pro plan only.

Source

New: no ProactiveSuggestionService exists in the codebase

Source: New, no code exists | Depends on: #2 (ReAct loop), #3 (HookLifecycle after_tool)
📝 Project Changelog
v4 Mar 2, 2026
~Architecture: cron every 6h + 4 hardcoded rules → LLM post-tool evaluation via after_tool hook
+IProactiveSuggestionService with afterTool() + afterToolWithContext()
~Dedup: Suggestion Store (DynamoDB) → UserProfile.recentSuggestions[] (7-day window)
~Delivery: sidebar cards → ephemeral in Coach response (no persistent UI)
~Dependencies: #10,#12,#3 → #2,#3
+New sections: Current State, Scope Boundaries, File Structure
v3 Feb 27-28, 2026
+Full deep spec card: cron every 6h, 4 hardcoded rules with fixed thresholds, Suggestion Store in DynamoDB
v2.1 Feb 27, 2026
~Pro-only feature: incentive for Free → Pro upgrade
v2 Feb 27, 2026
+Created as new project: proactive suggestions based on marketplace health rules
#7

Guardrails

Security — Mateo

NEW

Independent content validation layer for the Coach, with two validation points: one before the LLM (InputGuard) and one after (OutputGuard). InputGuard detects prompt injection attempts and off-scope queries before they reach the LLM. OutputGuard detects data leaks (another user's data in the response) and dangerous content (from unsanitized tool outputs) before the response reaches the user. This is security validation, not business logic: the guardrails don't know what a good Coach answer looks like, only what a dangerous one looks like. Graceful degradation is an invariant: if a guard fails internally, it lets the content through, so the guardrails never take the service down. Rejection messages are always friendly, never expose the technical reason, and always redirect to the Coach's domain.

Beautonomous governance: Guardrails is Core's first enforcement point. InputGuard validates incoming requests against Core's security rules before the LLM processes them. A blocked input means no ConfirmationFlow is ever triggered and no WRITE is ever attempted. Security validation precedes all governance logic.

IGuardService
Domain port: validateInput() + validateOutput()
InputGuard
Pre-LLM: prompt injection + off-scope
OutputGuard
Post-LLM: data leak + dangerous content
GuardService
Coordinator: IGuardService → InputGuard + OutputGuard
LLMGuardChecker
Lightweight LLM classifier (Phase 2)
ViolationCategory
prompt_injection | off_scope | data_leak | dangerous_content

Current State

Reusable

AgentLoopOrchestrator already has the integration point (pre-loop and post-loop). ILLMClient for LLMGuardChecker. CloudWatch logging infrastructure for violation tracking.

To Build

IGuardService interface + types (GuardInput, GuardOutput, GuardResult, ViolationCategory). InputGuard with pattern matching (injection + off-scope). OutputGuard with data leak detection (userId comparison) + dangerous content (pattern matching). GuardService coordinator. LLMGuardChecker for ambiguous cases (Phase 2). Integration in AgentLoopOrchestrator pre/post loop.

Not This Project

Authentication/authorization (API Gateway + Memberstack). Request format validation (Zod schemas). Rate limiting (API Gateway / Billing #13). Response quality evaluation (Eval Suite #16). Hallucination detection (Orchestrator #2). Business logic.

Tech Stack (TypeScript, Security Layer)

TypeScript 5+ | Pattern Matching (regex) | Claude Haiku (LLMGuardChecker) | CloudWatch (violation alerts)
Data Models, Interfaces & Acceptance Criteria
Data Models
interface IGuardService {
  validateInput(input: GuardInput): Promise<GuardResult>
  validateOutput(output: GuardOutput): Promise<GuardResult>
}

interface GuardInput {
  query: string
  userId: string
  sessionHistory: Message[]   // last N messages for context
  marketplace: Marketplace
}

interface GuardOutput {
  response: string
  userId: string
  toolResults?: ToolResult[]  // tool results from loop for data leak verification
}

interface GuardResult {
  passed: boolean
  reason?: string             // if failed: which violation detected
  category?: ViolationCategory
  rejectionMessage?: string   // friendly message for the user
}

type ViolationCategory =
  | 'prompt_injection'
  | 'off_scope'
  | 'data_leak'
  | 'dangerous_content'

// --- InputGuard: known injection patterns (first line — regex) ---
// "ignore previous instructions" / "ignora las instrucciones anteriores"
// "you are now" / "ahora eres" / "pretend you are"
// "forget everything" / "olvidate de todo"
// Nested delimiters: ---SYSTEM--- , [INST], <|system|>, <|im_start|>
// Role switching: "act as" / "actúa como si fueras"
// "your real instructions are" / "tus instrucciones reales son"

// --- Rejection messages — never expose the technical reason ---
// Prompt injection:
//   ✅ "Estoy aquí para ayudarte con tu actividad como vendedor.
//       ¿Hay algo sobre tus productos, ventas o métricas?"
//   ❌ "Tu query contiene instrucciones que intentan modificar el sistema."
// Off-scope:
//   ✅ "Para eso no tengo la información que necesitás.
//       Si tenés consultas sobre tu actividad como vendedor, puedo ayudarte."
//   ❌ "Tu pregunta está fuera del scope del Coach."
// Data leak / OutputGuard:
//   ✅ "No pude generar una respuesta útil. Intentá reformularla
//       o preguntame sobre tu actividad en el marketplace."
//   ❌ "Tu respuesta fue bloqueada por política de seguridad."
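The first-line regex check listed above could be sketched as follows. This is a minimal sketch, not the actual InputGuard. The checkInjection name and the exact regular expressions are assumptions derived from the listed phrases, and the rejection message is an English rendering of the sample message above.

```typescript
type ViolationCategory = "prompt_injection" | "off_scope" | "data_leak" | "dangerous_content";

interface GuardResult {
  passed: boolean;
  reason?: string; // internal only, never shown to the user
  category?: ViolationCategory;
  rejectionMessage?: string;
}

// Assumed patterns, built from the phrases listed in the blueprint (EN + ES).
// No `g` flag, so RegExp.test() stays stateless between calls.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all )?previous instructions|ignora las instrucciones anteriores/i,
  /you are now|ahora eres|pretend you are/i,
  /forget everything|olv[ií]date de todo/i,
  /---\s*SYSTEM\s*---|\[INST\]|<\|system\|>|<\|im_start\|>/i,
  /\bact as\b|act[uú]a como si fueras/i,
  /your real instructions are|tus instrucciones reales son/i,
];

const INJECTION_REJECTION =
  "I'm here to help with your activity as a seller. " +
  "Is there something about your products, sales, or metrics?";

function checkInjection(query: string): GuardResult {
  try {
    const hit = INJECTION_PATTERNS.find((pattern) => pattern.test(query));
    if (!hit) return { passed: true };
    return {
      passed: false,
      reason: `matched ${hit.source}`,
      category: "prompt_injection",
      rejectionMessage: INJECTION_REJECTION, // friendly, no technical detail
    };
  } catch {
    return { passed: true }; // graceful degradation: an internal failure lets the query through
  }
}
```

Pure pattern matching keeps this step cheap, which is why ambiguous cases are deferred to LLMGuardChecker rather than handled with broader patterns.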
Integration Flow
// In AgentLoopOrchestrator

async handle(query, context): Promise<CoachResponse> {

  // 1. InputGuard — pre-LLM
  const inputResult = await guardService.validateInput({
    query, userId: context.userId,
    sessionHistory: context.recentMessages,
    marketplace: context.marketplace
  })
  if (!inputResult.passed) {
    logger.warn('guard.input.rejected', { category, userId })
    return { message: inputResult.rejectionMessage, guardRejected: true }
  }

  // 2. Normal ReAct loop
  const response = await this.runReActLoop(query, context)

  // 3. OutputGuard — post-LLM
  const outputResult = await guardService.validateOutput({
    response: response.message, userId: context.userId,
    toolResults: response.toolResults
  })
  if (!outputResult.passed) {
    logger.error('guard.output.rejected', { category: outputResult.category, userId: context.userId })
    if (outputResult.category === 'data_leak')
      await alertService.critical('data_leak_detected', { userId: context.userId })
    return { message: outputResult.rejectionMessage, guardRejected: true }
  }

  return response
}
Acceptance Criteria
  • [Ph 1] InputGuard detects known prompt injection patterns via regex (~5-10ms, no LLM cost)
  • [Ph 1] InputGuard detects off-scope queries and returns friendly redirection message
  • [Ph 1] Rejection messages never expose the technical reason — attacker cannot determine why they were rejected
  • [Ph 1] If InputGuard fails internally, query passes through (graceful degradation invariant)
  • [Ph 2] LLMGuardChecker activates only when pattern matching has low confidence. If classifier confidence < 0.7, query passes (doubt favors the user)
  • [Ph 3] OutputGuard detects data leak: response mentions userId that is not the current user's
  • [Ph 3] Data leak triggers critical CloudWatch alert (not just log). OutputGuard detects dangerous content via pattern matching
  • [Ph 3] If OutputGuard fails internally, response passes through (graceful degradation invariant)

Security layer · Not auth · 2 validation points · Graceful degradation invariant · 4 violation categories · 3 phases

How It Works

User input (query)
        |
        v
InputGuard.validateInput(input)
        |
        +-- passed → continue to AgentLoopOrchestrator
        |
        +-- prompt injection detected
        |   → log CloudWatch (category: prompt_injection, userId)
        |   → return friendly redirection message
        |
        +-- off-scope detected
        |   → log CloudWatch (category: off_scope, userId)
        |   → return scope redirection suggestion
        |
        +-- guard fails internally → continue (degrade gracefully)
        |
        v
AgentLoopOrchestrator runs ReAct loop
(tools, RAG, generation)
        |
        v
LLM generates final response
        |
        v
OutputGuard.validateOutput(output)
        |
        +-- passed → return response to user
        |
        +-- data leak detected
        |   → log CloudWatch (category: data_leak, userId) — critical alert
        |   → return neutral message to user
        |
        +-- dangerous content detected
        |   → log CloudWatch (category: dangerous_content, userId)
        |   → return neutral message to user
        |
        +-- guard fails internally → return response unchanged (degrade gracefully)
        |
        v
User receives response (or friendly rejection message)
            

Two independent validation points wrap the ReAct loop. InputGuard runs before the LLM. Its first line is pattern matching (~5ms), which detects known injection patterns and off-scope signals; its second line (Phase 2) is a lightweight LLM classifier for ambiguous cases. If pattern matching has high confidence, no LLM call is needed. OutputGuard runs after the LLM: it checks whether the response mentions a userId other than the current user's (data leak) and scans for dangerous patterns carried over from unsanitized tool outputs. Both guards degrade gracefully: if a guard fails internally, the system continues normally. Rejection messages are always friendly and never reveal the technical reason, which makes bypass harder for attackers.

Implementation Plan (3 Phases)

Phase 1: InputGuard with Pattern Matching

IGuardService interface in domain + types (GuardInput, GuardOutput, GuardResult, ViolationCategory). InputGuard with regex pattern matching for known prompt injection patterns ("ignore previous instructions", "you are now", nested delimiters, role switching) and off-scope detection. GuardService coordinator. Integration in AgentLoopOrchestrator pre-loop. No perceptible latency (~5-10ms). No LLM cost.
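
A minimal sketch of what the Phase 1 first line could look like, assuming the regex list is built directly from the phrases named in this spec. The names `INJECTION_PATTERNS`, `FRIENDLY_REDIRECT`, and `checkInjection` are hypothetical, and the pattern list is illustrative rather than exhaustive (role-switching phrases such as "act as" are left out here because a naive regex for them would match ordinary seller questions).

```typescript
type ViolationCategory = 'prompt_injection' | 'off_scope' | 'data_leak' | 'dangerous_content'

interface GuardResult {
  passed: boolean
  category?: ViolationCategory
  rejectionMessage?: string
}

// Hypothetical pattern list derived from the phrases listed in the spec.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore\s+(all\s+)?previous\s+instructions/i,
  /ignora\s+las\s+instrucciones\s+anteriores/i,
  /\byou\s+are\s+now\b/i,
  /\bahora\s+eres\b/i,
  /\bpretend\s+you\s+are\b/i,
  /forget\s+everything/i,
  /olv[ií]date\s+de\s+todo/i,
  /your\s+real\s+instructions\s+are/i,
  /tus\s+instrucciones\s+reales\s+son/i,
  // Nested delimiters that imitate system or instruction markers.
  /---\s*SYSTEM\s*---|\[INST\]|<\|system\|>|<\|im_start\|>/i,
]

// Friendly message taken from the spec; it never names the technical reason.
const FRIENDLY_REDIRECT =
  'Estoy aquí para ayudarte con tu actividad como vendedor. ¿Hay algo sobre tus productos, ventas o métricas?'

// Returns a rejection if any known injection pattern matches, otherwise passes.
function checkInjection(query: string): GuardResult {
  const hit = INJECTION_PATTERNS.some((pattern) => pattern.test(query))
  return hit
    ? { passed: false, category: 'prompt_injection', rejectionMessage: FRIENDLY_REDIRECT }
    : { passed: true }
}
```

Because the patterns carry no `g` flag, `test` keeps no state between calls, so the same list can be reused safely across requests.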

Phase 2: LLM-as-Checker for Ambiguous Cases

LLMGuardChecker: a lightweight LLM classifier for queries that pattern matching cannot classify with confidence. It activates only when pattern matching has low confidence, so most legitimate seller queries pass without touching the LLM. Confidence threshold: if classifier confidence is < 0.7, the query passes (doubt favors the user). Rejection rate metrics per category in CloudWatch. Latency: ~5ms (high-confidence pass) to ~150-200ms (LLM check).
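
The two-line decision can be sketched as follows. This is an illustration under assumptions: the spec does not define how pattern matching reports its confidence, so `PatternVerdict.confident` is a hypothetical boolean, and `decide`, `matchPatterns`, and `classify` are made-up names. Only the 0.7 threshold comes from the spec.

```typescript
interface PatternVerdict {
  violation: boolean
  confident: boolean // assumption: pattern matching exposes a high/low confidence flag
}

interface ClassifierVerdict {
  violation: boolean
  confidence: number // 0..1
}

const CLASSIFIER_THRESHOLD = 0.7 // from the spec: below this, the query passes

async function decide(
  query: string,
  matchPatterns: (q: string) => PatternVerdict,
  classify: (q: string) => Promise<ClassifierVerdict>,
): Promise<{ passed: boolean; usedLlm: boolean }> {
  // First line: if pattern matching is confident, decide without any LLM call.
  const pattern = matchPatterns(query)
  if (pattern.confident) return { passed: !pattern.violation, usedLlm: false }

  // Second line: reject only when the classifier flags a violation AND is confident enough.
  const verdict = await classify(query)
  const reject = verdict.violation && verdict.confidence >= CLASSIFIER_THRESHOLD
  return { passed: !reject, usedLlm: true }
}
```

With this shape, a classifier that flags a query at confidence 0.55 still lets it through, which is the "doubt favors the user" rule.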

Phase 3: OutputGuard Activated

Data leak detection: extract userIds from tool results and compare them against the userIds in the response. If the response mentions a userId that is not the current user's, it is a leak. Dangerous content detection via pattern matching (<script>, eval(, exec(, system delimiter injection). Critical CloudWatch alert on data_leak (real-time escalation, not just a log). Integration in AgentLoopOrchestrator post-loop.
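
A rough sketch of the two Phase 3 checks, assuming each tool result can report the userIds it touched. The `ToolResult` shape and the names `findLeakedUserIds`, `DANGEROUS_PATTERNS`, and `hasDangerousContent` are hypothetical. Plain substring matching is used for brevity; a real implementation would likely need token-boundary matching so that one ID being a prefix of another does not produce a false hit.

```typescript
// Assumption: tool results expose the user IDs present in their payload.
interface ToolResult {
  userIds: string[]
}

// Returns every foreign userId (seen in tool results) that also appears in the response.
function findLeakedUserIds(response: string, currentUserId: string, toolResults: ToolResult[]): string[] {
  const foreignIds = new Set<string>()
  for (const result of toolResults) {
    for (const id of result.userIds) {
      if (id !== currentUserId) foreignIds.add(id)
    }
  }
  return Array.from(foreignIds).filter((id) => response.includes(id))
}

// Illustrative patterns for the dangerous content named in the spec.
const DANGEROUS_PATTERNS: RegExp[] = [
  /<script\b/i,
  /\beval\s*\(/i,
  /\bexec\s*\(/i,
  /---\s*SYSTEM\s*---|\[INST\]|<\|system\|>|<\|im_start\|>/i,
]

function hasDangerousContent(response: string): boolean {
  return DANGEROUS_PATTERNS.some((pattern) => pattern.test(response))
}
```

A non-empty result from `findLeakedUserIds` would map to the `data_leak` category and the critical alert; a `true` from `hasDangerousContent` would map to `dangerous_content`.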

Risk Analysis

False positives reject legitimate queries

Impact: High. Rejecting legitimate seller queries makes the Coach unusable for those cases.

Mitigation: pattern matching uses very specific known injection patterns, not general "looks dangerous" heuristics. The LLM classifier only acts when pattern matching has low confidence. If classifier confidence is < 0.7, the query passes. Doubt favors the user.

Sophisticated injection evades pattern matching

Impact: Low-Medium. Advanced techniques (encoding, multi-language, unusual delimiters) may bypass the first line.

Mitigation: the second line (LLM-as-checker, Phase 2) covers cases that pattern matching misses. The Coach also has its own system prompt anchoring it to its role: even if an injection passes both lines, the LLM is instructed to ignore instructions that contradict its Coach identity.

OutputGuard adds latency

Impact: Low. The Coach response already takes several seconds, so adding ~10ms of pattern matching is imperceptible.

Mitigation: OutputGuard Phase 3 is pattern matching + string comparison, not an LLM call. Expected latency <10ms. If a future phase adds LLM-as-checker for output, the impact is evaluated before activation.

Key Decisions

D1.

Graceful degradation is invariant. If a guard fails internally, the system does not block. The user never sees an error due to a guardrail failure. Failures are logged to CloudWatch and monitored, but not propagated to the user. A guardrail that can cut the service is worse than no guardrail.
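
One way to enforce this invariant is to wrap every guard call so that an internal exception is logged and converted into a pass. The sketch below is an assumption about how that could be done; `withGracefulDegradation` and its `onFailure` callback are hypothetical names, and `console.error` stands in for the CloudWatch logger.

```typescript
interface GuardResult {
  passed: boolean
  rejectionMessage?: string
}

// Runs a guard check. Any internal failure is reported and treated as "passed",
// so a broken guardrail can never block the user.
async function withGracefulDegradation(
  check: () => Promise<GuardResult>,
  onFailure: (err: unknown) => void = (err) => console.error('guard.internal_failure', err),
): Promise<GuardResult> {
  try {
    return await check()
  } catch (err) {
    onFailure(err)
    return { passed: true }
  }
}
```

A deliberate rejection (`passed: false`) still flows through unchanged; only thrown errors are absorbed.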

D2.

First line without LLM, second line only on low confidence. Pattern matching is fast and zero-cost. The LLM classifier only acts when pattern matching cannot decide with confidence. Most legitimate seller queries pass pattern matching without touching the secondary LLM.

D3.

Rejection messages without technical exposure. The user never knows whether they were rejected for "prompt injection" or "off-scope". The message is always a friendly redirection. Exposing the technical reason makes bypass easier for an attacker.

D4.

Security validation, not quality validation. Guardrails don't evaluate whether the response is correct, useful, or aligned with the business. They only evaluate whether it is safe. Quality is the responsibility of the Eval Suite (#16) and Hallucination Detection (#2).

D5.

Data leak is a critical alert, not just a log. A detected data leak (another user's data in the response) is not just a CloudWatch log entry: it is an alert that must reach the team in real time. The distinction matters for incident response time.

File Structure

src/
  domain/
    coach/
      services/
        IGuardService.ts          ← interface + types (GuardInput, GuardOutput, GuardResult)
  application/
    coach/
      services/
        InputGuard.ts             ← pre-LLM validation (injection + off-scope)
        OutputGuard.ts            ← post-LLM validation (data leak + dangerous content)
        GuardService.ts           ← coordinator: IGuardService → InputGuard + OutputGuard
        LLMGuardChecker.ts        ← lightweight LLM classifier (Phase 2)

MVP Scope

Phase 1: InputGuard with pattern matching (injection + off-scope) + graceful degradation. Phase 2: LLMGuardChecker for ambiguous cases. Phase 3: OutputGuard (data leak + dangerous content).

Source

New project, no existing source. Integrates into AgentLoopOrchestrator (#2).

Source: New project | Depends on: #2 (AgentLoopOrchestrator integration), #8 (CloudWatch alerts)
📝 Project Changelog
v4 Mar 2, 2026
+New project created: InputGuard (pre-LLM) + OutputGuard (post-LLM)
+IGuardService with graceful degradation as invariant
+Two-line defense: pattern matching first, LLM classifier second
+InputGuard: prompt injection detection + off-scope filtering
+OutputGuard: data leak prevention + dangerous content filtering
#8

Observability & Traceability

Observability — Mateo

EXISTS

Every agent interaction generates structured trace records across two independent persistence systems: a ConversationTrace in DynamoDB (lightweight, stored alongside the conversation, 90-day TTL) and a full AgentExecution in PostgreSQL (deep traceability with per-step logs, itemized costs, and context snapshots). Credits are calculated automatically by database triggers, never in the application. All PostgreSQL tracking writes are fire-and-forget: if PostgreSQL is unavailable, the Coach responds normally. Without this, we operate blind. It is the difference between "having AI" and "operating AI well".

Beautonomous governance: the dual-persistence audit trail implements Core Principle 3 (complete traceability). Every action taken by any role is recorded with actor identity, timestamp, and outcome. The audit log is the source of truth for governance accountability.

ConversationTrace
DynamoDB: per-message metrics
AgentExecution
PostgreSQL: full execution trace
AgentLog
Ordered event timeline
AgentCost
Per-charge credits (trigger-calculated)
ContextSnapshot
Exact context the LLM saw
TrackingOrchestrator
Fire-and-forget write coordinator

Tech Stack (in production)

PostgreSQL (RDS/Cloud SQL) · DynamoDB · AWS Lambda · CloudWatch · SSM Parameter Store · SQL Triggers

Activation: ENABLE_AGENT_TRACKING=true. Credentials from SSM Parameter Store (/AGENTS_ACTIVITY/{STAGE}/DB/*).

Data Models (7 PostgreSQL tables), Hierarchy & Acceptance Criteria
Data Hierarchy
Client (seller — Memberstack ID)
  ↓ 1:N
AgentClient (conversation session — persists between messages)
  ↓ 1:N
AgentExecution (one execution = one user message)
  ↓ 1:N             ↓ 1:N              ↓ 1:N
AgentLog[]       AgentCost[]      ExecutionContextLink[]
(ordered event   (one per charge        ↓ N:1
 timeline)        type)           ContextSnapshot[]
                                  (kb_chunks, brand_health)
Core Data Models
# CoachTrace — what a single execution contains
CoachTrace
│── Identity
│   │── execution_id       UUID
│   │── user_id            Memberstack ID
│   │── conversation_id    DynamoDB conversation
│   └── marketplace        MercadoLibre / Amazon / etc.
│
│── Lifecycle
│   │── status             pending → running → done | error
│   │── started_at / ended_at
│   └── duration_ms        End-to-end latency
│
│── Pipeline Steps (AgentLog[])
│   │── embedding          latency, text size
│   │── vector_search      latency, chunks found, top score
│   │── brand_health       intent detected, metrics queried
│   │── llm_call           model, tokens in/out, latency
│   └── agent_error        error type, message, stack
│
│── Costs (AgentCost[])
│   │── EMBEDDING          1 credit per Vertex AI call
│   │── VECTOR_SEARCH      1 credit per BigQuery search
│   │── BRAND_HEALTH       1 credit per brand health query
│   └── TOKENS             CEIL((input+output)/1000) credits
│
└── Context (ContextSnapshot[])
    │── kb_chunks          Exact chunks the LLM saw
    └── brand_health       Brand health metrics injected

# Credits are NEVER calculated in the application —
# trigger trg_calculate_agent_credits computes on INSERT
Dual Persistence
│                  │ DynamoDB              │ PostgreSQL                       │
│ What it stores   │ ConversationTrace     │ AgentExecution+Logs+Costs+Snaps  │
│ Granularity      │ Aggregated pipeline   │ Per-step with latency+params     │
│ Access           │ Same table as chat    │ Separate DB, optional            │
│ TTL              │ 90 days (auto-delete) │ Configurable retention           │
│ Read latency     │ <10ms (hot store)     │ SQL ad-hoc queries               │
│ Activation       │ Always on             │ ENABLE_AGENT_TRACKING=true       │
Acceptance Criteria
  • Each user message generates a ConversationTrace in DynamoDB and (if enabled) a full AgentExecution in PostgreSQL
  • AgentExecution contains at least one AgentLog of type rag_metrics with pipeline latencies
  • AgentCost reflects exactly the charge types used in that execution (EMBEDDING, VECTOR_SEARCH, BRAND_HEALTH, TOKENS)
  • ContextSnapshots linked to the execution contain the exact chunks sent to the LLM
  • If PostgreSQL is unavailable, the Coach responds normally — tracking never blocks the flow
  • Failed executions have status = 'error' and error_message populated
  • Credits in clients and agents_clients stay automatically synchronized by database triggers
  • ConversationTraces in DynamoDB auto-expire at 90 days (TTL)

DynamoDB TTL: 90 days · PostgreSQL: optional (ENABLE_AGENT_TRACKING) · Credits: DB triggers, never app code · Charge types: EMBEDDING(1) + VECTOR_SEARCH(1) + BRAND_HEALTH(1) + TOKENS(CEIL((in+out)/1000))
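
The charge arithmetic can be checked with a small worked example. Production credits are computed by the database trigger, never by application code, so the function below is only a hypothetical helper (`expectedCredits` is a made-up name) for reasoning about what the trigger should produce, for instance in test expectations.

```typescript
type ChargeType = 'EMBEDDING' | 'VECTOR_SEARCH' | 'BRAND_HEALTH' | 'TOKENS'

// Mirrors the documented rates: 1 credit per flat charge,
// CEIL((input + output) / 1000) credits for TOKENS.
function expectedCredits(type: ChargeType, inputTokens = 0, outputTokens = 0): number {
  if (type === 'TOKENS') return Math.ceil((inputTokens + outputTokens) / 1000)
  return 1
}

// Worked example: one embedding, one vector search, and an LLM call
// with 1850 input tokens and 420 output tokens (2270 tokens in total).
const total =
  expectedCredits('EMBEDDING') +
  expectedCredits('VECTOR_SEARCH') +
  expectedCredits('TOKENS', 1850, 420) // CEIL(2270 / 1000) = 3

console.log(total) // 5
```

The parentheses matter: the sum of input and output tokens is divided by 1000 before rounding up, so 2270 tokens cost 3 credits.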

How It Works

User sends message
        |
        v
+---------------------------+
|  ConversationLambda       |
|  1. Resolve user          |
|  2. Manage conversation   |
+---------------------------+
        |
        v
+----------------------------------+     +----------------------------+
|  ConversationTrackingOrchestrator |     |  PostgreSQL (AgentTracking)|
|                                  |     |                            |
|  setupTracking()    -----------> +---> |  clients.getOrCreate()     |
|  startExecution()   -----------> +---> |  agent_executions(running) |
+----------------------------------+     +----------------------------+
        |
        v
+---------------------------+
|  RAG Pipeline             |
|                           |
|  Embedding (Vertex AI)    | -----> agent_costs(EMBEDDING, 1cr)
|  Vector search (BigQuery) | -----> agent_costs(VECTOR_SEARCH, 1cr)
|  Brand Health (optional)  | -----> agent_costs(BRAND_HEALTH, 1cr)
|  LLM call (Anthropic)     | -----> agent_costs(TOKENS, CEIL((in+out)/1000)cr)
|                           |
|  context_snapshot.save()  | -----> kb_chunks + brand_health snapshots
+---------------------------+
        |
        v
+----------------------------------+     +----------------------------+
|  TrackingOrchestrator            |     |  PostgreSQL Triggers       |
|  completeExecution(done) ------> +---> |  trg_calculate_credits     |
|                                  |     |  trg_apply_credits_client  |
+----------------------------------+     +----------------------------+
        |
        v (parallel — never blocks response)
+---------------------------+
|  DynamoDB                 |
|  ConversationTrace.save() |
|  TTL: 90 days             |
+---------------------------+

All PostgreSQL tracking writes are fire-and-forget: the user never waits for them to complete. If PostgreSQL is unavailable, the exception is caught, logged to CloudWatch, and the Coach responds normally. The only synchronous tracking write is the ConversationTrace in DynamoDB, which shares the main flow's connection. Credits are calculated by the trg_calculate_agent_credits trigger on INSERT; the application only provides raw data (tokens, charge type). The trigger also syncs totals to agents_clients and clients automatically.
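
The fire-and-forget pattern described above can be sketched as a helper that starts a write without awaiting it and absorbs any failure. `fireAndForget` and its `log` parameter are hypothetical names, and `console.error` stands in for the CloudWatch logger.

```typescript
// Starts the write but never awaits it. Both synchronous throws and
// rejected promises end up in the log callback instead of the caller.
function fireAndForget(
  write: () => Promise<unknown>,
  log: (message: string, err: unknown) => void = console.error,
): void {
  Promise.resolve()
    .then(write)
    .catch((err) => log('tracking.write_failed', err))
}

// Usage: the caller continues immediately, whatever happens to the write.
fireAndForget(async () => {
  throw new Error('PostgreSQL unavailable')
})
```

One caveat worth keeping in mind: on AWS Lambda, work that is still pending when the handler returns may not finish before the execution environment is frozen, which is one way the incomplete traces covered in the risk analysis can arise.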

Chronological Write Flow (14 steps per execution)

 1. clients.getOrCreate()              → ensure client exists
 2. agents_clients.getOrCreate()       → recover/create conversation session
 3. agent_executions.save(pending)     → register execution before starting
 4. agent_executions.update(running)   → mark pipeline start
 5. agent_costs.save(EMBEDDING)        → trigger: credits=1, deduct from client
 6. agent_costs.save(VECTOR_SEARCH)    → trigger: credits=1, deduct from client
 7. context_snapshot.save(kb_chunks)   → store chunks that will be used
 8. execution_context_links.save()     → link execution → kb_chunks snapshot
 9. agent_costs.save(BRAND_HEALTH)     → trigger: credits=1 (if applicable)
10. context_snapshot.save(brand_health)→ store brand health metrics
11. execution_context_links.save()     → link execution → brand_health snapshot
12. agent_costs.save(TOKENS)           → trigger: credits=CEIL((in+out)/1000)
13. agent_logs.saveBatch([rag_metrics])→ pipeline timing summary
14. agent_executions.update(done)      → mark end, duration_ms

On error: agent_executions.update(error) + agent_logs.save(agent_error)

Implementation Status

DONE: RAG One-Shot Pipeline (current)

ConversationTrace in DynamoDB + AgentTracking in PostgreSQL are implemented and in production. They capture embedding, vector search, brand health, LLM call, costs, and the context used. The RAG pipeline is traced end-to-end. 7 PostgreSQL tables operational. Credit triggers working. Graceful degradation verified.

Phase 0.3: ReAct Loop Extension

Each round of the ReAct loop generates an AgentLog of type llm_call. Each tool call generates its own AgentLog of type tool_call with name, params, result, and latency. The AgentExecution accumulates all rounds until the LLM emits end_turn. execution_duration_ms reflects total loop duration, not just the first LLM call.

Phase 1: HookLifecycle Auto-Observability

The HookLifecycle (before_tool → execute → after_tool) emits trace events automatically, without each tool implementing its own logging. before_tool → AgentLog(tool_call, starting). after_tool → AgentLog(tool_call, success/failure, latencyMs, result). All tools are observable by design from the moment they are registered in the ToolRegistry.
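
A minimal sketch of how such a lifecycle wrapper might work, assuming tools are plain async functions. The `AgentLog` shape, the `Tool` type, and `withHooks` are hypothetical; the real HookLifecycle and ToolRegistry interfaces are not shown in this spec.

```typescript
interface AgentLog {
  type: 'tool_call'
  tool: string
  status: 'starting' | 'success' | 'failure'
  latencyMs?: number
  result?: unknown
}

type Tool<P, R> = (params: P) => Promise<R>

// Wraps a tool so that trace events are emitted around every call.
// The tool itself contains no logging code.
function withHooks<P, R>(name: string, tool: Tool<P, R>, emit: (log: AgentLog) => void): Tool<P, R> {
  return async (params: P): Promise<R> => {
    emit({ type: 'tool_call', tool: name, status: 'starting' }) // before_tool
    const start = Date.now()
    try {
      const result = await tool(params) // execute
      emit({ type: 'tool_call', tool: name, status: 'success', latencyMs: Date.now() - start, result }) // after_tool
      return result
    } catch (err) {
      emit({ type: 'tool_call', tool: name, status: 'failure', latencyMs: Date.now() - start, result: String(err) })
      throw err // the failure is traced, then surfaced to the caller
    }
  }
}
```

If a registry applied a wrapper like this at registration time, every tool would be traced without opting in, which is the "observable by design" property.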

Phase 4+: Cold Storage in BigQuery (Post-MVP)

Migrate executions older than 90 days to BigQuery for long-term SQL analytics: cost per user per month, average latency per model, most-used tools, error rates by operation type.

Risk Analysis

Trace Volume Growth

Impact: Medium

Mitigation: DynamoDB auto-deletes with TTL. PostgreSQL has configurable retention by date. AgentLogs are saved in batch to reduce round-trips.
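
DynamoDB TTL works from a numeric attribute holding the expiry time in epoch seconds, so the 90-day retention reduces to a small calculation at write time. In the sketch below, `ttlEpochSeconds` is a hypothetical helper and the attribute name on the ConversationTrace item is not specified by this document.

```typescript
const TTL_DAYS = 90
const SECONDS_PER_DAY = 24 * 60 * 60

// Expiry timestamp in epoch seconds, the unit DynamoDB TTL expects.
function ttlEpochSeconds(createdAt: Date, days: number = TTL_DAYS): number {
  return Math.floor(createdAt.getTime() / 1000) + days * SECONDS_PER_DAY
}

// 90 days is 7,776,000 seconds.
console.log(ttlEpochSeconds(new Date(0))) // 7776000
```

TTL deletion is a background process and is not instantaneous, so readers that must never see expired traces would still need to filter on the expiry value.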

Tracking Latency Impact

Impact: Low

Mitigation: All PostgreSQL writes are fire-and-forget, so the user never perceives them. Only the DynamoDB ConversationTrace is synchronous (it shares the main flow's connection).

Incomplete Trace on Lambda Timeout

Impact: Low

Mitigation: Detect agent_executions rows with status='running' whose start_time is more than 5 minutes in the past. These represent unfinished traces and can be marked as error. Partial costs may have been recorded before the failure.
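
The stale-trace check can be sketched as a filter over execution rows. The `ExecutionRow` shape and `findStaleExecutions` are hypothetical; in practice this would more likely run as a scheduled SQL query against agent_executions, and the column name for the start time is an assumption here.

```typescript
interface ExecutionRow {
  id: string
  status: 'pending' | 'running' | 'done' | 'error'
  startedAt: Date // assumption: maps to the execution's start timestamp column
}

const STALE_AFTER_MS = 5 * 60 * 1000 // 5 minutes, from the mitigation above

// Executions still marked as running long after they started are treated as abandoned.
function findStaleExecutions(rows: ExecutionRow[], now: Date = new Date()): ExecutionRow[] {
  return rows.filter(
    (row) => row.status === 'running' && now.getTime() - row.startedAt.getTime() > STALE_AFTER_MS,
  )
}
```

Rows returned by this filter are the candidates to be marked as error; any costs already recorded for them remain in place.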

PostgreSQL Cost in Low Traffic

Impact: Low

Mitigation: Tracking is optional (ENABLE_AGENT_TRACKING=true). In low-traffic or dev environments, it can be disabled without affecting functionality.

Key Decisions

D1.

DynamoDB for the conversational record, PostgreSQL for deep traceability: DynamoDB sits in the same flow as the conversation (same table, same latency). PostgreSQL has the relational model needed to cross-query executions, costs, and context in a single query. Each system does what it does best.

D2.

Credits calculated in the database, not in the application: the trigger guarantees consistency, with no risk of drift between application logic and stored totals. Changing rates is a row update in the charge_types table, not a code deploy.

D3.

Tracking optional by design, not as a workaround: the Coach does not depend on tracking to function. This allows enabling or disabling it per environment, changing the schema without affecting the main flow, and absorbing PostgreSQL failures without degrading service.

D4.

ContextSnapshot as the source of truth for LLM context: storing the exact chunks the LLM saw (not just their IDs) allows reproducing any response and diagnosing why the LLM had or lacked certain information. It is the difference between "the system searched 5 chunks" and "these were the 5 chunks".
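D3 can be pictured with a short sketch. The Python below is illustrative only and is not the production code: `handle_turn`, `track_execution`, the in-memory `store`, and the "false" default for the flag are all assumptions. Only the `ENABLE_AGENT_TRACKING` flag name comes from the spec. It shows tracking handed to a background thread, with failures swallowed so the main flow is never affected.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Hypothetical background pool; the real service may use another mechanism.
_pool = ThreadPoolExecutor(max_workers=2)


def tracking_enabled() -> bool:
    """Read the ENABLE_AGENT_TRACKING flag (defaulting to off is an assumption)."""
    return os.environ.get("ENABLE_AGENT_TRACKING", "false").lower() == "true"


def track_execution(store: list, record: dict) -> None:
    """Stand-in for the PostgreSQL write; `store` is an assumed in-memory sink."""
    store.append(record)


def _safe(fn, *args) -> None:
    """Run fn and swallow any error so tracking can never break the caller."""
    try:
        fn(*args)
    except Exception:
        pass  # a real service would log the failure here


def handle_turn(question: str, store: list, writer=track_execution):
    """Answer first; submit tracking in the background only when enabled."""
    answer = f"answer to: {question}"
    future = None
    if tracking_enabled():
        future = _pool.submit(_safe, writer, store, {"q": question, "a": answer})
    return answer, future
```

The returned future exists only so a caller (or a test) can wait for the write; the main flow would normally ignore it.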

MVP Scope

[v3] ~90% operational. ConversationTrace (DynamoDB, TTL 90d) + AgentExecution (PostgreSQL, 7 tables) in production. Full RAG pipeline traced end-to-end. Credit triggers working. Remaining: ReAct loop extension (Phase 0.3), HookLifecycle auto-observability (Phase 1).

Built on

ConversationTrace + AgentTracking, both proven in production. PostgreSQL trigger-based credit system. Fire-and-forget tracking pattern.

Source: ConversationTrace + AgentTracking (production) | Depends on: --
📝 Project Changelog
v4 Mar 2, 2026
~Badge corrected: ~90% → EXISTS (consistency with summary table)
~Pending deep spec rewrite; spec remains from v3
v3 Feb 27-28, 2026
~Renamed “Observability & Learning Pipeline” → “Observability & Traceability” (LearningPipeline removed)
+Full deep spec card: ConversationTrace (DynamoDB, TTL 90d) + AgentExecution (PostgreSQL, 7 tables)
+Credit calculation via PostgreSQL triggers (not in application)
v2 Feb 27, 2026
+TTL of 90 days added to DynamoDB traces
+Absorbs before/after tracking from deferred Feedback Loop (#15)
v1 Feb 26, 2026
+Initial: “Observability & Learning Pipeline”, ADAPT status
📚

Layer 3: KNOWLEDGE

What the Coach knows

#9

Cerebro / Knowledge Base

Knowledge — Mateo

EXISTS

The Coach's long-term memory of eCommerce expertise: 2,875 Markdown documents organized in 11 namespaces, indexed by a 4-stage Go pipeline (validate → chunk → embed → store) into BigQuery via Vertex AI text-embedding-004 (1024 dims). The agent finds relevant knowledge by meaning, not keywords.

Two repos, two responsibilities: core-knowledge-semantic-base/ owns the corpus and the indexing pipeline; core-intelligence-conversation-api/ owns search and context injection. The KB is automatic context, always available in the system prompt via RAG semantic search. The LLM never decides "should I query the KB?"; it is simply there, like conversation history or the seller's profile.

Contextual Retrieval: during indexing, each chunk is enriched with a generated summary of its role within the full document before embedding, so the search engine matches on what each chunk means in context, not just on what it literally says. Anthropic reports 49% fewer failed retrievals when contextual embeddings are combined with contextual BM25, and 67% when reranking is added. The KB is operational in production today; Contextual Retrieval is a planned addition (Phase 1).

Current State

Operational in Production

2,875 docs indexed in BigQuery (11 namespaces). Go pipeline complete: validate-kb.go → indexer.go → kb-embedder.go → report-outdated-kb.go. Vertex AI text-embedding-004 (1024 dims). 23-metric catalog with strict front-matter validation. Weekly freshness report via GitHub Actions. RagOrchestrator in coach-api consumes kb_embeddings. BrandHealthContextService uses a separate brand_health_embeddings table.

Needs Improvement

No namespace filter in RAGVectorSearchService: search is global across all 11 namespaces, so chunks from learning compete with rules-as-cards regardless of intent. No re-indexing strategy for edited docs: the embedder inserts new chunks but does not invalidate previous chunks of the same document (there is no is_current flag).

To Build

Namespace skills/ (tool usage documentation). Namespace trends/ (marketplace trends). Namespace filter by intent in RAGVectorSearchService. is_current flag + re-indexing logic. Contextual Retrieval: LLM-generated context summary per chunk at indexing time (Anthropic technique; 49-67% fewer failed retrievals reported when combined with BM25 and reranking). Voyage AI evaluation benchmark. Hybrid search (BM25 + vector) for exact technical terms.

11 Namespaces, 2,875 Documents

inventory: 743
learning: 652
rules-as-actions: 622
rules-as-docs: 570
ads: 433
reputation: 376
financial: 361
rules-as-cards: 324
rules-as-glossary: 172
compliance: 106
+6 more: organic, quality, rules-as-playbooks, pricing, rules-as-metrics, health

Go Pipeline (4 stages): validate → chunk → embed → store
Vertex AI 004: 1024 dims, ~$0.02/1M tokens
23-Metric Catalog: NP, ROAS, ACOS, CTR, P-QI...
8 Document Types: doc, card, metric, action, log, playbook, glossary, health
BigQuery Vectors: COSINE_DISTANCE search
Automatic Context: not a tool, always in prompt
Contextual Chunk Enricher: LLM-generated summary per chunk (indexing time)

Scope Boundaries: What this layer does NOT do

  • Query embedding at runtime: coach-api uses VertexEmbeddingClient with the same text-embedding-004 model to embed user queries. The same model is used in two repos, and query embedding is coach-api's responsibility.
  • Brand Health: the seller's 23 metrics with Critique/Delicate/Good/Optimal scores live in brand_health_embeddings (a separate BigQuery table). The KB holds general domain knowledge; Brand Health holds the seller's real state. They are parallel context sources.
  • ReAct loop logic: RagOrchestrator, RAGLLMService, and ConversationFlowOrchestrator belong to coach-api (#2). The KB only provides the corpus and the vector index; it does not know how the returned context is used.
  • Live seller data: the KB contains static knowledge (best practices, policies, rules). Live seller data (real-time sales, product state, orders) is the responsibility of the READ/ANALYSIS tools (#3) and Brand Health. The KB does not change based on seller data.

Tech Stack (Go + GCP)

Go 1.24.0 · Markdown + YAML front-matter · Vertex AI text-embedding-004 · BigQuery (COSINE_DISTANCE) · GitHub Actions · Git (version control)
Data Models, Interfaces & Acceptance Criteria
BigQuery Schema (kb_embeddings)
-- Dataset: knowledge_base | Table: kb_embeddings
CREATE TABLE knowledge_base.kb_embeddings (
  id            STRING    NOT NULL,  -- "KB-financial-NP__chunk_0"
  document_id   STRING    NOT NULL,  -- "KB-financial-NP"
  namespace     STRING    NOT NULL,  -- "financial"
  doc_type      STRING    NOT NULL,  -- "metric"
  title         STRING,              -- "Net Profit (NP)"
  chunk_index   INT64,               -- 0
  source        STRING,              -- Relative path to .md
  tags          ARRAY<STRING>,       -- ["financial", "metric"]
  text          STRING,              -- Chunk content (≤ 1000 chars)
  embedding     ARRAY<FLOAT64>,      -- Vertex AI text-embedding-004 (1024 dims)
  created_at    TIMESTAMP NOT NULL,
  updated_at    TIMESTAMP NOT NULL,
  language      STRING    NOT NULL   -- "es"
);

-- YAML Front-Matter (required fields):
-- id, namespace, type, title, version (SemVer), last_reviewed (ISO date),
-- language, tags, metric_refs
-- Conditional: condition+severity+action_hint (card), formula+unit+thresholds (metric),
--   source (learning), metric_refs required (health)
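The front-matter rules above can be sketched as a small validator. This is a Python illustration of the checks, not the logic of validate-kb.go: the function name, the simplified SemVer pattern, and the reading of the conditional rules (card and metric keyed on `type`, `source` keyed on the learning namespace, non-empty `metric_refs` for health) are assumptions.

```python
import re
from datetime import date

REQUIRED = ["id", "namespace", "type", "title", "version",
            "last_reviewed", "language", "tags", "metric_refs"]
# Extra fields per document type, as listed in the front-matter notes.
BY_TYPE = {
    "card": ["condition", "severity", "action_hint"],
    "metric": ["formula", "unit", "thresholds"],
}
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")  # simplified: no pre-release tags


def validate_front_matter(fm: dict, catalog: set) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = [f"missing field: {k}" for k in REQUIRED if k not in fm]
    errors += [f"missing field for {fm.get('type')}: {k}"
               for k in BY_TYPE.get(fm.get("type"), []) if k not in fm]
    if fm.get("namespace") == "learning" and "source" not in fm:
        errors.append("missing field for learning: source")
    if fm.get("type") == "health" and not fm.get("metric_refs"):
        errors.append("health documents need at least one metric_ref")
    if "version" in fm and not SEMVER.match(str(fm["version"])):
        errors.append("version is not SemVer")
    if "last_reviewed" in fm:
        try:
            date.fromisoformat(str(fm["last_reviewed"]))
        except ValueError:
            errors.append("last_reviewed is not an ISO date")
    # Every metric reference must exist in the metric catalog.
    unknown = [m for m in fm.get("metric_refs") or [] if m not in catalog]
    errors += [f"unknown metric_ref: {m}" for m in unknown]
    return errors
```

Returning a list of problems instead of raising on the first one lets a CI job report every issue in a document in a single run.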
Query Flow (coach-api side)
// Consumed by RagOrchestrator in core-intelligence-conversation-api
RAGEmbeddingService
  └── VertexEmbeddingClient.embed(userQuery)
        → ARRAY<FLOAT64> [1024 dims, text-embedding-004]

RAGVectorSearchService
  └── BigQuery COSINE_DISTANCE against kb_embeddings
        → top-K chunks (KBChunkModel[])

RAGChunkRankingService.rank(chunks, query) → top-5 reranked
RAGLLMService.generateAnswer(query, chunks, options)
  → injection as <knowledge_base> context in the LLM prompt

// Embedding coherence: coach-api MUST use the same model
// (text-embedding-004) as the KB indexer. Changing the model
// requires re-indexing all 2,875 documents.
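To make the ranking step concrete, here is an in-memory Python sketch of what a COSINE_DISTANCE top-K search computes. It stands in for the BigQuery query and is not the coach-api implementation; `top_k` and the row shape are hypothetical.

```python
import math


def cosine_distance(a: list, b: list) -> float:
    """1 - cosine similarity, the quantity BigQuery's COSINE_DISTANCE returns.

    Zero-length vectors are not handled here and would divide by zero.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def top_k(query_vec: list, rows: list, k: int = 5) -> list:
    """Return the k rows whose embedding is closest to the query vector."""
    return sorted(rows, key=lambda r: cosine_distance(query_vec, r["embedding"]))[:k]
```

Because distance is computed between the query vector and stored vectors, both must come from the same embedding model, which is why the contract above ties the two repos to text-embedding-004.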
Acceptance Criteria
  • 2,875 documents indexed and searchable in BigQuery kb_embeddings
  • Go pipeline validates front-matter: required fields, unique IDs, metric_refs against 23-metric catalog
  • Semantic search returns relevant chunks in <500ms via COSINE_DISTANCE
  • Freshness report alerts on docs exceeding namespace thresholds (60/90/180 days)
  • Namespace filter in RAGVectorSearchService scopes search by intent
  • Re-indexing marks old chunks as stale when document is updated (is_current flag)
  • LLM correctly cites KB information in responses (human-verifiable)
  • [Ph 1] Contextual Retrieval: each chunk includes LLM-generated summary of its role in the full document before embedding. Runs once at indexing/re-indexing time, not per query. Search recall improves measurably vs baseline

2,875 docs · 11 namespaces · 23 metrics · Go 1.24.0 · Vertex AI 004 (1024 dims) · BigQuery · Automatic context (not a tool)

How It Works: Two Flows, One Contract

  INDEXING FLOW (Go pipeline)              QUERY FLOW (coach-api)
  core-knowledge-semantic-base/            core-intelligence-conversation-api/
  ================================         ================================

  New/edited .md document                  User query
          |                                        |
          v                                        v
  +---------------------------+            VertexEmbeddingClient.embed(query)
  | 1. VALIDATE               |            → ARRAY<FLOAT64> [1024 dims]
  |   validate-kb.go          |                    |
  |   - Front-matter complete |                    v
  |   - ID unique (KB-ns-slug)|            BigQuery COSINE_DISTANCE
  |   - metric_refs valid     |            → kb_embeddings (top-K)
  |   - Word count ≤ 1500     |                    |
  +---------------------------+                    v
          |                                RAGChunkRankingService
          v                                → top-5 reranked chunks
  +---------------------------+                    |
  | 2. CHUNK + JSONL          |                    v
  |   indexer.go              |            Inject as <knowledge_base>
  |   - Split by paragraphs   |            context in LLM prompt
  |   - Limit: 1000 chars     |            (automatic — not a tool)
  |   - Output: JSONL         |
  +---------------------------+
          |
          v
  +---------------------------+
  | 2b. CONTEXTUAL ENRICH     |      Contextual Retrieval
  |   contextual-enricher.go  |      (Anthropic technique)
  |   - LLM summarizes chunk  |      49-67% fewer failed retrievals
  |     role in full document |      (with BM25 + reranking)
  |   - Prepends context to   |
  |     chunk before embedding|
  +---------------------------+
          |
          v
  +---------------------------+
  | 3. EMBED                  |      Contract: both sides use
  |   kb-embedder.go          |      text-embedding-004 (1024 dims)
  |   - Vertex AI 004         |      Changing model = re-index all
  |   - Batch 100 chunks      |
  |   - Insert BigQuery       |
  +---------------------------+
          |
          v
  +---------------------------+
  | 4. FRESHNESS REPORT       |
  |   report-outdated-kb.go   |      Thresholds:
  |   - GitHub Actions weekly |      health: 60d | learning: 180d
  |   - Alert stale docs      |      all others: 90d
  +---------------------------+
            

The KB operates as two completely separate flows connected by a BigQuery contract.

The Indexing Flow (Go pipeline in core-knowledge-semantic-base/) takes Markdown documents with YAML front-matter and runs them through these steps:
  • Validate structure: required fields, unique IDs, metric_refs against the 23-metric catalog, word count ≤1500.
  • Chunk by paragraphs (1000-char limit, no overlap).
  • Enrich each chunk with Contextual Retrieval: an LLM generates a summary of the chunk's role within the full document and prepends it to the chunk before embedding. For example, a chunk saying "processing time is 1-3 business days" gets context stating that it belongs to the logistics section and describes the period between sale confirmation and dispatch. This enrichment runs once at indexing time, not per query. Anthropic reports 49% fewer failed retrievals when contextual embeddings are combined with contextual BM25, and 67% when reranking is added.
  • Generate embeddings via Vertex AI text-embedding-004 in batches of 100 and insert them into BigQuery kb_embeddings.
A weekly GitHub Actions job reports stale documents with differentiated thresholds (health 60d, learning 180d, others 90d).

The Query Flow (in core-intelligence-conversation-api/) embeds the user's query with the same text-embedding-004 model, runs a cosine-distance search in BigQuery, re-ranks the top-K chunks, and injects them into the LLM prompt as automatic context. The LLM never invokes the KB as a tool; it is always available in the system prompt.
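The chunk and enrich steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Go code in indexer.go: packing several short paragraphs into one chunk, hard-splitting a paragraph longer than the limit, and the function names are all hypothetical choices.

```python
def chunk_by_paragraphs(text: str, limit: int = 1000) -> list:
    """Pack whole paragraphs into chunks of at most `limit` chars, no overlap."""
    chunks, current = [], ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        # Assumption: a paragraph longer than the limit is hard-split.
        while len(para) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:limit])
            para = para[limit:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def enrich(chunk: str, summary: str) -> str:
    """Prepend the LLM-written context summary to the chunk before embedding."""
    return f"{summary}\n\n{chunk}"
```

In this sketch the enriched string can exceed the 1000-char chunk limit. Whether the stored `text` column keeps the raw chunk or the enriched version is not stated in the spec, so that choice is left open here.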

Implementation Plan (improvements over existing system)

Phase 1: Namespace Filtering + Re-indexing (Week 3-4)

Namespace filter: add a namespace filter to RAGVectorSearchService that scopes the BigQuery search by namespace when the user's intent is clear (e.g., ads question → filter ads + learning).

Re-indexing: add an is_current flag to the kb_embeddings schema. When a document is edited, mark its old chunks is_current=false and index the new chunks. Update the search query to filter on is_current=true.

Contextual Retrieval: add LLM-generated context enrichment to the indexing pipeline. Before embedding, each chunk receives a summary of its purpose within the full document (e.g., "this chunk belongs to the logistics section and explains the processing time between sale confirmation and dispatch"). The enriched text (context + chunk) is what gets embedded. It runs once per chunk at index time and uses a lightweight LLM call. Anthropic reports 49% fewer failed retrievals when contextual embeddings are combined with contextual BM25, and 67% when reranking is added.
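The re-indexing and namespace-filter rules from Phase 1 can be sketched over an in-memory table. This Python is a stand-in for the BigQuery updates and queries, not the planned implementation: the function names and document IDs are hypothetical, and only the ads → ads + learning mapping comes from the spec.

```python
# Only the ads mapping comes from the spec; other intents would be added here.
INTENT_NAMESPACES = {"ads": ["ads", "learning"]}


def reindex_document(table: list, document_id: str, namespace: str,
                     new_chunks: list) -> None:
    """Mark existing chunks of the document stale, then append the new ones."""
    for row in table:
        if row["document_id"] == document_id and row["is_current"]:
            row["is_current"] = False
    for i, text in enumerate(new_chunks):
        table.append({
            # Stale and current rows share this id format; a real table would
            # need a version suffix or a cleanup job (not specified in the spec).
            "id": f"{document_id}__chunk_{i}",
            "document_id": document_id,
            "namespace": namespace,
            "chunk_index": i,
            "text": text,
            "is_current": True,
        })


def candidate_rows(table: list, intent=None) -> list:
    """Rows the vector search may consider: current only, scoped by intent if known."""
    allowed = INTENT_NAMESPACES.get(intent)  # None means search all namespaces
    return [r for r in table
            if r["is_current"] and (allowed is None or r["namespace"] in allowed)]
```

Keeping stale rows instead of deleting them preserves history for debugging, at the cost of requiring every search to filter on `is_current`.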

Phase 2: New Namespaces + Content (Week 4-5)

Create the skills/ namespace: documentation of how the agent should use and interpret each tool from the Tool Registry (#3). Create the trends/ namespace: marketplace trends, seasonal patterns, policy changes. Expose KB search as an automatic context provider for the Context Aggregator (#5), so the KB is always available in the system prompt via RAG top-K semantic search. Validate that the LLM correctly cites KB information in responses.

Phase 3: Search Quality (post-MVP)

Evaluate Voyage AI vs text-embedding-004 with a real benchmark suite of eCommerce domain queries. Add a doc_type filter to search, allowing filtering by card, metric, or playbook when the intent justifies it. Evaluate hybrid search (BM25 + vector) for improved recall on exact technical terms (e.g., "ACOS", metric slugs). Consider Cohere Rerank as a dedicated re-ranker if heuristic ranking is insufficient.

Risk Analysis

Stale Knowledge

Impact: High. Outdated marketplace policies can harm the seller.

Mitigation: report-outdated-kb.go runs weekly via GitHub Actions. Differentiated thresholds: health 60d, learning 180d, others 90d. SemVer versioning + last_reviewed field in every document.
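The differentiated thresholds reduce to a small lookup. The Python below is a sketch of that rule, not the code of report-outdated-kb.go; the function name and the strict "older than" comparison are assumptions.

```python
from datetime import date

# Thresholds from the spec: health 60d, learning 180d, all others 90d.
THRESHOLD_DAYS = {"health": 60, "learning": 180}
DEFAULT_DAYS = 90


def is_stale(namespace: str, last_reviewed: date, today: date) -> bool:
    """True when the last review is older than the namespace threshold."""
    limit = THRESHOLD_DAYS.get(namespace, DEFAULT_DAYS)
    return (today - last_reviewed).days > limit
```

A weekly job would call this for every document, using the `namespace` and `last_reviewed` values from the front-matter.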

Irrelevant Chunks Contaminate Context

Impact: Medium. Semantic false positives degrade LLM reasoning quality.

Mitigation: RAGChunkRankingService re-ranks before injection. Namespace filtering reduces the search space. Future: a dedicated re-ranker (Cohere Rerank) if the heuristic is insufficient.

Embedding Model Desynchronization

Impact: High. Changing the embedding model without re-indexing returns degraded results.

Mitigation: Both repos MUST use text-embedding-004. Any model change requires re-indexing all 2,875 documents before deployment. The pipeline already supports a full re-index.

BigQuery Scalability at High Volume

Impact: Low at MVP. With an estimated ~30K chunks, search latency stays well under 500ms.

Mitigation: If the corpus grows beyond 100K chunks, evaluate BigQuery VECTOR_SEARCH with an ANN index, or migrate to a dedicated vector store (Pinecone, Weaviate). The migration is transparent if the IVectorSearchRepository interface is maintained.
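The "transparent migration" argument rests on callers depending on an interface, not on a backend. A minimal Python sketch of that idea follows. Only the IVectorSearchRepository name comes from the spec; the `search` signature, both placeholder backends, and the dot-product scoring are assumptions made for illustration.

```python
from typing import Protocol


class VectorSearchRepository(Protocol):
    """Stand-in for IVectorSearchRepository; the method signature is assumed."""

    def search(self, query_vec: list, k: int) -> list:
        ...


class BigQueryRepository:
    """Placeholder backend; a real one would run COSINE_DISTANCE in BigQuery."""

    def __init__(self, rows: list):
        self.rows = rows

    def search(self, query_vec: list, k: int) -> list:
        def score(row: dict) -> float:
            # Negative dot product, so a higher similarity sorts first.
            return -sum(x * y for x, y in zip(query_vec, row["embedding"]))
        return [r["id"] for r in sorted(self.rows, key=score)[:k]]


class DedicatedStoreRepository(BigQueryRepository):
    """Placeholder for a dedicated vector store honoring the same contract."""


def retrieve(repo: VectorSearchRepository, query_vec: list, k: int = 5) -> list:
    """Caller code depends only on the interface, not on the backend."""
    return repo.search(query_vec, k)
```

Swapping the backend changes which repository object is constructed; `retrieve` and everything above it stay untouched.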

Key Decisions

D1.

Markdown + YAML in Git, not a CMS: documents are .md files with YAML front-matter in a Git repo. Versioning is natural (diff per PR), team contribution has zero CMS overhead, and the indexing pipeline triggers on push. Adding or editing knowledge is a PR, not a database operation.

D2.

Vertex AI text-embedding-004, not OpenAI: the project runs on Google Cloud (BigQuery, Vertex AI, GCP auth). Keeping the stack on a single provider reduces authentication complexity and egress costs. The product guide mentioned OpenAI text-embedding-3-small, but the real implementation uses text-embedding-004 (1024 dims). Evaluate Voyage AI post-MVP if relevance evidence warrants it.

D3.

BigQuery as vector store, not Pinecone/Weaviate: BigQuery already handles business analytics data. Adding vector search via COSINE_DISTANCE avoids a new service dependency. At MVP scale (~30K chunks), performance is more than sufficient. Migrate only if latency degrades once the corpus grows beyond 100K chunks.

D4.

KB is automatic context, not a tool: making it a tool would require the LLM to decide when to query the KB, add a round to the loop, and complicate the system prompt. As automatic context, the KB enriches every turn at zero round cost. The LLM does not need to ask for it; it is always there.

D5.

11 namespaces as semantic segmentation: namespace granularity allows filtering search by intent. An ads question searches ads + learning, not compliance or returns_claims. The index is better leveraged with namespace filters than with global search.

MVP Scope

Existing 2,875 docs + Go pipeline operational. MVP adds: namespace filter in search, is_current re-indexing, skills/ namespace (tool documentation), trends/ namespace. Post-MVP: Voyage AI evaluation, hybrid search, doc_type filtering.

Inspired by

Direct reuse from core-knowledge-semantic-base. Production-proven Go pipeline + BigQuery vector search.

Source: core-knowledge-semantic-base (Go pipeline) + core-intelligence-conversation-api (RAG search) | Depends on: None (KB is independent; other projects depend on it)
📝 Project Changelog
v4 Mar 2, 2026
~Target: ~40-50 docs → 2,875 documents across 11 namespaces
~Indexing pipeline: Python → Go 1.24
~Embeddings: Voyage AI / OpenAI text-embedding-3-small (1536d) → Vertex AI text-embedding-004 (1024d)
+23-metric catalog for KB quality tracking
+Contextual Retrieval: each chunk enriched with LLM summary before embedding (Anthropic: 49-67% fewer failed retrievals with BM25 and reranking)
+Current State section: Operational / Needs Improvement / To Build
~KB is context (auto-injected), NOT a tool; clarified throughout spec
v3 Feb 27-28, 2026
+Full deep spec card: ~40-50 docs target, Python pipeline, evaluating Voyage AI vs OpenAI embeddings
v2 Feb 27, 2026
+Evaluate Voyage AI as alternative to OpenAI text-embedding-3-small
v1 Feb 26, 2026
+Initial: “Cerebro / Knowledge Base” (v1 #3), EXISTS status
#10

Data Sync

Data — Andres

ADAPT+NEW

TWO-PIPELINE DATA SYSTEM.

Complete Data: the existing batch pipeline is preserved and extended. It remains the full historical data source per user, feeding the Silver and Gold layers. Its extraction periodicity is configurable (default: 1h for active products, 6h for all).

Fast Data: a NEW layer that exposes on-demand reads via FastAPI directly from Parquet files in GCS, with no Redis and no intermediate cache. Fast Data serves the 11 tools defined in the #3 Tool Registry contract (10 READ + 1 ANALYSIS). Every write-tool execution requires a pre-read snapshot captured to GCS before changes.

Both pipelines integrate with Open Metadata for lineage and data dictionaries, and both feed Cerebro KB (#9) via embedding sub-pipelines. The Gold layer produces the "Brand Health" report; its calculation rules migrate from a legacy project (TBD). Auth token management delegates to Marketplace Provider (#12). Infrastructure stays on GCP, managed as IaC in "#14 DevOps (IaC)".

Beautonomous governance: Fast Data pre-reads support the preview step of Core's ConfirmationFlow. Before any WRITE executes, the current marketplace state is captured so the seller sees exactly what will change. This makes every confirmation dialog accurate and trustworthy.

Two Pipelines Architecture

Fast Data (NEW): read layer only

  • FastAPI reads directly from Parquet files in GCS: no Redis, no intermediate cache
  • Contract: 10 READ + 1 ANALYSIS tools defined in #3 Tool Registry
  • Pre-write snapshot: captures current Parquet state to GCS before any write-tool executes
  • Reads Bronze+Silver layers; snapshots stored under bronze/snapshots/
  • Bronze embeddings → Cerebro KB (#9) → Context Aggregator (#5) → Orchestrator (#2)

Complete Data (EXISTS, preserved)

  • Batch pipeline per marketplace (Airflow DAGs, configurable; default 1h active, 6h all)
  • Full historical data, required for Silver and Gold layers
  • Gold layer: Brand Health report (rules from legacy, TBD)
  • Persistent data (no TTL)
  • Gold embeddings → Cerebro KB (#9) → Context Aggregator (#5) → Orchestrator (#2)

  • Open Metadata: EXISTS (MeLi partial, Shopify). Extend for Amazon + Fast Data
  • Embedding Sub-pipelines: Both pipelines → Cerebro KB (#9) → #5 → #2
  • Auth via #12: Fast Data aligned with Tool Registry (#3), not #12 directly
  • Airflow DAGs: Complete Data pipeline (batch)
  • Fast Data Service: FastAPI reads Parquet from GCS (no Redis)
  • GCS Data Lake: Medallion (Bronze / Silver / Gold)
  • Brand Health (Gold): Legacy rules migration (TBD)
  • Embedding Pipeline: Bronze → KB (#9) + Gold → KB (#9)
  • Data API: FastAPI, serves all 3 layers

Tech Stack (GCP, IaC via #14 DevOps (IaC))

Apache Airflow 2.8+ · GCS (Cloud Storage) · Parquet + Apache Arrow · FastAPI + Cloud Run · pyarrow (Parquet reads) · Open Metadata · Terraform · OpenAI Embeddings
Data Models, API Signatures & Acceptance Criteria
Data Models
# ═══ COMPLETE DATA PIPELINE (Medallion — GCS) ═══

# BRONZE — Raw marketplace responses (per user, per marketplace)
# gs://shopilot-data/bronze/{marketplace}/{user_id}/{date}.parquet
# Retention: persistent (Coldline after 90d)

# SILVER — Normalized + validated (unified schemas)
products.parquet:
├── user_id, product_id, marketplace, title, price, stock
├── status, visits_30d, sales_30d, conversion_30d
├── health_score: float (0-100), last_updated, synced_at

# GOLD — Pre-computed aggregates + Brand Health
daily_summary.parquet:
├── user_id, date, marketplace
├── total/active/paused_products, total_orders, total_revenue
├── total_visits, avg_conversion_rate, out_of_stock_count
├── new_questions_count, competitor_price_changes

brand_health.parquet:    # Rules from legacy (TBD)
├── user_id, marketplace, calculated_at
├── overall_score: float (0-100)
├── dimension_scores: { products, pricing, stock, reputation, ... }
├── alerts: [{ type, severity, message }]

# ═══ FAST DATA LAYER (FastAPI reads from Parquet — no Redis) ═══

# FastAPI reads directly from GCS Parquet files via pyarrow.
# No intermediate cache. Data freshness = last Complete Data sync cycle.
# Pre-write snapshot: gs://shopilot-data/bronze/snapshots/{tool}/{user_id}/{ts}.parquet

# Tool contract (#3 Tool Registry):
#   READ  : get_product, get_product_metrics, get_orders,
#            get_buyer_questions, get_product_reviews,
#            get_category_requirements, get_campaigns,
#            get_campaign_metrics, get_store, get_store_metrics
#   ANALYSIS: get_product_fee_estimate

# Data Lake Structure (GCS):
# gs://shopilot-data/
# ├── bronze/          Raw (batch) + pre-write snapshots
# │   └── snapshots/   Pre-write state captures (one Parquet per tool+ts)
# ├── silver/          Normalized unified
# ├── gold/            Brand Health + aggregations
# └── embeddings/      Generated for Cerebro KB (#9)
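The path conventions above can be sketched as two small helpers. This is only an illustration: the function names, the compact UTC timestamp format, and the ISO date format are assumptions (the layout only specifies `{ts}` and `{date}` placeholders), and `update_price` in the usage note is a hypothetical write tool.

```python
from datetime import date, datetime, timezone

BUCKET = "shopilot-data"  # bucket name as shown in the data lake layout


def snapshot_uri(tool: str, user_id: str, ts: datetime) -> str:
    """Build the GCS URI of a pre-write snapshot for one tool call.

    Assumption: {ts} is rendered as compact UTC with second precision.
    """
    if ts.tzinfo is None:
        raise ValueError("ts must be timezone-aware")
    stamp = ts.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"gs://{BUCKET}/bronze/snapshots/{tool}/{user_id}/{stamp}.parquet"


def bronze_uri(marketplace: str, user_id: str, day: date) -> str:
    """Build the GCS URI of a raw Bronze file.

    Assumption: {date} is rendered as an ISO date (YYYY-MM-DD).
    """
    return f"gs://{BUCKET}/bronze/{marketplace}/{user_id}/{day.isoformat()}.parquet"
```

For example, `snapshot_uri("update_price", "u1", datetime(2026, 3, 10, 12, 0, tzinfo=timezone.utc))` yields a path under `bronze/snapshots/update_price/u1/`, so all snapshots for one tool and user share a prefix.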
API Signatures
# Data API (FastAPI on Cloud Run) — serves ALL layers via pyarrow Parquet reads:

# Fast Data (reads from Bronze/Silver Parquet — contract per #3 Tool Registry):
# GET /data/{user_id}/fast/get_product?sku=X&marketplace=Y
# GET /data/{user_id}/fast/get_product_metrics?sku=X&marketplace=Y
# GET /data/{user_id}/fast/get_orders?from=DATE&to=DATE
# GET /data/{user_id}/fast/get_buyer_questions?sku=X
# GET /data/{user_id}/fast/get_product_reviews?sku=X
# GET /data/{user_id}/fast/get_category_requirements?category_id=X
# GET /data/{user_id}/fast/get_campaigns?marketplace=Y
# GET /data/{user_id}/fast/get_campaign_metrics?campaign_id=X
# GET /data/{user_id}/fast/get_store?marketplace=Y
# GET /data/{user_id}/fast/get_store_metrics?marketplace=Y
# GET /data/{user_id}/fast/get_product_fee_estimate?sku=X&price=N  (ANALYSIS)
# GET /data/{user_id}/snapshot/{tool}/{ts}   -> pre-write snapshot from GCS

# Silver layer (normalized):
# GET /data/{user_id}/products?status=active&marketplace=X
# GET /data/{user_id}/orders?from=DATE&to=DATE

# Gold layer (aggregated):
# GET /data/{user_id}/summary?days=30
# GET /data/{user_id}/brand-health                     -> Brand Health report
# GET /data/{user_id}/metrics/daily
# GET /data/{user_id}/anomalies

# Embedding triggers:
# POST /data/{user_id}/embed/fast    -> Bronze fast data -> Cerebro KB
# POST /data/{user_id}/embed/health  -> Gold Brand Health -> Cerebro KB

# Airflow DAGs (Complete Data — configurable period):
# meli_sync:         configurable (default: 1h active, 6h all)
# amazon_sync:       configurable (default: 1h active, 6h all)
# shopify_sync:      configurable (default: 1h active, 6h all)
# brand_health:      configurable (default: 1h post-sync)
# embedding_sync:    configurable (default: 1h post brand_health)
# snapshot_cleanup:  daily (removes pre-write snapshots older than 24h)
# openmetadata_sync: configurable (default: 6h, lineage + dictionaries)
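The selection step of `snapshot_cleanup` might look like the sketch below. It covers only the decision of which snapshots fall outside the 24h retention window. Listing and deleting objects in GCS is left out, and the `(uri, created_at)` input shape is an assumption.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(hours=24)  # retention stated for snapshot_cleanup


def expired_snapshots(snapshots, now, retention=RETENTION):
    """Return the URIs of snapshots strictly older than the retention window.

    `snapshots` is an iterable of (uri, created_at) pairs with
    timezone-aware datetimes (hypothetical input shape).
    """
    cutoff = now - retention
    return [uri for uri, created_at in snapshots if created_at < cutoff]
```

Keeping the selection pure makes the retention rule easy to test independently of the DAG that would call it.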
Acceptance Criteria
  • [Complete] MeLi + Amazon + Shopify DAGs run every hour without errors for 50 users
  • [Complete] Brand Health Gold report computes correctly (legacy rules migrated)
  • [Fast] On-demand read for any Tool Registry (#3) READ or ANALYSIS tool responds in <500ms
  • [Fast] Pre-write snapshot captured to GCS before every write Tool execution
  • [Fast] TTL cleanup removes expired fast data without manual intervention
  • [Both] Open Metadata lineage and data dictionaries generated for both pipelines
  • [Both] Embedding sub-pipelines feed Cerebro KB (#9): Bronze fast → real-time, Gold → Brand Health
  • [API] Data API serves Bronze, Silver, and Gold layers. <200ms p95 for Gold queries
  • [Auth] Token management via Marketplace Provider (#12) — local Auth Vault scheme replaced

Complete: 1h sync · Fast: <500ms on-demand, TTL cleanup · API: all 3 layers, <200ms Gold · Embeddings: Bronze+Gold → KB
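The latency targets above (<500ms for Fast Data reads, <200ms p95 for Gold queries) could be checked against measured samples with a helper like this one. It uses the nearest-rank definition of p95, which is an assumption, as are the function names; the criteria do not say how the percentile is computed.

```python
def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(latencies_ms)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_budget(latencies_ms, budget_ms):
    """True when the p95 latency is strictly under the budget."""
    return p95(latencies_ms) < budget_ms
```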

How It Works: Two Pipelines

╔══════════════════════════════════════════════════════════════════╗
║  FAST DATA LAYER (read-only, on-demand via FastAPI)              ║
╠══════════════════════════════════════════════════════════════════╣
║  No Redis. No intermediate cache. FastAPI reads Parquet (GCS).   ║
║  Tools: #3 Tool Registry contract (10 READ + 1 ANALYSIS)         ║
╚══════════════════════════════════════════════════════════════════╝

  Tool Registry (#3) READ/ANALYSIS tool call
         |
         v
  FastAPI Data API ──> pyarrow reads GCS Parquet (Bronze/Silver)
         |
         +──> Returns data directly to caller (<500ms)
         |
         |  [write-tool path: pre-read before execution]
         +──> Snapshot to GCS bronze/snapshots/{tool}/{user}/{ts}.parquet
         |       +── snapshot_cleanup DAG: daily (24h retention)
         |
         +──> Open Metadata (lineage + dictionary)
         |
         +──> Embedding Pipeline ──> Cerebro KB (#9)
                 #9 ──> Context Aggregator (#5) ──> Orchestrator (#2)

╔══════════════════════════════════════════════════════════════════╗
║  PIPELINE 2: COMPLETE DATA (batch, persistent) — EXISTS          ║
╚══════════════════════════════════════════════════════════════════╝

  Apache Airflow DAGs (1h active, 6h all)
         |
         +── Auth via Marketplace Provider (#12)
         +── Extract ──> Transform ──> Load
         |
         v
  DATA LAKE (GCS — Medallion)
  +── bronze/       Raw marketplace responses (persistent)
  +── silver/       Normalized unified schemas
  +── gold/         Brand Health + aggregations
  |                   +── brand_health.parquet (legacy rules TBD)
  |                   +── daily_summary.parquet
  |                   +── competitor_prices.parquet
  |
  +──> Open Metadata (lineage + dictionary)
  |
  +──> Embedding Pipeline ──> Cerebro KB (#9)
          #9 ──> Context Aggregator (#5) ──> Orchestrator (#2)

╔══════════════════════════════════════════════════════════════════╗
║  DATA API (FastAPI on Cloud Run) — serves ALL layers             ║
╠══════════════════════════════════════════════════════════════════╣
║  Bronze: /data/{user}/fast/{tool}     (near real-time)           ║
║  Silver: /data/{user}/products        (normalized)               ║
║  Gold:   /data/{user}/brand-health    (aggregated)               ║
╚══════════════════════════════════════════════════════════════════╝

Data Sync has two complementary components. The Fast Data layer is a FastAPI service that reads Parquet files directly from GCS via pyarrow, with no Redis and no intermediate cache. It exposes the 11 tools defined in the #3 Tool Registry contract (10 READ + 1 ANALYSIS). Before any write-tool executes, a snapshot of the current Parquet state is captured to GCS (bronze/snapshots/) for audit and rollback. The Complete Data pipeline (existing) runs Airflow DAGs with configurable periodicity (default 1h active, 6h all), feeding the full Medallion architecture. The Gold layer produces the Brand Health report (rules migrating from legacy). Both components register lineage and data dictionaries in Open Metadata. Both generate embeddings: Bronze fast data feeds the Orchestrator with near-real-time context, and Gold Brand Health feeds it with deep analytical context. Auth token management goes through Marketplace Provider (#12).

Implementation Plan

Phase 1: Fast Data Pipeline + Tool Alignment (Week 1-2)

Build the FastAPI Data API: it exposes the 11 tools defined in the #3 Tool Registry contract via direct pyarrow reads from GCS Parquet. No Redis. READ tools: get_product, get_product_metrics, get_orders, get_buyer_questions, get_product_reviews, get_category_requirements, get_campaigns, get_campaign_metrics, get_store, get_store_metrics. ANALYSIS: get_product_fee_estimate. Pre-write snapshot: write current Parquet state to bronze/snapshots/ before every write-tool. snapshot_cleanup DAG (daily, 24h retention). Auth token management via Marketplace Provider (#12).

Phase 2: Verify + Adapt Complete Data DAGs (Week 2-3)

Verify that the existing MeLi DAGs still work in production, then adapt them from daily batch to hourly incremental. Add Amazon SP-API + Shopify GraphQL DAGs following the same pipeline structure. Migrate Auth Vault token resolution to the Marketplace Provider scheme. Validate Bronze layer Parquet output for all 3 marketplaces.

Phase 3: Silver + Gold + Brand Health (Week 4-5)

Silver layer: unify all marketplace data into normalized schemas. Gold layer: Brand Health report; analyze the legacy project and migrate its calculation rules (overall score, dimension scores, alerts). Bronze+Silver for Fast Data with temporal cleanup strategy. Pre-compute daily_summary and competitor_prices aggregations.

Phase 4: Open Metadata + Embeddings + API (Week 6-7)

Integrate Open Metadata for both pipelines: lineage tracking and data dictionaries. Build embedding sub-pipelines: Bronze fast data → Cerebro KB for real-time context, Gold Brand Health → Cerebro KB for analytical context. Extend Data API to serve all 3 layers (Bronze fast reads, Silver normalized, Gold aggregated). Redis cache for API with 1h TTL.

Risk Analysis

Fast Data TTL Misconfiguration

Impact: High

Mitigation: TTL too short = excessive API calls to marketplace (rate limiting risk). TTL too long = stale data served as "real-time". Default 15min with per-data-type overrides. Monitor cache hit ratio; target >70%. Cleanup DAG runs every 15min to enforce TTL expiration. Pre-write snapshots have separate TTL (24h) for auditability.

Brand Health Legacy Migration

Impact: Medium

Mitigation: Legacy calculation rules are defined and tested in TypeScript and structured clearly by dimension, so they are not undocumented. TS source files will be provided as input resources. Approach: spike analysis of legacy TS code as first step of Phase 3, document rules per dimension (products, pricing, stock, reputation), implement in Gold layer, validate parity with legacy output before replacing.

Embedding Pipeline Latency

Impact: Medium

Mitigation: Embedding generation adds latency to both pipelines. For Fast Data: generate embeddings async (don't block the read response). For Complete Data: embeddings run as a post-sync DAG step. If the embedding service is down, data pipelines continue; embeddings are eventually consistent.

Sync Failure Corrupting Data

Impact: High

Mitigation: Append-only writes, so a failed sync never overwrites previous data. Each Parquet file is dated and immutable. Airflow retries with exponential backoff (3 attempts). Alert on 3 consecutive failures.
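The retry policy in this mitigation (3 attempts, exponential backoff) can be illustrated with a plain-Python sketch. In Airflow this would normally be configured on the task itself rather than hand-written; the helper name, the 1-second base delay, and the injectable `sleep` are assumptions made for illustration.

```python
import time


def run_with_backoff(task, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `task` up to `attempts` times, doubling the wait after each failure.

    Re-raises the last exception when every attempt fails, so the caller
    can count consecutive failures and alert.
    """
    for attempt in range(attempts):
        try:
            return task()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Because each sync writes a new dated file rather than overwriting one, retrying a failed attempt cannot damage data written by earlier successful runs.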

Key Decisions

D1.

Two pipelines, not one. The existing batch pipeline cannot serve real-time Tool reads. A separate Fast Data pipeline handles on-demand reads aligned with the tools defined in Tool Registry (#3). The batch pipeline is preserved as the authoritative historical source for Silver and Gold layers.

D2.

Pre-read before every write Tool. Every write operation in Marketplace Provider requires capturing current state first. This enables audit trails (before/after), rollback capability, and feeds the ConfirmationFlow preview in the Orchestrator (#2).

D3.

Embeddings as a cross-cutting output. Both pipelines produce embeddings for Cerebro KB. Bronze fast data gives the Orchestrator "real-time" simple context. Gold Brand Health gives it deep analytical context. This separation ensures the agent always has both fresh and historical data.

D4.

Fast Data aligned with Tool Registry (#3), not Marketplace Provider (#12) directly. The Fast Data pipeline reads are aligned 1:1 with Tools defined in Tool Registry (#3). This decouples Data Sync from specific marketplace adapters. #3 defines what Tools exist; #10 generates the corresponding fast queries.
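One way to picture the 1:1 alignment in D4 is a single table of tool names from which Fast Data routes are derived. The tool names and route shape come from the contract and API signatures in this section; the dictionary and the `fast_route` helper are hypothetical.

```python
# Tool names and kinds as listed in the #3 Tool Registry contract.
FAST_DATA_TOOLS = {
    "get_product": "READ",
    "get_product_metrics": "READ",
    "get_orders": "READ",
    "get_buyer_questions": "READ",
    "get_product_reviews": "READ",
    "get_category_requirements": "READ",
    "get_campaigns": "READ",
    "get_campaign_metrics": "READ",
    "get_store": "READ",
    "get_store_metrics": "READ",
    "get_product_fee_estimate": "ANALYSIS",
}


def fast_route(user_id: str, tool: str) -> str:
    """Map a registry tool name to its Fast Data path; reject unknown tools."""
    if tool not in FAST_DATA_TOOLS:
        raise KeyError(f"{tool!r} is not part of the Fast Data contract")
    return f"/data/{user_id}/fast/{tool}"
```

With this shape, adding a tool to the registry table is the only change needed to expose a new route, and no marketplace adapter is referenced.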

D5.

Open Metadata for governance. Both pipelines register lineage and data dictionaries in Open Metadata. This provides visibility into data flow, schema evolution, and dependencies across the system.

MVP Scope

Fast Data pipeline aligned with the #3 Tool Registry tools + pre-write snapshots. Complete Data DAGs (MeLi+Amazon+Shopify). Brand Health in Gold (legacy rules TBD). Embedding sub-pipelines to Cerebro KB. API serving all 3 layers. Open Metadata integration.

Inspired by

Data Orchestrator (existing). Marketplace Provider Tool alignment. Legacy Brand Health system.

Source: Data Orchestrator + Legacy Brand Health | Depends on: #3 (Tool Registry: Fast Data alignment), #9 (Cerebro KB: embedding storage), #12 (Marketplace Provider: TokenManager), #14 (DevOps: IaC)
📝 Project Changelog
v4 Mar 2, 2026
+Dual pipeline: Fast Data (on-demand reads) + Complete Data (batch sync)
+Embedding sub-pipelines: Bronze → real-time context, Gold → analytical context for Cerebro KB
+Brand Health dashboard in Gold layer
+Open Metadata integration for lineage tracking and data dictionaries
+Pre-read before every write Tool for audit trails and rollback
~Dependencies: #10 (Auth Vault) → #12 (Marketplace Provider: TokenManager)
~listing → product rename (NormalizedListing → NormalizedProduct, listings.parquet → products.parquet)
v3 Feb 27-28, 2026
+Full deep spec card: Airflow DAGs, per-marketplace pipeline, Parquet schemas, API signatures
v2 Feb 27, 2026
+Competitor DAG added: crawls MeLi Search API (/sites/MLA/search) periodically
v1 Feb 26, 2026
+Initial: Data Sync, EXISTS status (existing Data Orchestrator)
#11

Enrichment Layer

Knowledge — Mateo

NEW

External capabilities gateway for the Coach. Two domains: Market Intelligence (competitors, pricing, keywords) + Content Analysis (image analysis, video analysis, image enhancement). 7 of 8 ANALYSIS tools depend on this service. The Tool Registry does not know which external API is behind each tool; it only knows IEnrichmentService. Adding a new adapter does not require changes to Tool Registry or Coach. Repo: core-knowledge-enrichment.

Beautonomous governance: Enrichment data feeds ANALYSIS tools only, which are read-only and never trigger ConfirmationFlow. Tool Registry's ToolPolicyFilter ensures all ANALYSIS tools sourced from Enrichment are gated by Core's permission matrix before the Coach can invoke them.

  • EnrichmentService: IEnrichmentService, internal router + cache, single contract with Tool Registry
  • MeliMarketIntelligenceAdapter: MeLi Search API + Items API (public, free)
  • AmazonMarketIntelligenceAdapter: Rainforest API (Amazon proxy) + Amazon Ads API
  • VisionLLMContentAdapter: Claude Vision / GPT-4V for image and video analysis
  • ExternalEnhancementAdapter: Specialized APIs (Magnific, Topaz) for image enhancement
  • RedisEnrichmentCache: Mandatory cache with TTL per data type

Current State

Operational

Nothing (new project).

To Build

IEnrichmentService + EnrichmentService (Phase 1). MeliMarketIntelligenceAdapter (Phase 1). VisionLLMContentAdapter (Phase 1). RedisEnrichmentCache (Phase 1). AmazonMarketIntelligenceAdapter (Phase 2). ExternalEnhancementAdapter (Phase 2). get_keyword_data (Phase 2).

Not This Project

Seller data (orders, stock, metrics) → Data Sync (#10). WRITE operations on marketplace → Marketplace Provider (#12). Image generation from scratch → out of scope. Seller marketplace authentication → Marketplace Provider (#12, TokenManager).

Tech Stack (TypeScript, Data Layer)

TypeScript 5+ · Redis (cache) · MeLi Search API · Rainforest API · Amazon Ads API · Claude Vision · Magnific AI
Deep Spec
Data Models
interface IEnrichmentService {
  executeAnalysis(tool: AnalysisTool, params: Record<string, unknown>): Promise<EnrichmentResult>;
}

type AnalysisTool =
  | 'search_market_products' | 'get_competitor_product'
  | 'get_market_pricing'     | 'get_keyword_data'
  | 'analyze_product_image'  | 'enhance_product_image'
  | 'analyze_product_video';

interface IMarketIntelligenceAdapter {
  searchProducts(params: MarketSearchParams): Promise<MarketProduct[]>;
  getProductDetail(externalId: string, marketplace: Marketplace): Promise<MarketProduct>;
  getKeywordData(keyword: string, marketplace: Marketplace): Promise<KeywordData>;
}

interface IContentAnalysisAdapter {
  analyzeImage(imageUrl: string, context: ImageAnalysisContext): Promise<ImageAnalysisResult>;
  enhanceImage(imageUrl: string, params: EnhancementParams): Promise<EnhancementResult>;
  analyzeVideo(videoUrl: string, context: VideoAnalysisContext): Promise<VideoAnalysisResult>;
}

// + MarketProduct, MarketSearchParams, KeywordData,
//   ImageAnalysisResult, ImageIssue, EnhancementParams,
//   EnhancementResult, VideoAnalysisResult, EnrichmentCacheConfig
API Signatures
// Internal invocation by Tool Registry (#3) — NO public REST endpoint
// Tool Registry calls IEnrichmentService.executeAnalysis()
// for the 7 ANALYSIS tools

executeAnalysis(tool: AnalysisTool, params: Record<string, unknown>)
  → Promise<EnrichmentResult>

// EnrichmentResult { data, source, cached, latencyMs }
Acceptance Criteria
  1. 7 ANALYSIS tools resolve via IEnrichmentService (only get_product_fee_estimate goes direct)
  2. Mandatory Redis cache with TTL per tool (15min search, 30min detail, 1h image/video, 24h keywords, 0 enhance)
  3. MeLi adapter works without OAuth (public Search API)
  4. External provider failure → EnrichmentResult with error; the Coach reasons about it, and the failure never blocks the response
  5. Adding a new adapter does not require changing Tool Registry or Coach
  6. enhance_product_image enhances real photos, does NOT generate from scratch
  7. External API credentials live in SSM (not in Marketplace Provider, whose credentials are seller OAuth tokens managed by TokenManager)
  8. get_market_pricing computes a distribution (min, max, p25, p75, median) over search results; it is not an external API
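Criterion 8 can be illustrated with a small sketch. The names `PriceDistribution`, `percentile`, and `computePriceDistribution` are hypothetical, and linear interpolation between neighbouring ranks is an assumption; the blueprint does not say which percentile rule get_market_pricing uses.

```typescript
// Hypothetical result shape for get_market_pricing (field names assumed).
interface PriceDistribution {
  min: number;
  max: number;
  p25: number;
  median: number;
  p75: number;
}

// Linear-interpolation percentile over an ascending-sorted array, q in [0, 1].
function percentile(sorted: number[], q: number): number {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Computes the distribution locally over prices taken from search results.
// Returns null when there is nothing usable to summarize.
function computePriceDistribution(prices: number[]): PriceDistribution | null {
  const sorted = prices
    .filter((p) => Number.isFinite(p) && p >= 0)
    .sort((a, b) => a - b);
  if (sorted.length === 0) return null;
  return {
    min: sorted[0],
    max: sorted[sorted.length - 1],
    p25: percentile(sorted, 0.25),
    median: percentile(sorted, 0.5),
    p75: percentile(sorted, 0.75),
  };
}
```

Because the input is the already cached search result, this computation adds no external call of its own.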
How It Works
Coach (LLM loop)
      |  needs external data
      v
Tool Registry (#3) -> handler ANALYSIS tool
      |
      v
IEnrichmentService.executeAnalysis(tool, params)
      |
      +-- RedisEnrichmentCache -> hit? return cached
      |
      +-- Market Intelligence --> MeLi Search API (free)
      |                      --> Rainforest API (Amazon)
      |                      --> Amazon Ads API / Helium 10
      |
      +-- Content Analysis   --> Vision LLM (Claude / GPT-4V)
                             --> Enhancement API (Magnific, Topaz)
      |
      v
EnrichmentResult { data, source, cached, latencyMs }
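The flow above can be sketched as a cache-first wrapper that never throws. This is a minimal illustration, not the service itself: `EnrichmentCache`, `InMemoryCache`, `Provider`, `cacheKey`, and the optional `error` field are assumptions, and giving get_market_pricing the same TTL as search is also assumed. The other TTL values come from the acceptance criteria.

```typescript
type AnalysisTool =
  | "search_market_products" | "get_competitor_product" | "get_market_pricing"
  | "analyze_product_image" | "analyze_product_video"
  | "get_keyword_data" | "enhance_product_image";

interface EnrichmentResult {
  data: unknown;
  source: string;
  cached: boolean;
  latencyMs: number;
  error?: string; // assumed field; the spec only says "EnrichmentResult with error"
}

// TTLs in seconds. 0 means "never cache".
const TTL_SECONDS: Record<AnalysisTool, number> = {
  search_market_products: 15 * 60,
  get_market_pricing: 15 * 60, // assumption: derived from search results
  get_competitor_product: 30 * 60,
  analyze_product_image: 60 * 60,
  analyze_product_video: 60 * 60,
  get_keyword_data: 24 * 60 * 60,
  enhance_product_image: 0,
};

// Minimal cache port; a Redis-backed class would expose the same two methods.
interface EnrichmentCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

class InMemoryCache implements EnrichmentCache {
  private store = new Map<string, { value: string; expiresAt: number }>();
  async get(key: string): Promise<string | null> {
    const hit = this.store.get(key);
    if (!hit || hit.expiresAt <= Date.now()) return null;
    return hit.value;
  }
  async set(key: string, value: string, ttlSeconds: number): Promise<void> {
    this.store.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  }
}

type Provider = (params: Record<string, unknown>) => Promise<{ data: unknown; source: string }>;

// Sort top-level param names so {a, b} and {b, a} share one cache entry.
function cacheKey(tool: AnalysisTool, params: Record<string, unknown>): string {
  const entries = Object.keys(params).sort().map((k) => [k, params[k]]);
  return `enrich:${tool}:${JSON.stringify(entries)}`;
}

async function executeAnalysis(
  tool: AnalysisTool,
  params: Record<string, unknown>,
  cache: EnrichmentCache,
  provider: Provider,
): Promise<EnrichmentResult> {
  const started = Date.now();
  const ttl = TTL_SECONDS[tool];
  const key = cacheKey(tool, params);
  try {
    if (ttl > 0) {
      const hit = await cache.get(key);
      if (hit !== null) {
        const { data, source } = JSON.parse(hit) as { data: unknown; source: string };
        return { data, source, cached: true, latencyMs: Date.now() - started };
      }
    }
    const fresh = await provider(params);
    if (ttl > 0) await cache.set(key, JSON.stringify(fresh), ttl);
    return { ...fresh, cached: false, latencyMs: Date.now() - started };
  } catch (err) {
    // A provider or cache failure becomes data for the Coach, not an exception.
    const message = err instanceof Error ? err.message : String(err);
    return { data: null, source: "error", cached: false, latencyMs: Date.now() - started, error: message };
  }
}
```

Returning the error inside the result keeps the "never blocks response" criterion in one place: callers only ever branch on `error`, never on a rejected promise.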
File Structure
core-knowledge-enrichment/
+-- src/
|   +-- domain/interfaces/
|   |   +-- IMarketIntelligenceAdapter.ts
|   |   +-- IContentAnalysisAdapter.ts
|   |   +-- IEnrichmentService.ts
|   +-- domain/models/
|   |   +-- MarketProduct.ts, KeywordData.ts
|   |   +-- ImageAnalysisResult.ts, VideoAnalysisResult.ts
|   |   +-- EnrichmentResult.ts
|   +-- application/
|   |   +-- EnrichmentService.ts (router + cache)
|   +-- infrastructure/
|       +-- market/ (MeliAdapter, AmazonAdapter)
|       +-- content/ (VisionLLMAdapter, EnhancementAdapter)
|       +-- cache/ (RedisEnrichmentCache.ts)
+-- test/

Implementation Plan

Phase 0: Research

Evaluate platforms and providers before writing code.

Market Intelligence: compare MeLi Search API (free; rate limits to be confirmed), Rainforest API (pricing tiers, reliability), Amazon Ads API (access, latency), and Helium 10 / Jungle Scout (API availability, cost).

Content Analysis: compare Claude Vision vs GPT-4V (cost per image, accuracy on marketplace photos), and evaluate image enhancement APIs (Magnific AI, Topaz Photo AI, Remove.bg) on pricing, quality, and latency.

Cache: confirm Redis (Cloud Memorystore) specs and pricing for the required TTLs.

Deliverable: comparison table with a recommendation per domain plus estimated monthly cost.

Phase 1: First ANALYSIS tools active

MeliMarketIntelligenceAdapter (MeLi Search + Items API, no cost). VisionLLMContentAdapter (analyzeImage + analyzeVideo via Claude Vision). Redis cache with basic TTL. Operative tools: search_market_products, get_competitor_product, get_market_pricing, analyze_product_image, analyze_product_video.

Phase 2: Amazon + Enhancement

AmazonMarketIntelligenceAdapter (Rainforest API). get_keyword_data operative (Amazon Ads API or Helium 10). ExternalEnhancementAdapter (enhance_product_image via external API). Cache with key normalization.

Phase 3+: Extensibility

Support for new adapters via a registry (no EnrichmentService modification). Rate limiting per provider. Fallback between providers.
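One way the Phase 3+ registry and provider fallback could fit together is sketched below. `AdapterRegistry`, `SearchAdapter`, and the reduced `MarketProduct` shape are hypothetical, and using registration order as fallback order is an assumption rather than a documented decision.

```typescript
// Reduced to two fields for the sketch; the real MarketProduct is richer.
interface MarketProduct {
  externalId: string;
  price: number;
}

type Marketplace = "mercadolibre" | "amazon";

interface SearchAdapter {
  readonly name: string;
  searchProducts(query: string): Promise<MarketProduct[]>;
}

class AdapterRegistry {
  private adapters = new Map<Marketplace, SearchAdapter[]>();

  // Registration order doubles as fallback order: first registered, first tried.
  register(marketplace: Marketplace, adapter: SearchAdapter): void {
    const list = this.adapters.get(marketplace) ?? [];
    list.push(adapter);
    this.adapters.set(marketplace, list);
  }

  // Tries each adapter in turn and returns the first successful answer.
  async search(
    marketplace: Marketplace,
    query: string,
  ): Promise<{ source: string; products: MarketProduct[] }> {
    const candidates = this.adapters.get(marketplace) ?? [];
    const failures: string[] = [];
    for (const adapter of candidates) {
      try {
        const products = await adapter.searchProducts(query);
        return { source: adapter.name, products };
      } catch (err) {
        failures.push(`${adapter.name}: ${err instanceof Error ? err.message : String(err)}`);
      }
    }
    throw new Error(`No provider succeeded for ${marketplace} [${failures.join("; ")}]`);
  }
}
```

Adding a provider then means one `register` call at startup, with no change to the routing code.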

Risk Analysis

MED: External APIs with variable latency (200ms–3s). The mandatory cache reduces calls, with TTLs set per data volatility. On failure, the service returns a clear error and the Coach continues.

MED: Paid API costs scale with usage. Phase 0 Research evaluates pricing. MeLi is free; Rainforest and the image APIs have pricing tiers. Monitor with Billing & Credit Economy (#13).

LOW: Inconsistent visual analysis quality between LLMs. Phase 0 Research compares Claude Vision vs GPT-4V on real marketplace photos.

Key Decisions

D1: One project, two domains (market + content): same pattern (adapters, cache, routing), same client (Tool Registry).

D2: Enhancement vs generation: enhance_product_image enhances real photos and does NOT generate from scratch (product decision).

D3: The Coach decides, Enrichment provides: the service never makes business decisions, it only returns data.

D4: Credentials separate from Marketplace Provider: external APIs (vision LLM, market intelligence) use SSM directly; seller OAuth tokens are managed by TokenManager in #12.

D5: Mandatory cache as part of the contract: Tool Registry can call without worrying about latency.

MVP Scope

Phase 0 Research + Phase 1: MeLi market intelligence + image/video analysis via Claude Vision + Redis cache. 5 of 7 ANALYSIS tools operative.

Source

New project, no existing source.

Source: New project | Depends on: #3 (Tool Registry, client), #13 (Billing & Credit Economy, LLM vision costs)
📝 Project Changelog
v4 Mar 2, 2026
+New project created: enrichment gateway for external capabilities
+Two domains: Market Intelligence + Content Analysis
+IEnrichmentService + IMarketIntelligenceAdapter + IContentAnalysisAdapter
+7 of 8 ANALYSIS tools from Tool Registry (#3)
+Redis cache with TTL per type
+Phase 0 Research for external platform evaluation

Layer 4: ACTION

What the Coach can do in the marketplace

#12

Marketplace Provider

Action — Andrés

REWRITE

Absorbs former #10 Auth & Credentials Vault: token management is now an internal module.

Unified execution and credential management layer for all marketplace operations. It abstracts marketplace APIs behind a single IMarketplaceAdapter contract using the Strategy pattern: each marketplace is a pluggable adapter implementing the same interface. Each request carries two fields, userId and marketplaceSlug, and the adapter resolves OAuth2 tokens internally via an ITokenManager module. No auth token ever crosses the public interface.

DynamoDB stores encrypted OAuth2 credentials (AES-256-GCM); AWS Secrets Manager holds static secrets (client_id, client_secret, encryption keys). A cron refreshes tokens proactively 30min before expiry.

The MVP ships all 3 marketplaces from Phase 1: MeLi REST + Amazon SP-API + Shopify GraphQL, covering 17 write tools across 3 domains (Catalog, Engagement, Advertising) plus 3 Enrollment methods. SKU is the primary identifier: each adapter internally resolves SKU to the marketplace-native product ID. Reads are limited to capturing pre-transaction state for rollback; the primary read path lives in Data Sync (#10). The project also includes an explicit onboarding flow for first-time marketplace connection.

Beautonomous governance: all 17 WRITE tools execute through Core's ConfirmationFlow (PENDING → CONFIRMED/REJECTED/EXPIRED) and are gated by Core's permission matrix. El Artesano proposes; El Mago or El Capitán confirms. No marketplace write operation can execute without an explicit human confirmation.
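The governance rule can be pictured as a small state machine. This is only a sketch of the idea: ConfirmationFlow belongs to Core and its real API is not shown in this blueprint, so the role keys, the `propose`/`decide`/`canExecute` functions, and the TTL parameter are all assumptions.

```typescript
type Role = "artesano" | "mago" | "capitan"; // assumed keys for the three roles
type ConfirmationState = "PENDING" | "CONFIRMED" | "REJECTED" | "EXPIRED";

interface Confirmation {
  tool: string;
  proposedBy: Role;
  state: ConfirmationState;
  expiresAtMs: number;
}

// Only El Mago or El Capitán may resolve a pending write.
const CONFIRMING_ROLES: ReadonlySet<Role> = new Set<Role>(["mago", "capitan"]);

function propose(tool: string, proposedBy: Role, nowMs: number, ttlMs: number): Confirmation {
  return { tool, proposedBy, state: "PENDING", expiresAtMs: nowMs + ttlMs };
}

function decide(
  c: Confirmation,
  by: Role,
  decision: "confirm" | "reject",
  nowMs: number,
): Confirmation {
  if (c.state !== "PENDING") throw new Error(`already ${c.state}`);
  if (nowMs >= c.expiresAtMs) return { ...c, state: "EXPIRED" };
  if (!CONFIRMING_ROLES.has(by)) throw new Error(`${by} cannot confirm or reject`);
  return { ...c, state: decision === "confirm" ? "CONFIRMED" : "REJECTED" };
}

// A write tool may run only after an explicit human confirmation.
function canExecute(c: Confirmation): boolean {
  return c.state === "CONFIRMED";
}
```

Every terminal state other than CONFIRMED leaves `canExecute` false, which is the property the governance paragraph requires.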

MeLi REST Adapter: raw HTTP, no maintained SDK
Amazon SP-API Adapter: @sp-api-sdk + LWA OAuth2
Shopify GraphQL Adapter: @shopify/shopify-api (GraphQL)
IMarketplaceAdapter: Strategy, 23 methods
TokenManager: DynamoDB + AES-256 (ex #10)
OAuth2 Flow Manager: MeLi + Amazon LWA + Shopify
SKU Resolver: SKU → marketplace product ID
Onboarding Flow: Connect → OAuth → first sync

Recommended Tech Stack

TypeScript · Node.js 22+ · Express / Fastify · axios (MeLi raw HTTP) · @sp-api-sdk/* (Amazon) · @shopify/shopify-api · AWS SDK v3 (DynamoDB) · AWS Secrets Manager · Redis (snapshot cache) · Vitest · CDK (IaC via #14)

MeLi has no maintained SDK (archived 2022), so the adapter uses raw HTTP via axios. Amazon @sp-api-sdk is the most active TS SDK. Shopify REST was deprecated in Oct 2024, so the adapter is GraphQL only. TypeScript was chosen for consistency with core-intelligence services and strong typing of adapter interfaces.

Data Models, API Contracts & Acceptance Criteria
Data Models
// MarketplaceRequest — NO auth_token (adapter resolves internally via TokenManager)
interface MarketplaceRequest {
  userId: string;                 // Shopilot user ID
  marketplaceSlug: MarketplaceSlug; // 'mercadolibre' | 'amazon' | 'shopify'
}

interface NormalizedProduct {
  sku: string;                    // Primary key — seller's SKU
  productId: string;              // Marketplace-native (MLA123, ASIN, GID)
  marketplace: MarketplaceSlug;
  country: string;                // ISO 3166-1 alpha-2
  title: string;
  description: string;
  price: Money;                   // { amount: number, currency: string }
  stock: number;
  condition: 'new' | 'used' | 'refurbished';
  status: 'active' | 'paused' | 'closed';
  category: Category;             // { id, name, path }
  images: string[];
  video: string | null;
  attributes: Record<string, unknown>;
  url: string;
  raw: Record<string, unknown>;   // Raw unnormalized response
  lastSynced: Date;
}

interface MarketplaceAction {
  id: string;                     // UUID
  userId: string;
  sku: string;
  marketplace: MarketplaceSlug;
  actionType: 'create' | 'update' | 'delete';
  domain: 'catalog' | 'engagement' | 'advertising' | 'enrollment';
  fieldChanged: string | null;
  beforeValue: unknown;           // Snapshot from cache for rollback
  afterValue: unknown;
  riskLevel: 'reversible' | 'irreversible';
  rollbackToken: string | null;
  status: 'pending' | 'confirmed' | 'executed' | 'rolled_back' | 'failed';
  executedAt: Date;
  executionTimeMs: number;
  apiResponseCode: number;
}
IMarketplaceAdapter (23 methods)
interface IMarketplaceAdapter {
  // —— ENROLLMENT (3) ——
  connectMarketplace(req: MarketplaceRequest, credentials: OAuthTokens): Promise<ConnectionResult>;
  disconnectMarketplace(req: MarketplaceRequest): Promise<ConnectionResult>;
  getConnectionStatus(req: MarketplaceRequest): Promise<ConnectionStatus>;

  // —— CATALOG — Create/Modify/Delete (9) ——
  publishProduct(req: MarketplaceRequest, sku: string, draft: ProductDraft): Promise<MarketplaceAction>;        // IRREVERSIBLE
  updateProductContent(req: MarketplaceRequest, sku: string, content: ContentUpdate): Promise<MarketplaceAction>; // REVERSIBLE
  updateProductImages(req: MarketplaceRequest, sku: string, images: string[]): Promise<MarketplaceAction>;       // REVERSIBLE
  updateProductVideo(req: MarketplaceRequest, sku: string, video: string): Promise<MarketplaceAction>;           // REVERSIBLE
  updatePrice(req: MarketplaceRequest, sku: string, price: Money): Promise<MarketplaceAction>;                   // REVERSIBLE
  updateStock(req: MarketplaceRequest, sku: string, qty: number, locationId?: string): Promise<MarketplaceAction>; // REVERSIBLE
  pauseProduct(req: MarketplaceRequest, sku: string): Promise<MarketplaceAction>;                               // REVERSIBLE
  activateProduct(req: MarketplaceRequest, sku: string): Promise<MarketplaceAction>;                            // REVERSIBLE
  closeProduct(req: MarketplaceRequest, sku: string): Promise<MarketplaceAction>;                               // IRREVERSIBLE

  // —— ENGAGEMENT (4) ——
  answerQuestion(req: MarketplaceRequest, questionId: string, answer: string): Promise<MarketplaceAction>;      // IRREVERSIBLE
  hideQuestion(req: MarketplaceRequest, questionId: string): Promise<MarketplaceAction>;                        // REVERSIBLE (MeLi)
  sendBuyerMessage(req: MarketplaceRequest, orderId: string, msg: string): Promise<MarketplaceAction>;          // IRREVERSIBLE
  requestReview(req: MarketplaceRequest, orderId: string): Promise<MarketplaceAction>;                          // IRREVERSIBLE

  // —— ADVERTISING (4) ——
  createCampaign(req: MarketplaceRequest, draft: CampaignDraft): Promise<MarketplaceAction>;                    // REVERSIBLE
  updateCampaign(req: MarketplaceRequest, campaignId: string, changes: CampaignUpdate): Promise<MarketplaceAction>; // REVERSIBLE
  pauseCampaign(req: MarketplaceRequest, campaignId: string): Promise<MarketplaceAction>;                       // REVERSIBLE
  activateCampaign(req: MarketplaceRequest, campaignId: string): Promise<MarketplaceAction>;                    // REVERSIBLE

  // —— PRE-TRANSACTION READ (1) ——
  snapshotProduct(req: MarketplaceRequest, sku: string): Promise<NormalizedProduct>;

  // —— INFRA (1) ——
  getRateLimits(): RateLimitInfo;

  // INTERNAL: Token resolved by adapter, not caller
  // Each adapter constructor receives ITokenManager
  // On each API call: token = await tokenManager.getToken(req.userId, req.marketplaceSlug)
}
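The routing implied by the interface can be sketched on a two-method slice of it. `AdapterSlice`, the stub adapters, and the string return values are hypothetical simplifications; the real methods return MarketplaceAction and resolve tokens through ITokenManager, which is left out here.

```typescript
type MarketplaceSlug = "mercadolibre" | "amazon" | "shopify";

interface MarketplaceRequest {
  userId: string;
  marketplaceSlug: MarketplaceSlug;
}

class NotSupportedError extends Error {
  constructor(operation: string, marketplace: MarketplaceSlug) {
    super(`${operation} is not supported on ${marketplace}`);
    this.name = "NotSupportedError";
  }
}

// Two-method slice of IMarketplaceAdapter; return types simplified to strings.
interface AdapterSlice {
  pauseProduct(req: MarketplaceRequest, sku: string): Promise<string>;
  requestReview(req: MarketplaceRequest, orderId: string): Promise<string>;
}

class MeliAdapterStub implements AdapterSlice {
  async pauseProduct(_req: MarketplaceRequest, sku: string): Promise<string> {
    return `meli:paused:${sku}`;
  }
  // MeLi has no review-request operation, so the adapter rejects explicitly.
  async requestReview(_req: MarketplaceRequest, _orderId: string): Promise<string> {
    throw new NotSupportedError("requestReview", "mercadolibre");
  }
}

class AmazonAdapterStub implements AdapterSlice {
  async pauseProduct(_req: MarketplaceRequest, sku: string): Promise<string> {
    return `amazon:paused:${sku}`;
  }
  async requestReview(_req: MarketplaceRequest, orderId: string): Promise<string> {
    return `amazon:review-requested:${orderId}`;
  }
}

// Strategy context: picks the adapter from marketplaceSlug and delegates.
class MarketplaceProvider {
  private readonly adapters: Partial<Record<MarketplaceSlug, AdapterSlice>>;

  constructor(adapters: Partial<Record<MarketplaceSlug, AdapterSlice>>) {
    this.adapters = adapters;
  }

  private adapterFor(req: MarketplaceRequest): AdapterSlice {
    const adapter = this.adapters[req.marketplaceSlug];
    if (!adapter) throw new Error(`No adapter registered for ${req.marketplaceSlug}`);
    return adapter;
  }

  async pauseProduct(req: MarketplaceRequest, sku: string): Promise<string> {
    return this.adapterFor(req).pauseProduct(req, sku);
  }

  async requestReview(req: MarketplaceRequest, orderId: string): Promise<string> {
    return this.adapterFor(req).requestReview(req, orderId);
  }
}
```

Callers pass only `{ userId, marketplaceSlug }`; nothing in the public signature carries a token, which matches the contract above.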
ITokenManager & Credentials Schema (absorbs #10)
// Replaces ICredentialsVault from eliminated #10
interface ITokenManager {
  getToken(userId: string, marketplace: MarketplaceSlug): Promise<string>;
  storeCredentials(userId: string, marketplace: MarketplaceSlug, tokens: OAuthTokens): Promise<void>;
  revokeCredentials(userId: string, marketplace: MarketplaceSlug): Promise<void>;
  getConnectedMarketplaces(userId: string): Promise<MarketplaceConnection[]>;
  forceRefresh(userId: string, marketplace: MarketplaceSlug): Promise<string>;
}

// —— DynamoDB Table: marketplace_credentials ——
// PK: userId#marketplace  (e.g., "user_123#mercadolibre")
// SK: "CREDENTIAL"
{
  pk: string;                         // userId#marketplace
  sk: 'CREDENTIAL';
  accessToken: string;                // encrypted (AES-256-GCM)
  refreshToken: string;               // encrypted (AES-256-GCM)
  expiresAt: number;                  // Unix timestamp
  scopes: string[];                   // Granted permissions
  marketplaceUserId: string;
  marketplaceNickname: string;
  marketplaceCountry: string;         // AR, MX, BR, US...
  connectedAt: string;                // ISO 8601
  lastRefreshedAt: string;
  lastUsedAt: string;
  status: 'active' | 'expired' | 'revoked' | 'disconnected';
  refreshFailures: number;            // Consecutive failure counter
  encryptionKeyId: string;            // AWS Secrets Manager key ref
  ttl: number;                        // DynamoDB TTL (24 months)
}

// —— AWS Secrets Manager stores ——
// shopilot/marketplace/mercadolibre  → { clientId, clientSecret, redirectUri }
// shopilot/marketplace/amazon        → { clientId, clientSecret, redirectUri }
// shopilot/marketplace/shopify       → { apiKey, apiSecret, redirectUri }
// shopilot/encryption/token-key      → AES-256-GCM encryption key

// —— Token Refresh Cron (EventBridge every 5min) ——
// Query: expiresAt < NOW() + 30min AND status = 'active'
// For each: call marketplace OAuth refresh endpoint
// On success: update DynamoDB, reset refreshFailures = 0
// On failure: increment refreshFailures
//   If refreshFailures >= 3: status = 'expired', notify user, pause Data Sync (#10)
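The refresh cron's selection and failure-counting rules can be expressed as two pure functions, sketched below. `CredentialRecord` is a reduced view of the table item, and treating `expiresAt` as Unix seconds is an assumption (the schema only says "Unix timestamp"). The function names are hypothetical.

```typescript
type CredentialStatus = "active" | "expired" | "revoked" | "disconnected";

// Reduced view of a marketplace_credentials item.
interface CredentialRecord {
  pk: string;
  expiresAt: number; // Unix seconds (assumption)
  status: CredentialStatus;
  refreshFailures: number;
}

type RefreshOutcome = { ok: true; newExpiresAt: number } | { ok: false };

const REFRESH_WINDOW_SECONDS = 30 * 60; // pre-refresh 30min before expiry
const MAX_REFRESH_FAILURES = 3;

// Mirrors the cron query: expiresAt < NOW() + 30min AND status = 'active'.
function selectDueForRefresh(records: CredentialRecord[], nowSeconds: number): CredentialRecord[] {
  return records.filter(
    (r) => r.status === "active" && r.expiresAt < nowSeconds + REFRESH_WINDOW_SECONDS,
  );
}

// Applies one refresh attempt and reports whether the user must be notified.
function applyRefreshOutcome(
  record: CredentialRecord,
  outcome: RefreshOutcome,
): { record: CredentialRecord; notifyUser: boolean } {
  if (outcome.ok) {
    return {
      record: { ...record, expiresAt: outcome.newExpiresAt, refreshFailures: 0 },
      notifyUser: false,
    };
  }
  const refreshFailures = record.refreshFailures + 1;
  const exhausted = refreshFailures >= MAX_REFRESH_FAILURES;
  return {
    record: { ...record, refreshFailures, status: exhausted ? "expired" : record.status },
    notifyUser: exhausted,
  };
}
```

Keeping these rules free of I/O makes the 30-minute window and the three-strike rule easy to unit test apart from DynamoDB and EventBridge.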
REST Endpoints
// —— Onboarding & Auth ——
POST /auth/connect/:marketplace            // Starts OAuth2 flow, returns redirect URL
GET  /auth/callback/:marketplace           // OAuth2 callback → exchange code → store encrypted tokens
GET  /auth/marketplaces/:userId            // List connected marketplaces + status
DELETE /auth/disconnect/:userId/:marketplace // Revoke tokens at provider + DynamoDB cleanup
POST /auth/refresh/:userId/:marketplace    // Force manual token refresh

// —— Internal (called by adapter, not exposed to frontend) ——
GET  /internal/token/:userId/:marketplace  // Returns fresh decrypted token (<100ms, DynamoDB direct, no cache layer)

// —— Marketplace Operations (called by Orchestrator tools) ——
POST /marketplace/execute                  // { action, req, params } → MarketplaceAction
GET  /marketplace/snapshot/:userId/:marketplace/:sku  // Pre-transaction state capture

// Response shape for all write operations:
interface ExecuteResponse {
  action: MarketplaceAction;
  warnings: string[];              // e.g., "Rate limit at 80%"
}
Acceptance Criteria
  • publishProduct(sku, draft) creates product in target marketplace and returns MarketplaceAction with productId in <2s
  • updatePrice(sku, price) changes price, stores beforeValue for rollback, verifies change applied
  • snapshotProduct(sku) captures full pre-transaction state in <500ms
  • Rate limiting respects MeLi 1,500 req/min, Amazon per-endpoint limits, Shopify cost-point bucket — zero 429 errors
  • Token resolution is internal — adapter calls tokenManager.getToken() automatically, caller never provides auth_token
  • If OAuth2 token expired, TokenManager auto-refreshes before retry. MeLi token refresh serialized with mutex to prevent concurrent invalidation
  • requestReview on MeLi returns NotSupportedError with descriptive message
  • connectMarketplace completes OAuth2 flow and stores encrypted credentials in DynamoDB
  • All 17 write tools work against MeLi + Amazon + Shopify (minus N/A per matrix)
  • Complete onboarding: user clicks “Connect MeLi” → OAuth2 → tokens stored encrypted → first Data Sync (#10) triggers automatically
  • Token auto-refresh works without user intervention (cron every 5min, pre-refresh 30min before expiry)
  • If refresh fails 3 consecutive times, user gets reconnection notification and Data Sync pauses gracefully
  • getToken() returns valid decrypted token in <100ms (DynamoDB direct, no cache layer)
  • Credentials encrypted at rest (AES-256-GCM) — not readable in DynamoDB directly
  • Disconnecting marketplace revokes tokens at provider level and stops Data Sync

MeLi rate: 1,500 req/min per seller · Amazon: per-endpoint burst/restore · Shopify: cost-point leaky bucket (1000pts, 50pts/s) · MeLi token: 6h expiry · Refresh cron: 5min · getToken: <100ms (DynamoDB direct)

17 Write Tools: Marketplace Support Matrix

Domain | Write Tool | MeLi / Amazon / Shopify | Risk
Catalog | publish_product | | IRREVERSIBLE
Catalog | update_product_content | | REVERSIBLE
Catalog | update_product_images | | REVERSIBLE
Catalog | update_product_video | (A+) | REVERSIBLE
Catalog | update_price | | REVERSIBLE
Catalog | update_stock | | REVERSIBLE
Catalog | pause_product | | REVERSIBLE
Catalog | activate_product | | REVERSIBLE
Catalog | close_product | | IRREVERSIBLE
Engagement | answer_question | N/A | IRREVERSIBLE
Engagement | hide_question | N/A, N/A | REVERSIBLE
Engagement | send_buyer_message | | IRREVERSIBLE
Engagement | request_review | | IRREVERSIBLE
Advertising | create_campaign | (SP-Ads) | REVERSIBLE
Advertising | update_campaign | | REVERSIBLE
Advertising | pause_campaign | | REVERSIBLE
Advertising | activate_campaign | | REVERSIBLE

MeLi does not support request_review; the adapter returns NotSupportedError with a descriptive message. IRREVERSIBLE = requires user approval | REVERSIBLE = pauses for user confirmation.

How It Works

WRITE OPERATION                              ONBOARDING FLOW
==============                               ===============

Orchestrator tool call:                      1. User clicks "Connect MeLi"
  updatePrice({ userId, marketplaceSlug },          |
              sku="PROD-001", price=29.99)          v
         |                                   2. POST /auth/connect/mercadolibre
         v                                      → returns OAuth2 redirect URL
+---------------------------+                       |
|  MarketplaceProvider      |                       v
|  1. Resolve token         |                3. User accepts permissions
|     tokenManager          |                       |
|     .getToken(userId,     |                       v
|      marketplaceSlug)     |                4. GET /auth/callback/mercadolibre
|  2. Resolve SKU → ID      |                   exchange code → tokens
|  3. Snapshot pre-state    |                       |
|  4. Route to adapter      |                       v
+----------+----------------+                5. TokenManager.storeCredentials()
           |                                    encrypt(AES-256-GCM) → DynamoDB
    +------+------+--------+                        |
    v             v         v                       v
+-----------+ +---------+ +-----------+      6. Trigger first Data Sync (#10)
| MeLi      | | Amazon  | | Shopify   |
| Adapter   | | Adapter | | Adapter   |
|           | |         | |           |      TOKEN REFRESH (automatic)
| REST API  | | SP-API  | | GraphQL   |     =========================
| OAuth2    | | LWA     | | OAuth2    |
| 1.5K/min  | | varies  | | cost-pts  |     EventBridge cron (every 5min):
+-----------+ +---------+ +-----------+       Query: expiresAt < NOW()+30min
    |             |         |                  → refresh at marketplace
    v             v         v                  → update DynamoDB
+---------------------------------------+     If fails 3x: expire + notify
|        MarketplaceAction              |
|  actionType: update                   |
|  domain: catalog                      |    getToken() FLOW:
|  beforeValue: {price: 39.99}          |      1. DynamoDB (direct read)
|  afterValue: {price: 29.99}           |      2. Decrypt AES-256-GCM
|  rollbackToken: "rt_abc123"           |      3. Latency: <100ms
+---------------------------------------+      (no cache layer)

The Strategy pattern routes each call to the correct adapter based on marketplaceSlug. Tokens are resolved internally by the adapter via TokenManager.getToken(); the caller never provides auth credentials. Before each write, snapshotProduct() captures the current state for rollback. IRREVERSIBLE operations require user approval; REVERSIBLE operations pause for confirmation. The adapter returns NotSupportedError when an operation is unavailable (e.g., requestReview on MeLi). MeLi token refresh is serialized with a mutex to prevent concurrent invalidation, because only the most recently issued refresh_token is valid.
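The mutex requirement can be sketched with a keyed, promise-chaining lock. This is an in-process illustration only: `KeyedMutex`, `SerializedRefresher`, and the `exchange` callback are hypothetical names, and a deployment with several instances would need a distributed lock instead, which is not covered here.

```typescript
interface TokenState {
  accessToken: string;
  refreshToken: string;
  expiresAt: number;
}

// Per-key mutex: tasks for the same key run one at a time, in arrival order.
class KeyedMutex {
  private tails = new Map<string, Promise<void>>();

  async runExclusive<T>(key: string, task: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(key) ?? Promise.resolve();
    let release!: () => void;
    const current = new Promise<void>((resolve) => { release = resolve; });
    const tail = previous.then(() => current);
    this.tails.set(key, tail);
    await previous; // wait for every earlier holder to release
    try {
      return await task();
    } finally {
      release();
      if (this.tails.get(key) === tail) this.tails.delete(key);
    }
  }
}

class SerializedRefresher {
  private mutex = new KeyedMutex();
  private state: Map<string, TokenState>;
  private exchange: (refreshToken: string) => Promise<TokenState>;

  constructor(
    state: Map<string, TokenState>,
    exchange: (refreshToken: string) => Promise<TokenState>,
  ) {
    this.state = state;
    this.exchange = exchange;
  }

  // staleAccessToken is the token the caller saw expire.
  async refresh(userId: string, staleAccessToken: string): Promise<TokenState> {
    return this.mutex.runExclusive(`${userId}#mercadolibre`, async () => {
      const current = this.state.get(userId);
      if (!current) throw new Error(`no credentials for ${userId}`);
      // Another caller rotated the token while we waited: reuse its result
      // instead of spending a refresh_token that is no longer valid.
      if (current.accessToken !== staleAccessToken) return current;
      const next = await this.exchange(current.refreshToken);
      this.state.set(userId, next);
      return next;
    });
  }
}
```

The re-check inside the lock is what prevents the second caller from presenting an already rotated refresh_token.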

Marketplace Developer Credentials

Marketplace | Process | Time | Required Docs
MercadoLibre | Create app on developers.mercadolibre.com, request write permissions | 1-2 weeks | Company name, callback URL, usage description
Amazon SP-API | Register on developer.amazonservices.com, LWA client ID (no longer requires AWS IAM) | 2-4 weeks | Company name, address, tax data, use case
Shopify | Create app in Partners Dashboard, request write scopes | 1-3 days | Company name, callback URL, privacy policy

Responsible: El Capitán reviews process status weekly. Timeline: start all 3 in parallel at Week 0 (pre-sprint). Blocker if not completed before Week 2. Amazon SP-API no longer requires AWS IAM or Signature v4 (removed Oct 2023); auth is standard OAuth2/LWA.

API Documentation Monitoring

Owned by #16 Eval Suite (api_monitor pipeline). Daily changelog checks + canary tests against live marketplace endpoints. Breaking changes create a Linear issue tagged api-change with the affected adapter and recommended action. This project consumes the alerts and acts on them via adapter patches.

Implementation Plan

Phase 0: Developer Credentials + Setup (Week 0)

Start developer account applications on all 3 marketplaces in parallel, with El Capitán as weekly tracking owner. Prioritize MeLi (1-2 weeks), since the MeLi adapter is the first one built in Phase 2. Set up the TypeScript project scaffold, DynamoDB table, and AWS Secrets Manager secrets. CDK stacks are defined in #14.

Phase 1: TokenManager + OAuth2 Flows (Week 1-2)

Implement ITokenManager with a DynamoDB backend + AES-256-GCM encryption. Build OAuth2FlowManager for all 3 marketplaces: MeLi standard OAuth2, Amazon LWA, Shopify OAuth2. Implement the /auth/connect, /auth/callback, and /auth/disconnect endpoints. Build the token refresh cron (EventBridge every 5min, pre-refresh 30min before expiry). getToken() reads directly from DynamoDB (<100ms, no cache layer). MeLi token refresh is serialized with a mutex. This was formerly #10 and is now an internal module.
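A minimal sketch of the AES-256-GCM step, using Node's built-in crypto module. The function names and the `iv.tag.ciphertext` base64 envelope are assumptions; the blueprint does not specify how the IV and auth tag are stored alongside the ciphertext, and key retrieval from Secrets Manager is out of scope here.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Encrypts a token and packs iv, auth tag and ciphertext into one string.
// The "iv.tag.ciphertext" layout is an assumption of this sketch.
function encryptToken(plaintext: string, key: Buffer): string {
  if (key.length !== 32) throw new Error("AES-256 requires a 32-byte key");
  const iv = randomBytes(12); // 96-bit nonce, fresh for every encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return [iv, tag, ciphertext].map((b) => b.toString("base64")).join(".");
}

// Reverses encryptToken; throws if the key is wrong or the data was altered.
function decryptToken(envelope: string, key: Buffer): string {
  const parts = envelope.split(".");
  if (parts.length !== 3) throw new Error("malformed envelope");
  const [iv, tag, ciphertext] = parts.map((p) => Buffer.from(p, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

GCM authenticates as well as encrypts, so a tampered or mis-keyed item fails loudly at `decipher.final()` rather than yielding a garbage token.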

Phase 2: IMarketplaceAdapter + MeLi Adapter (Week 2-3)

Define the IMarketplaceAdapter interface with 23 methods (17 write + 3 enrollment + 2 read + 1 infra). Implement SKUResolver (SKU → productId). Implement MeLiAdapter via raw axios (no maintained SDK exists). MarketplaceRequest carries only userId + marketplaceSlug; the adapter resolves tokens internally via TokenManager.

Phase 3: Amazon + Shopify Adapters (Week 3-4)

Implement AmazonAdapter using @sp-api-sdk with LWA auth (standard OAuth2, no longer requires AWS SigV4). Implement ShopifyAdapter using @shopify/shopify-api with GraphQL exclusively (REST deprecated Oct 2024). Handle NotSupportedError for N/A operations. Both normalize to MarketplaceAction.

Phase 4: Rate Limiting + Rollback + Onboarding (Week 5-6)

Redis cache for pre-transaction snapshots via snapshotProduct(). Per-marketplace rate limiter: MeLi 1,500 req/min token bucket, Amazon per-endpoint burst/restore, Shopify cost-point leaky bucket. Rollback tokens for REVERSIBLE operations. Onboarding flow: connectMarketplace() → OAuth → store credentials → trigger first Data Sync (#10). Integration with Observability (#8).

Risk Analysis

Rate Limit Exhaustion

Impact: H

Mitigation: Per-marketplace rate limiter with backoff. MeLi: 1,500 req/min per seller (token bucket). Amazon: per-endpoint burst/restore. Shopify: cost-point leaky bucket (1000pts, 50pts/s restore); monitor extensions.cost.throttleStatus in each GraphQL response. Overflow requests are queued.
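
The MeLi limit above (1,500 req/min per seller) works out to 25 tokens per second. A minimal token-bucket sketch under that assumption follows; the class name and the explicit nowMs parameter (used to keep the example deterministic) are illustrative, and a production limiter shared across Lambda instances would need external state such as Redis.

```typescript
// Minimal in-memory token bucket; one instance per seller and marketplace.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full
    this.lastRefillMs = nowMs;
  }

  // Returns true and deducts `cost` tokens if the request may proceed now.
  tryConsume(cost: number = 1, nowMs: number = Date.now()): boolean {
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefillMs = nowMs;
    if (this.tokens < cost) return false; // caller should queue or back off
    this.tokens -= cost;
    return true;
  }
}

// MeLi: 1,500 requests per minute per seller, i.e. 25 tokens per second.
const meliBucket = new TokenBucket(1500, 1500 / 60);
```

The same arithmetic can approximate Shopify's cost-point budget by constructing the bucket with capacity 1000 and a refill of 50 per second, and passing the query cost as `cost`.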

Token Refresh Failure Cascade

Impact: H

Mitigation: If a marketplace OAuth endpoint goes down, all tokens expire within 6h (MeLi). TokenManager implements exponential backoff and a circuit breaker. After 3 consecutive failures per credential, it marks the credential as expired and notifies the user. MeLi gotcha: only the last refresh_token is valid, so refresh calls are serialized with a mutex to prevent concurrent invalidation.
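
One way to serialize refresh calls per credential is a promise-chain mutex keyed by user and marketplace. The sketch below is an in-process illustration only; KeyedMutex and the key format are assumptions, and refreshes running in separate Lambda instances would additionally need a distributed guard (for example a DynamoDB conditional write).

```typescript
// Serializes async tasks that share a key by chaining them on a per-key promise.
class KeyedMutex {
  private readonly tails = new Map<string, Promise<unknown>>();

  runExclusive<T>(key: string, task: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(key) ?? Promise.resolve();
    // Start the task once the previous one settles, whether it succeeded or failed.
    const result = previous.then(task, task);
    const tail = result.catch(() => undefined);
    this.tails.set(key, tail);
    // Drop the entry once idle so the map does not grow without bound.
    void tail.then(() => {
      if (this.tails.get(key) === tail) this.tails.delete(key);
    });
    return result;
  }
}
```

A refresh would then be wrapped as `mutex.runExclusive("user123:meli", () => refresh(...))`, where `refresh` stands in for the actual OAuth call, so two overlapping cron runs cannot both spend the same refresh_token.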

Marketplace API Breaking Changes

Impact: M

Mitigation: Each adapter is isolated. #16 Eval Suite (api_monitor pipeline) checks changelogs every 24h and runs canary tests daily against real APIs; it creates a Linear issue tagged api-change when a breaking change is detected. A breaking change in MeLi only affects MeLiAdapter. Schema validation on responses catches unexpected fields.

Developer Account Approval Delays

Impact: H

Mitigation: Amazon SP-API approval can take 2-4 weeks. Start all applications at Week 0. El Capitan tracks progress weekly. If blocked, prioritize MeLi (1-2 weeks) as the first functional adapter.

Credential Security Breach

Impact: H

Mitigation: Tokens encrypted at rest (AES-256-GCM). Encryption key in AWS Secrets Manager (not an env var). DynamoDB access restricted via IAM policy. No token ever appears in logs or traces. Post-MVP: AWS KMS with key rotation.

Key Decisions

D1. SKU as primary identifier over Publication ID: The same product can have multiple Publication IDs across marketplaces. IDs are assigned by the marketplace, not by the seller. Using SKU lets the agent operate without knowing marketplace-internal IDs. Each adapter resolves SKU → productId internally.

D2. Strategy Pattern over Factory: Each marketplace is a pluggable adapter implementing IMarketplaceAdapter. Adding a new marketplace means adding one class, with zero changes to existing code.
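
To make the pluggable-adapter idea concrete, here is a reduced sketch. The interface below keeps a single illustrative method (updatePrice is a hypothetical name, not one of the 23 methods in the spec), and AdapterRegistry and FakeMeLiAdapter are assumptions introduced only for this example.

```typescript
interface MarketplaceRequest {
  userId: string;
  marketplaceSlug: string;
}

// Reduced, hypothetical slice of the adapter contract.
interface IMarketplaceAdapter {
  readonly slug: string;
  updatePrice(req: MarketplaceRequest, sku: string, price: number): Promise<string>;
}

// Callers look adapters up by slug; they never branch on the marketplace themselves.
class AdapterRegistry {
  private readonly adapters = new Map<string, IMarketplaceAdapter>();

  register(adapter: IMarketplaceAdapter): void {
    this.adapters.set(adapter.slug, adapter);
  }

  resolve(slug: string): IMarketplaceAdapter {
    const adapter = this.adapters.get(slug);
    if (!adapter) throw new Error(`No adapter registered for "${slug}"`);
    return adapter;
  }
}

// Stub adapter; a real one would resolve SKU → productId and call the marketplace API.
class FakeMeLiAdapter implements IMarketplaceAdapter {
  readonly slug = "meli";

  async updatePrice(req: MarketplaceRequest, sku: string, price: number): Promise<string> {
    return `${this.slug}:${req.userId}:${sku}:${price}`;
  }
}

const registry = new AdapterRegistry();
registry.register(new FakeMeLiAdapter());
```

Supporting another marketplace in this sketch means writing one more class and one more register() call; resolve() and its callers stay untouched.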

D3. No auth_token in MarketplaceRequest: The adapter resolves tokens internally via ITokenManager. Callers (Orchestrator, tools) only provide userId + marketplaceSlug. This eliminates token leakage risk and simplifies the public interface. Formerly #10 handled this externally; now it is an internal concern.

D4. Write-first, reads only for rollback: This module is the execution layer (Create/Modify/Delete). Reads are limited to snapshotProduct() for pre-transaction state capture. The primary data read path lives in Data Sync (#10).

D5. DynamoDB for tokens, AWS Secrets Manager for static secrets: Secrets Manager is for values that change rarely (client_id, client_secret, encryption keys). DynamoDB handles high-frequency token reads/writes with conditional updates and TTL. This separation matches the access pattern of each secret type.

D6. Proactive refresh (30min before expiry) over on-demand: MeLi tokens expire every 6h. Refreshing only when a request needs a token creates latency spikes and race conditions. The 5-minute cron proactively refreshes any token expiring within 30 minutes, so getToken() almost always returns a still-valid token from its direct DynamoDB read (<100ms), with no refresh on the request path.
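
The selection rule behind the cron can be sketched as a pure function. StoredCredential and its field names are assumptions for illustration; only the 30-minute window and 5-minute cadence come from the text. With those two numbers, a token falls inside the window for roughly six cron ticks before it expires, which leaves room for retries.

```typescript
// Hypothetical shape of a stored credential; field names are illustrative.
interface StoredCredential {
  userId: string;
  marketplaceSlug: string;
  expiresAtMs: number; // access-token expiry, epoch milliseconds
}

const REFRESH_WINDOW_MS = 30 * 60 * 1000; // refresh anything expiring within 30 minutes

// Returns the credentials a cron tick at `nowMs` should refresh.
// Already-expired tokens are included so they get another attempt.
function selectDueForRefresh(creds: StoredCredential[], nowMs: number): StoredCredential[] {
  return creds.filter((c) => c.expiresAtMs - nowMs <= REFRESH_WINDOW_MS);
}
```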

D7. All 3 marketplaces from Phase 1: MeLi + Amazon + Shopify ship in MVP. No phased rollout per marketplace. The IMarketplaceAdapter interface keeps each adapter isolated, so each can be developed and tested independently of the others.

D8. TypeScript/Node.js over Python: Consistency with core-intelligence services (TypeScript). Amazon @sp-api-sdk is the most active TS SDK. MeLi has no SDK in any language (raw HTTP regardless). Shopify's official SDK is JS-first. The module is I/O-bound; marketplace API latency (100-500ms) dominates, not the runtime. Strong typing via TypeScript catches adapter interface violations at compile time.

MVP Scope

[v4] MeLi REST + Amazon SP-API (LWA) + Shopify GraphQL. 17 write tools across 4 domains + 3 enrollment methods. SKU as primary key. TypeScript/Node.js. Internal TokenManager with DynamoDB + AWS Secrets Manager + AES-256-GCM. OAuth2 for all 3 marketplaces. Token refresh cron 5min. Onboarding flow. Absorbs former #10.

Inspired by

Existing data orchestrator connectors. OAuth2 rotation from Data Orchestrator.

Source: Existing data orchestrator connectors + OAuth2 rotation | Depends on: #10 (Data Sync), #14 (DevOps, IaC)
📝 Project Changelog
v4 Mar 2, 2026
! Absorbs #10 Auth & Credentials Vault: token management is now an internal module
~ Renamed subtitle “Tools” → “Tools & Auth”
~ MarketplaceRequest changed from {auth_token, username, marketplace_slug} to {userId, marketplaceSlug}; the adapter resolves tokens internally
~ Stack migrated from Python 3.12+ to TypeScript/Node.js 22+
+ ITokenManager: getToken(), storeCredentials(), revokeCredentials(), getConnectedMarketplaces(), forceRefresh()
+ 3 new components (ex-#10): TokenManager, OAuth2 Flow Manager, Onboarding Flow
+ Credentials schema: DynamoDB (AES-256-GCM) + AWS Secrets Manager
+ Token refresh cron: EventBridge every 5min, proactive 30min before expiry
~ Implementation 4 phases → 5 phases (Phase 1: TokenManager+OAuth2 ex-#10)
~ Risks 4 → 5 (new: Token Refresh Failure Cascade, Credential Security Breach)
~ Acceptance criteria 10 → 15 (merged #12 + #10)
~ Dependencies: #10, #10 → #10, #14
~ listing → product rename across all interfaces and tools
v3 Feb 27-28, 2026
+ Full deep spec card: description, component grid, tech stack, data models, API contracts, acceptance criteria, implementation plan, risks, key decisions
+ 17 write tools across 4 domains (Catalog, Engagement, Advertising, Enrollment)
+ SKU as primary identifier: adapter resolves to marketplace-native ID
+ Support matrix (3 marketplaces × 4 domains)
v2.1 Feb 27, 2026
+ Shopify elevated to MVP: ShopifyAdapter (GraphQL) no longer deferred
~ MVP marketplaces: 2 (MeLi + Amazon) → 3 (+ Shopify)
v2 Feb 27, 2026
+ search_competitors defined: MeLi Search API (/sites/MLA/search)
~ Cache: in-memory → Redis (Cloud Memorystore) from Day 1
v1 Feb 26, 2026
+ Initial project: MeLi + Amazon adapters, ADAPT status

Layer 5: PLATFORM

What sustains the business and infrastructure

#13

Billing & Credit Economy

core-platform-billing — Sergio

REWRITE

The economics engine of Shopilot: it unifies metering and billing in a single project (core-platform-billing). Credit tracking already works in production via PostgreSQL triggers on agent_costs. The application never calculates credits; it only inserts costs, and the triggers handle deduction from clients.credits. This project extends that foundation with plan management (Free/Pro), Stripe integration for payments, Credit Packs, and the ICreditsGate contract that the Orchestrator (#2) calls before every tool execution. The Orchestrator receives allowed: boolean and never knows about plans, Stripe, or pricing rules. Prompt caching (Anthropic cache_control) reduces LLM input token costs by 60-80% on layers 1-3 of the SystemPromptComposer. Absorbs former #14 (Billing & Subscription Management).

Beautonomous governance: ICreditsGate enforces resource limits per Core's permission matrix. The Orchestrator cannot execute any tool if the seller's credit budget is exhausted, regardless of role. Free plan limits align with Core's tier-based access rules.

What this project does NOT do

Marketplace API caching: Caching MeLi/Amazon/Shopify API responses is Data Sync (#10). This project only caches LLM tokens via prompt caching.
Calculate credits per execution: PostgreSQL triggers on agent_costs already do this. This project does not duplicate that logic.
Purchase prompts in chat: The Coach never interrupts a conversation with "buy more credits". Alerts are data that Shell (#1) consumes via GET /billing/status.
Admin dashboard / business metrics: MRR, ARPU, churn, and LTV are in the Stripe Dashboard. No admin endpoints in MVP.
LLM model routing: Which model to use (Haiku/Sonnet/Opus) is decided by LLMClientFactory in the Orchestrator (#2), not by billing.
Business plan ($149/mo): Deferred to Phase 2. Not in MVP scope.

CreditGate
ICreditsGate returns allowed: boolean before each tool
StripeCheckoutService
Upgrade to Pro + Credit Pack purchases
StripeWebhookHandler
5 events, idempotent via stripe_webhook_events
SubscriptionLifecycle
FREE → PRO → PAST_DUE → GRACE_PERIOD → FREE
BillingStatusService
GET /billing/status <100ms for Shell (#1)
PromptCacheOptimizer
cache_control on layers 1-3, 60-80% savings

Plans & Credit Packs

MVP Plans

                  Free          Pro
Price             $0            $49/mo
Credits/mo        50            500
Tools READ        Yes           Yes
Tools ANALYSIS    Yes           Yes
Tools WRITE       No            Yes
Proactivity
Credit Packs      No            Yes
At 0 credits      HARD BLOCK    SOFT BLOCK

Credit Packs (Pro only)

Pack       Credits    Price
Basic      100        $5.00
Popular    500        $20.00
Power      1,000      $35.00

Pack credits expire 12 months from purchase. Plan credits reset monthly. Deduction order: plan first, then packs.
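
The "plan first, then packs" order can be checked with a few lines of arithmetic. The function below is only an illustration of that rule (in production the PostgreSQL trigger performs the deduction, not application code); CreditBalance and applyDeduction are names made up for this sketch.

```typescript
interface CreditBalance {
  fromPlan: number;  // resets monthly
  fromPacks: number; // expires 12 months after purchase
}

// Spend plan credits first; only the remainder is taken from pack credits.
function applyDeduction(balance: CreditBalance, deduction: number): CreditBalance {
  const fromPlanUsed = Math.min(balance.fromPlan, deduction);
  const overflow = deduction - fromPlanUsed;
  return {
    fromPlan: balance.fromPlan - fromPlanUsed,
    fromPacks: Math.max(0, balance.fromPacks - overflow),
  };
}
```

For example, a seller with 3 plan credits and 100 pack credits who spends 5 credits ends with 0 plan credits and 98 pack credits: the plan bucket absorbs 3 and the packs absorb the remaining 2.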

Blocking LogicLogica de Bloqueo

Free, credits = 0 → HARD BLOCK — all tools blocked
Pro,  credits = 0 → SOFT BLOCK — writes blocked, reads+analysis continue
Pro,  credits > 0 → everything enabled
Free, credits > 0 → reads+analysis enabled, writes always blocked

Tech Stack

TypeScript / Node.js | PostgreSQL (existing) | Stripe SDK (Node.js) | AWS Lambda | AWS CDK | API Gateway HTTP v2 | SSM Parameter Store | Anthropic cache_control
Schema, Contracts, Endpoints & Acceptance Criteria
Existing PostgreSQL Schema (operational, do not rewrite)
-- Hierarchy already in production:
-- clients → agents_clients → agent_executions → agent_costs (FK charge_types)

-- charge_types (configurable pricing):
-- TOKENS:        CEIL((input+output)/1000) * 1 credit
-- EMBEDDING:     1 credit flat
-- VECTOR_SEARCH: 1 credit flat
-- BRAND_HEALTH:  1 credit flat
-- EXTERNAL_COST: CEIL(cost_usd/0.01) * 1 credit

-- Triggers (already working):
-- trg_calculate_agent_credits  BEFORE INSERT on agent_costs → calculates credits from charge_type
-- trg_apply_credits_to_client  AFTER INSERT on agent_costs  → decrements clients.credits
-- after_insert_agent_cost_sync AFTER INSERT on agent_costs  → accumulates agents_clients.credits_used

-- The app NEVER calculates credits. It inserts into agent_costs with correct charge_type_id.
-- Triggers do the rest.
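
For readers who want to sanity-check the charge_types arithmetic, the rules above can be mirrored in a small function. This is purely illustrative (for example for a test fixture or a cost estimate); the production calculation stays in trg_calculate_agent_credits, and the Charge type and creditsFor name are assumptions of this sketch.

```typescript
type Charge =
  | { type: "TOKENS"; inputTokens: number; outputTokens: number }
  | { type: "EMBEDDING" | "VECTOR_SEARCH" | "BRAND_HEALTH" }
  | { type: "EXTERNAL_COST"; costUsd: number };

function creditsFor(charge: Charge): number {
  switch (charge.type) {
    case "TOKENS":
      // CEIL((input + output) / 1000) * 1 credit
      return Math.ceil((charge.inputTokens + charge.outputTokens) / 1000);
    case "EXTERNAL_COST":
      // CEIL(cost_usd / 0.01) * 1 credit. Round away floating-point noise first:
      // in JavaScript 0.07 * 100 evaluates to 7.000000000000001.
      return Math.ceil(Number((charge.costUsd * 100).toFixed(6)));
    default:
      // EMBEDDING, VECTOR_SEARCH, BRAND_HEALTH: 1 credit flat
      return 1;
  }
}
```

A call with 1,200 input and 900 output tokens costs 3 credits, and an external cost of $0.07 costs 7 credits. PostgreSQL NUMERIC arithmetic is exact, so the rounding guard is only needed on the JavaScript side.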
Schema Extensions for Billing
-- Extend clients with subscription fields:
ALTER TABLE clients
  ADD COLUMN plan                 VARCHAR(20)    DEFAULT 'free',    -- free | pro
  ADD COLUMN stripe_customer_id   VARCHAR(100),                     -- cus_xxxxx
  ADD COLUMN stripe_subscription_id VARCHAR(100),                   -- sub_xxxxx
  ADD COLUMN subscription_status  VARCHAR(30)    DEFAULT 'active',  -- active | past_due | grace_period | canceled
  ADD COLUMN billing_period_start DATE,
  ADD COLUMN billing_period_end   DATE,
  ADD COLUMN credits_from_plan    NUMERIC(10,2)  DEFAULT 50,        -- resets monthly
  ADD COLUMN credits_from_packs   NUMERIC(10,2)  DEFAULT 0;         -- 12-month expiry

-- credits = credits_from_plan + credits_from_packs
-- Trigger deduction order: plan first, then packs:
-- UPDATE clients SET
--   credits_from_plan  = GREATEST(0, credits_from_plan  - deduction),
--   credits_from_packs = GREATEST(0, credits_from_packs - GREATEST(0, deduction - credits_from_plan)),
--   credits            = credits - deduction
-- WHERE client_id = $1;

-- New table: stripe_webhook_events (idempotency)
CREATE TABLE stripe_webhook_events (
    stripe_event_id  VARCHAR(100) PRIMARY KEY,   -- evt_xxxxx
    event_type       VARCHAR(100) NOT NULL,
    processed_at     TIMESTAMP    DEFAULT now(),
    payload          JSONB,
    status           VARCHAR(20)  DEFAULT 'processed'  -- processed | failed
);

-- New table: credit_pack_purchases
CREATE TABLE credit_pack_purchases (
    purchase_id            UUID         PRIMARY KEY DEFAULT gen_random_uuid(),
    client_id              VARCHAR(50)  NOT NULL REFERENCES clients(client_id),
    stripe_payment_intent  VARCHAR(100) NOT NULL,
    pack_type              VARCHAR(20)  NOT NULL,   -- basic | popular | power
    credits_added          NUMERIC(10,2) NOT NULL,  -- 100 | 500 | 1000
    amount_usd             NUMERIC(8,2)  NOT NULL,  -- 5.00 | 20.00 | 35.00
    purchased_at           TIMESTAMP    DEFAULT now(),
    expires_at             TIMESTAMP                 -- purchased_at + 12 months
);
ICreditsGate Contract (integration with Orchestrator #2)
// --- In core-intelligence-conversation-api (domain layer) ---

// domain/ports/ICreditsGate.ts
interface ICreditsGate {
  canProceed(params: {
    userId: string
    toolCategory: 'read' | 'analysis' | 'write' | 'system'
  }): Promise<CreditGateResult>
}

interface CreditGateResult {
  allowed: boolean
  reason?: 'insufficient_credits' | 'writes_blocked_no_credits' | 'plan_restriction'
  creditsRemaining: number
  plan: 'free' | 'pro'
}

// infrastructure/billing/HttpCreditGate.ts
class HttpCreditGate implements ICreditsGate {
  // Calls POST /internal/gate on core-platform-billing
  // Auth: x-internal-key header (SSM SecureString, rotated)
  // Fail-open: if billing unavailable, returns allowed: true
  //   (triggers still deduct credits independently)
}

// --- In core-platform-billing ---

// POST /internal/gate handler (not public, internal API key required)
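async function creditGateHandler(
  req: { body: { userId: string; toolCategory: 'read' | 'analysis' | 'write' | 'system' } }
): Promise<CreditGateResult> {
  const { userId, toolCategory } = req.body
  const { credits, plan } = await db.queryOne(
    'SELECT credits, plan FROM clients WHERE client_id = $1', [userId]
  )
  // Free at 0 credits: HARD BLOCK, every tool category is rejected
  if (plan === 'free' && credits <= 0)
    return { allowed: false, reason: 'insufficient_credits', creditsRemaining: 0, plan }
  // Free with credits left: reads + analysis run, writes are always blocked
  if (plan === 'free' && toolCategory === 'write')
    return { allowed: false, reason: 'plan_restriction', creditsRemaining: credits, plan }
  // Pro at 0 credits: SOFT BLOCK, only writes are rejected
  if (plan === 'pro' && credits <= 0 && toolCategory === 'write')
    return { allowed: false, reason: 'writes_blocked_no_credits', creditsRemaining: 0, plan }
  return { allowed: true, creditsRemaining: credits, plan }
}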
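
The fail-open behaviour described for HttpCreditGate could be wrapped around any gate call roughly as follows. This is a sketch under assumptions: withFailOpen, GateCall, the 500 ms timeout, and the placeholder values for creditsRemaining and plan are all invented for illustration; the spec only fixes allowed: true when billing is unavailable.

```typescript
type ToolCategory = "read" | "analysis" | "write" | "system";

interface CreditGateResult {
  allowed: boolean;
  reason?: "insufficient_credits" | "writes_blocked_no_credits" | "plan_restriction";
  creditsRemaining: number;
  plan: "free" | "pro";
}

type GateCall = (params: { userId: string; toolCategory: ToolCategory }) => Promise<CreditGateResult>;

// Assumption: only allowed: true is specified; the other fields are placeholders.
const FAIL_OPEN_RESULT: CreditGateResult = { allowed: true, creditsRemaining: 0, plan: "free" };

// Wraps a gate call so that errors and timeouts resolve to an "allowed" result.
function withFailOpen(call: GateCall, timeoutMs: number = 500): GateCall {
  return async (params) => {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_resolve, reject) => {
      timer = setTimeout(() => reject(new Error("credit gate timed out")), timeoutMs);
    });
    try {
      return await Promise.race([call(params), timeout]);
    } catch {
      // Billing is unreachable or slow: let the tool run.
      // The PostgreSQL triggers still deduct credits independently.
      return FAIL_OPEN_RESULT;
    } finally {
      clearTimeout(timer);
    }
  };
}
```

The trade-off is deliberate in the design above: a billing outage never blocks sellers, at the cost of possibly letting a few calls through that a healthy gate would have rejected.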
REST Endpoints & BillingStatus
// Public endpoints (authenticated user):
// POST /billing/checkout          → Stripe Checkout Session (mode=subscription) for Pro upgrade
// POST /billing/packs/checkout    → Stripe Checkout Session (mode=payment) for Credit Pack (Pro only)
// POST /billing/portal            → Stripe Customer Portal redirect (upgrade/cancel/invoices)
// GET  /billing/status            → BillingStatus (<100ms, direct read from clients)

// Internal endpoint (service-to-service):
// POST /internal/gate             → CreditGateResult (x-internal-key auth)

// Webhook endpoint (Stripe signature verification):
// POST /billing/webhook           → Handles 5 Stripe events with idempotency

interface BillingStatus {
  plan: 'free' | 'pro'
  subscriptionStatus: 'active' | 'past_due' | 'grace_period' | 'canceled'
  creditsRemaining: number          // = credits_from_plan + credits_from_packs
  creditsFromPlan: number
  creditsFromPacks: number
  billingPeriodEnd: Date | null
  percentUsed: number               // for UI alerts in Shell (#1)
}
Stripe Webhooks (5 events)
// All webhooks: INSERT into stripe_webhook_events first.
// If PK violation → already processed → return 200 (idempotent).

// invoice.payment_succeeded  → renew subscription + reset credits_from_plan
// invoice.payment_failed     → subscription_status = 'past_due' (UI in #1 notifies)
// customer.subscription.deleted → start 7-day grace period, then downgrade to Free
// customer.subscription.updated → sync plan, price, billing dates
// checkout.session.completed (mode=payment) → credit pack: INSERT credit_pack_purchases + add credits
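
The insert-first idempotency rule can be sketched with an in-memory stand-in for stripe_webhook_events. StripeEventLike, InMemoryEventStore, and handleWebhook are hypothetical names; against real PostgreSQL the equivalent would be an INSERT ... ON CONFLICT (stripe_event_id) DO NOTHING followed by a check of the affected row count, ideally in the same transaction as the state change.

```typescript
// Minimal event shape for the sketch; real Stripe events carry more fields.
interface StripeEventLike {
  id: string;   // evt_xxxxx
  type: string; // e.g. "invoice.payment_succeeded"
}

// Stand-in for the stripe_webhook_events table and its primary key.
class InMemoryEventStore {
  private readonly seen = new Set<string>();

  // Returns false when the id already exists (the PK-violation case).
  insertIfNew(eventId: string): boolean {
    if (this.seen.has(eventId)) return false;
    this.seen.add(eventId);
    return true;
  }
}

async function handleWebhook(
  event: StripeEventLike,
  store: InMemoryEventStore,
  apply: (event: StripeEventLike) => Promise<void>,
): Promise<"processed" | "duplicate"> {
  // Record the event first; a duplicate delivery stops here and still gets a 200.
  if (!store.insertIfNew(event.id)) return "duplicate";
  await apply(event);
  return "processed";
}
```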
Prompt Caching (SystemPromptComposer in #2)
// System prompt layers with cache_control: { type: "ephemeral" }
// Layer 1: Personality base     ~1200 tokens  cache hit ~95%
// Layer 2: Marketplace context  ~400  tokens  cache hit ~70%
// Layer 3: Tool definitions     ~800  tokens  cache hit ~90%
// Layer 4: User profile         ~200  tokens  NOT cached (dynamic)
//
// Reduction: 60-80% input token cost on layers 1-3.
// 5+ turn conversations see cumulative savings.
// Implementation lives in SystemPromptComposer (core-intelligence-conversation-api)
// Billing documents it because it directly impacts plan operating cost.
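
As a hedged sketch of how the four layers might be assembled, the function below builds system text blocks in the shape the Anthropic Messages API accepts for its system parameter, marking layers 1-3 with cache_control and leaving layer 4 unmarked. composeSystemBlocks and the PromptLayers field names are assumptions; the real SystemPromptComposer may structure this differently, for instance by passing tool definitions through the tools parameter instead of as text.

```typescript
interface SystemTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Hypothetical input: one string per layer described above.
interface PromptLayers {
  personality: string;        // Layer 1
  marketplaceContext: string; // Layer 2
  toolDefinitions: string;    // Layer 3
  userProfile: string;        // Layer 4, dynamic
}

function composeSystemBlocks(layers: PromptLayers): SystemTextBlock[] {
  const cached = (text: string): SystemTextBlock => ({
    type: "text",
    text,
    cache_control: { type: "ephemeral" },
  });
  return [
    cached(layers.personality),
    cached(layers.marketplaceContext),
    cached(layers.toolDefinitions),
    // The dynamic layer goes last and carries no cache_control, so changes to it
    // do not invalidate the cached prefix in front of it.
    { type: "text", text: layers.userProfile },
  ];
}
```

Ordering matters because caching works on the prompt prefix: stable layers come first, the per-user layer last. Anthropic also documents a minimum cacheable prefix length that varies by model, so very short prefixes may not be cached at all.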
Acceptance Criteria
  • ICreditsGate.canProceed() returns correct allowed/reason for all 4 blocking scenarios (Free 0cr, Pro 0cr write, Pro 0cr read, Free >0cr write)
  • GET /billing/status returns BillingStatus in <100ms (direct PK read from clients)
  • Stripe webhooks are idempotent: re-delivered events do not duplicate credits or re-trigger state changes
  • invoice.payment_succeeded resets credits_from_plan to plan quota (50 or 500) and preserves credits_from_packs
  • Credit Pack checkout adds credits_from_packs without affecting credits_from_plan
  • Subscription state machine: FREE → PRO → PAST_DUE → GRACE_PERIOD → FREE works end-to-end
  • Grace period: 7 days post-cancellation, Pro features remain active, then downgrade to Free with plan credits lost and pack credits preserved
  • PostgreSQL trigger deduction order: plan credits consumed before pack credits
  • Race condition: concurrent tool calls with 1 credit remaining — only one passes (WHERE credits >= deduction)
  • Fail-open: if billing service is unavailable, HttpCreditGate returns allowed: true (triggers still deduct independently)
  • Prompt caching: layers 1-3 of SystemPromptComposer set cache_control on their content blocks, achieving 60-80% input token cost reduction

How It Works

ReActOrchestrator (#2)
    |
    | (before each tool call)
    v
ICreditsGate.canProceed({ userId, toolCategory })
    |
    |— allowed: true  → execute tool normally
    |
    +— allowed: false → append tool_result("No credits for this operation")
                          → loop continues, LLM explains to user

Credit deduction (independent path):
    tool executes → INSERT into agent_costs
                   → trg_calculate_agent_credits (BEFORE INSERT)
                   → trg_apply_credits_to_client (AFTER INSERT)
                   → clients.credits decremented automatically

Subscription lifecycle:
    SIGNUP → FREE (50cr/mo, internal reset)
               |
               | upgrade (Stripe Checkout)
               v
           PRO (500cr/mo, Stripe invoice reset) —————+
               |                                       | buy pack
               | payment_failed                        v
               v                               credit_pack_purchases
           PAST_DUE (3 Stripe retries)         + credits_from_packs
               |
               | all retries fail
               v
           GRACE_PERIOD (7 days, Pro still active)
               |
               | expires
               v
           FREE (downgrade, plan credits lost, pack credits preserved)

The Orchestrator calls ICreditsGate.canProceed() before each tool; it receives allowed: boolean and never knows about plans, Stripe, or pricing rules. Credit deduction happens independently via PostgreSQL triggers on agent_costs: the app only inserts costs, and the triggers handle the math. Stripe webhooks synchronize subscription state (renewals, failures, cancellations) into PostgreSQL. The billing service is the single source of truth for plan/credit state, while Stripe is the source of truth for payments.
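
The subscription lifecycle in the diagram can be expressed as a small transition table. The event names below are hypothetical labels for the arrows in the diagram, and the PAST_DUE → PRO transition on a successful retry is an assumption that the diagram does not show explicitly.

```typescript
type SubscriptionState = "FREE" | "PRO" | "PAST_DUE" | "GRACE_PERIOD";

// Hypothetical event names, one per arrow in the lifecycle diagram.
type SubscriptionEvent =
  | "upgrade"
  | "payment_succeeded"
  | "payment_failed"
  | "retries_exhausted"
  | "grace_expired";

const TRANSITIONS: Record<SubscriptionState, Partial<Record<SubscriptionEvent, SubscriptionState>>> = {
  FREE: { upgrade: "PRO" },
  PRO: { payment_succeeded: "PRO", payment_failed: "PAST_DUE" },
  // Assumption: a successful Stripe retry returns the seller to PRO.
  PAST_DUE: { payment_succeeded: "PRO", retries_exhausted: "GRACE_PERIOD" },
  GRACE_PERIOD: { grace_expired: "FREE" },
};

// Events that do not apply to the current state leave it unchanged,
// which keeps re-delivered or out-of-order webhooks harmless.
function nextState(state: SubscriptionState, event: SubscriptionEvent): SubscriptionState {
  return TRANSITIONS[state][event] ?? state;
}
```

Keeping the table as data makes the end-to-end acceptance check (FREE → PRO → PAST_DUE → GRACE_PERIOD → FREE) a matter of folding a list of events over nextState().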

Implementation Plan

Phase 1: Credit Gate + Schema (Orchestrator prerequisite)

In core-platform-billing: extend clients with subscription columns (plan, stripe_customer_id, subscription_status, credits_from_plan, credits_from_packs, billing_period_*). Create the stripe_webhook_events and credit_pack_purchases tables. Update the deduction trigger for plan-vs-pack ordering. Build the POST /internal/gate endpoint with Free/Pro decision logic. In core-intelligence-conversation-api: define the ICreditsGate interface (domain) and the HttpCreditGate implementation (HTTP call to the billing service). All users start as Free with 50 credits/month.

Phase 2: Stripe Checkout + Webhooks

Checkout flow for upgrade to Pro (Stripe-hosted page: full PCI compliance, no card data touches our servers). Webhooks: invoice.payment_succeeded, invoice.payment_failed, customer.subscription.deleted, customer.subscription.updated. Customer Portal for self-service (upgrades, cancellations, invoices). Cron for the monthly reset of Free users (check billing_period_end < now() for users without a Stripe subscription).
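
The Free-plan reset check is simple enough to sketch. The row shape below is hypothetical (camelCase stand-ins for the clients columns, plus an assumed hasStripeSubscription flag), and advancing the period by one calendar month is an assumption; the text only specifies the billing_period_end < now() condition and the 50 credits/month allowance.

```typescript
// Hypothetical row shape; field names are camelCase stand-ins for clients columns.
interface FreeResetCandidate {
  id: string;
  plan: "free" | "pro";
  hasStripeSubscription: boolean; // ASSUMPTION: derived from the Stripe columns
  creditsFromPlan: number;
  creditsFromPacks: number;
  billingPeriodEnd: Date;
}

const FREE_MONTHLY_CREDITS = 50;

// ASSUMPTION: the next period ends one calendar month later (UTC).
function addOneMonthUtc(date: Date): Date {
  const next = new Date(date.getTime());
  next.setUTCMonth(next.getUTCMonth() + 1);
  return next;
}

// Resets plan credits for Free users whose period has ended; everyone else is returned unchanged.
function resetIfDue(client: FreeResetCandidate, now: Date): FreeResetCandidate {
  const due =
    client.plan === "free" &&
    !client.hasStripeSubscription &&
    client.billingPeriodEnd.getTime() < now.getTime();
  if (!due) {
    return client;
  }
  return {
    ...client,
    creditsFromPlan: FREE_MONTHLY_CREDITS, // pack credits are left untouched
    billingPeriodEnd: addOneMonthUtc(client.billingPeriodEnd),
  };
}
```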

Phase 3: Credit Packs + Prompt Caching

Credit Pack checkout (3 options, Pro only) via Stripe mode=payment. GET /billing/status endpoint for Shell (#1) with a BillingStatus response. Prompt caching with cache_control: { type: "ephemeral" } on layers 1-3 of SystemPromptComposer. Quota alerts are visible to the frontend via billing status (percentUsed field).
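
As an illustration of the caching step, the sketch below marks the first three prompt layers with cache_control. The SystemBlock shape follows the text-block format the Anthropic Messages API accepts in the system field. The composeSystemBlocks function and the layer contents are hypothetical, since the internals of SystemPromptComposer are not described here.

```typescript
// Text block shape accepted in the `system` array of the Anthropic Messages API.
interface SystemBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Hypothetical composer: marks the first `cachedLayerCount` layers as cacheable
// and leaves later (per-request) layers uncached.
function composeSystemBlocks(layers: string[], cachedLayerCount: number = 3): SystemBlock[] {
  return layers.map((text, index): SystemBlock =>
    index < cachedLayerCount
      ? { type: "text", text, cache_control: { type: "ephemeral" } }
      : { type: "text", text },
  );
}
```

Caching applies to the prompt prefix up to each marked block, so the stable layers should come first and the volatile, per-conversation layers last.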

Risk Analysis

Stripe ↔ PostgreSQL Drift

Impact: High

Mitigation: Stripe retries webhooks for up to 72h with exponential backoff. A nightly reconciliation cron compares subscription_status in Stripe vs PostgreSQL. If they diverge, Stripe wins (source of truth for payments).
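
The reconciliation rule can be sketched as a pure comparison step. Everything below is illustrative: the row shape, the Map of Stripe statuses, and the choice to skip customers that Stripe does not know are assumptions. Fetching from Stripe and writing to PostgreSQL are left out.

```typescript
// Hypothetical local row: one entry per client that has a Stripe customer.
interface LocalSubscription {
  clientId: string;
  stripeCustomerId: string;
  subscriptionStatus: string;
}

interface Correction {
  clientId: string;
  from: string;
  to: string;
}

// Compares local status against Stripe's and returns the updates to apply.
// When the two diverge, the Stripe value wins.
function reconcile(
  localRows: LocalSubscription[],
  stripeStatusByCustomer: Map<string, string>,
): Correction[] {
  const corrections: Correction[] = [];
  for (const row of localRows) {
    const stripeStatus = stripeStatusByCustomer.get(row.stripeCustomerId);
    // ASSUMPTION: customers missing from the Stripe snapshot are skipped, not downgraded.
    if (stripeStatus === undefined) continue;
    if (stripeStatus !== row.subscriptionStatus) {
      corrections.push({ clientId: row.clientId, from: row.subscriptionStatus, to: stripeStatus });
    }
  }
  return corrections;
}
```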

Double Credit Deduction (race condition)

Impact: High

Mitigation: A PostgreSQL UPDATE with WHERE credits >= deduction prevents overdrawing. If the condition fails, the trigger returns an error. The gate implementation handles that error as allowed: false.
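
The guarded update can be illustrated as follows. The SQL string shows the shape of such a statement (the id column and parameter order are assumptions). The TypeScript functions are an in-memory model of its all-or-nothing semantics, not the actual trigger code, and they do not reproduce concurrency.

```typescript
// Shape of a guarded deduction; column `id` and the parameter order are assumed.
const GUARDED_DEDUCTION_SQL =
  "UPDATE clients SET credits = credits - $1 WHERE id = $2 AND credits >= $1";

type DeductionResult =
  | { ok: true; remaining: number }
  | { ok: false; reason: "insufficient_credits" };

// In-memory model of the statement above: the decrement is applied only when the
// WHERE condition holds, otherwise no row changes and the caller gets an error result.
function guardedDeduct(
  balances: Map<string, number>,
  clientId: string,
  deduction: number,
): DeductionResult {
  const current = balances.get(clientId);
  if (current === undefined || current < deduction) {
    return { ok: false, reason: "insufficient_credits" };
  }
  const remaining = current - deduction;
  balances.set(clientId, remaining);
  return { ok: true, remaining };
}

// The gate maps a failed deduction to allowed: false.
function toGateDecision(result: DeductionResult): { allowed: boolean } {
  return { allowed: result.ok };
}
```

In PostgreSQL the second of two concurrent updates re-checks the condition against the committed row, which is what keeps the balance from going negative.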

Free Plan Abuse (multi-account)

Impact: Medium

Mitigation: 50 credits/month is enough to evaluate the product but not to operate a business. Account creation requires Memberstack verification. Multi-account abuse patterns are detected by IP/email in Memberstack.

Stripe Failure During Checkout

Impact: Medium

Mitigation: Checkout is a non-critical path (the Coach works without it). Circuit breaker on the checkout endpoint. Stripe has 99.99% historical uptime.

Key Decisions

D1.

Unify metering and billing in one project (#13 + #14). The original separation created circular dependencies: billing needs to know credit consumption, and metering needs to know plan limits. Unified in core-platform-billing, the source of truth is a single project.

D2.

Stripe Checkout + Customer Portal instead of a custom payment UI. Full PCI compliance is handled by Stripe. No card data touches our servers. Customer Portal provides upgrade, downgrade, invoice history, and cancellation without building those screens (that's Shell #1's job).

D3.

Credits as the user abstraction, not raw tokens. Sellers understand "this action costs 3 credits", not "3,247 input tokens". Credits allow changing underlying LLM costs without altering the user-facing pricing experience.

D4.

Soft block (Pro) instead of hard block when credits run out. Blocking everything creates terrible UX. Pro can still query data and receive analysis; only marketplace mutations are blocked. This keeps the user in the product while showing the value of buying more credits.
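
A possible shape for the billing-side decision is sketched below. The Pro rule (reads allowed, marketplace mutations blocked at zero credits) follows D4. The hard block for Free users at zero credits is an assumption implied by the "(Pro)" qualifier, and the type names and reason strings are hypothetical.

```typescript
type Plan = "free" | "pro";
type ToolKind = "read" | "write"; // "write" = marketplace mutation

interface BlockDecision {
  allowed: boolean;
  reason?: "hard_block" | "soft_block_write"; // hypothetical reason codes
}

// Decides whether a tool may run given the plan and remaining credits.
function decideGate(plan: Plan, creditsRemaining: number, toolKind: ToolKind): BlockDecision {
  if (creditsRemaining > 0) {
    return { allowed: true };
  }
  if (plan === "pro") {
    // Soft block: queries and analysis keep working, mutations do not.
    return toolKind === "read"
      ? { allowed: true }
      : { allowed: false, reason: "soft_block_write" };
  }
  // ASSUMPTION: Free users are hard-blocked once credits are exhausted.
  return { allowed: false, reason: "hard_block" };
}
```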

D5.

Blocking logic lives in billing, not in the Orchestrator. If tomorrow we add a Business plan with different rules, the Orchestrator doesn't change. Only the ICreditsGate implementation changes. This boundary is intentional.

D6.

Keep the existing PostgreSQL schema as the source of truth instead of replacing it. The trigger system is already correct. The migration extends clients with subscription fields. We don't rewrite what works. No Redis and no extra cache for the credit gate: a primary-key lookup on clients is a single index probe that typically completes in under a millisecond.

MVP Scope

Free ($0, 50cr/mo) + Pro ($49/mo, 500cr). Credit Packs (3 tiers, Pro only, 12-month expiry). ICreditsGate contract with Orchestrator (#2). Stripe Checkout + Customer Portal + 5 webhooks. Subscription state machine (FREE → PRO → PAST_DUE → GRACE_PERIOD → FREE). Prompt caching on layers 1-3 (60-80% input cost reduction). No admin dashboard (Stripe Dashboard is used instead). No Business plan (Phase 2).

Inspired by

AgentTracking + Stripe (Sellerfy). Claude Code prompt caching patterns. Anthropic cache_control documentation.

Source: AgentTracking + Stripe (Sellerfy), Anthropic cache_control | Depends on: #17 (PostgreSQL schema), #2 (Orchestrator, ICreditsGate consumer), #1 (Shell, BillingStatus consumer)
Project Changelog
v4 Mar 2, 2026
! Absorbs #14 Billing & Subscription Management: circular dependencies resolved
~ Renamed "Usage Metering, Caching & Billing" → "Billing & Credit Economy"
~ Stack: Redis/DynamoDB/Pydantic/tiktoken → TypeScript/PostgreSQL/Stripe SDK/Lambda/CDK
~ Components: 7 old → 6 new (CreditGate, StripeCheckoutService, StripeWebhookHandler, SubscriptionLifecycle, BillingStatusService, PromptCacheOptimizer)
+ ICreditsGate contract: canProceed() with fail-open invariant
+ Plans & Credit Packs: Free $0/50cr, Pro $49/500cr, 3 packs ($5/$20/$35)
+ Subscription state machine: FREE → PRO → PAST_DUE → GRACE → FREE
+ PostgreSQL schema documented: existing triggers, schema extensions, 5 Stripe webhooks
~ Implementation: 4 generic phases → 3 specific (Credit Gate, Stripe, Credit Packs)
~ Risks 3 → 4 (new: Stripe↔PostgreSQL drift, double deduction race condition)
~ Acceptance criteria 6 → 11 (blocking scenarios, webhooks, state machine)
v3 Feb 27-28, 2026
+ Full deep spec card as "Usage Metering, Caching & Billing"
+ TokenCostCalculator, CreditTransaction, ModelRouting, PromptCaching, APIResponseCaching
v2.1 Feb 27, 2026
+ Re-incorporates UserQuotaManager with plan limits, alerts at 80/100%, soft/hard blocking
+ Freemium model: Free ($0, 50cr) + Pro ($49, 500cr)
v2 Feb 27, 2026
- Reduced to tracking only: UserQuotaManager, alerts, and blocking all removed for MVP
- Credit packs removed (no quotas = nothing to top up)
v1 Feb 26, 2026
+ Initial: "Usage Metering, Caching & Billing", 3 plans ($49/$149/$299), credit packs, ADAPT status
#14

DevOps (IaC)

Platform — Andrés

EXISTS NEW

Infrastructure as Code for all Shopilot cloud resources. Governing principle: data projects → GCP (Terraform) · backend / API / microservices → AWS (CloudFormation/CDK). Exceptions are explicit and project-specific (e.g. DynamoDB stays on AWS even for data it stores, because it is already defined as the backend for #12, #2, and other service projects). A GCP Terraform project already exists for Data Sync (#10): Cloud Composer (Airflow), GCS buckets, Cloud Run (FastAPI Data API), BigQuery. This project formalizes and extends that foundation to cover all modules. AWS CloudFormation/CDK is new: DynamoDB tables, Lambda functions, API Gateway, Secrets Manager, SSM. Every infrastructure change goes through version-controlled IaC, with no manual console provisioning.

Beautonomous governance: GitHub Actions workflows for production deployments are subject to Core's role gates. Only El Mago (Mateo) can approve and trigger production deployments. IaC authoring is El Artesano scope (Andrés); production promotion is El Mago scope (Mateo). Every change is version-controlled and auditable.

Terraform (GCP)
Data projects: #10, #9, #11, Open Metadata
CloudFormation / CDK (AWS)
Backend/API/microservices: #12, #8, #13, #2, #3, #15, #4–#6, #7
CI/CD Pipelines
GitHub Actions: plan/apply
3 Environments
dev / staging / prod
State Management
GCS backend (TF) + S3 (CF)
Extensible
Grows with each project deploy

Tech Stack

Terraform 1.5+ · AWS CloudFormation · GitHub Actions · GCS (TF state) · tflint + checkov
Modules, Coverage & Acceptance Criteria
IaC Modules
# ═══════════════════════════════════════════════════════════
# GOVERNING RULE:
#   DATA projects        → GCP  (Terraform)
#   BACKEND/API/services → AWS  (CloudFormation / CDK)
#   Exceptions: DynamoDB and other AWS-native services stay
#   on AWS even for data they store, as defined per project.
# ═══════════════════════════════════════════════════════════

# ─── GCP (Terraform) — DATA layer ───────────────────────────
# Projects: #10 Data Sync · #9 Cerebro KB · #11 Enrichment
# Repo: shopilot-infra/terraform/

modules/
├── data-sync/          # EXISTS — Cloud Composer (Airflow), GCS,
│                       #          Cloud Run (FastAPI Data API), BigQuery  → #10
├── cerebro-kb/         # BigQuery kb_embeddings, Vertex AI embeddings     → #9
├── enrichment/         # Cloud Run (Enrichment service)                   → #11
├── open-metadata/      # Open Metadata server (Cloud Run)
├── networking/         # VPC, subnets, firewall rules (shared GCP)
├── iam/                # GCP Service accounts, roles, bindings
└── monitoring/         # Cloud Monitoring dashboards + alerts (GCP)

envs/
├── dev.tfvars
├── staging.tfvars
└── prod.tfvars

# State: gs://shopilot-tf-state/{env}/terraform.tfstate

# ─── AWS (CloudFormation / CDK) — BACKEND/API/services ──────
# Projects: #12 · #8 · #13 · #2 · #3 · #15 · #4 · #5 · #6 · #7
# Repo: shopilot-infra/cloudformation/  (or CDK app)

stacks/
├── dynamodb.yaml       # Conversation, Session, Token tables    → #12, #2, #3
├── marketplace.yaml    # Lambda + API GW + Secrets Manager       → #12
├── orchestrator.yaml   # Lambda (ReAct loop, tool dispatch)      → #2
├── intelligence.yaml   # Lambda (Personality #4, Context #5,
│                       #         Proactive #6, Guardrails #7)
├── billing.yaml        # RDS PostgreSQL, Lambda, Stripe webhooks → #13
├── feedback.yaml       # Lambda + EventBridge (impact tracking)  → #15
├── observability.yaml  # CloudWatch dashboards, X-Ray, alarms    → #8
├── ssm.yaml            # Parameter Store configs (all services)
├── iam.yaml            # Lambda execution roles, IAM policies
└── eventbridge.yaml    # Cron rules (token refresh, sync triggers)

# State: S3 bucket (CF native) / CDK bootstrap
# Environments: dev / staging / prod (stack suffixes / CDK context)
Acceptance Criteria
  • All GCP resources provisioned via Terraform, with zero manual console changes
  • All AWS resources provisioned via CloudFormation, with zero manual console changes
  • CI/CD: terraform plan on PR, terraform apply on merge to main
  • 3 environments (dev/staging/prod) with isolated state per env
  • New project infra added by creating a new Terraform module or CF stack
  • Drift detection: weekly check for manual changes, alert if found

Envs: 3 (dev/staging/prod) · GCP: Terraform · AWS: CloudFormation · CI: GitHub Actions · Drift: weekly

How It Works

Developer pushes infra change
         |
         v
GitHub Actions CI/CD
+-- terraform plan / cfn validate / cdk diff   (on PR)
+-- terraform apply / cfn deploy / cdk deploy  (on merge to main)
         |
         +── GCP (Terraform)  ← DATA projects
         |   +── data-sync module     EXISTS — Airflow, GCS, BigQuery, Cloud Run
         |   +── cerebro-kb module   BigQuery embeddings + Vertex AI
         |   +── enrichment module   Cloud Run (#11)
         |   +── open-metadata module
         |   +── networking / iam / monitoring
         |
         +── AWS (CloudFormation / CDK)  ← BACKEND / API / microservices
             +── dynamodb stack          #12 (tokens) · #2 (sessions) · #3
             +── marketplace stack       #12 Lambda + API GW + Secrets Manager
             +── orchestrator stack      #2 Lambda
             +── intelligence stack      #4 #5 #6 #7 Lambda
             +── billing stack           #13 RDS + Lambda + Stripe
             +── feedback stack          #15 Lambda + EventBridge
             +── observability stack     #8 CloudWatch + X-Ray
             +── ssm / iam / eventbridge

Environments: dev → staging → prod
State:        GCS (Terraform) / S3 or CDK bootstrap (CloudFormation)

Key Decisions

D1.

Cloud split driven by project type, not tooling preference. Data projects (#10 Data Sync, #9 Cerebro KB, #11 Enrichment) run on GCP because that's where the data infrastructure already lives (Cloud Composer, GCS, BigQuery, Vertex AI). All backend/API/microservice projects (#12, #8, #13, #2, #3, #15, #4–#6, #7) run on AWS because they use Lambda, DynamoDB, API Gateway, and Secrets Manager, AWS-native services already specified per project. Exceptions are explicit: DynamoDB stays on AWS even for data it stores, because it is the runtime store for service state (tokens, sessions, credits), not the analytical data layer. This split is a governing rule, not a default.

D2.

Terraform for GCP, CloudFormation/CDK for AWS. Terraform is already in use for Data Sync GCP resources. CloudFormation/CDK is native to AWS and the right fit for Lambda/DynamoDB/API Gateway stacks. There is no need for a single tool when each cloud has a mature option already in place. CDK generates CloudFormation, so both are valid; the choice is made per project.

D3.

Extend, don't rewrite, the existing Terraform. The GCP Terraform project for Data Sync already works. New modules are added alongside it, with the same patterns, the same state backend, and the same CI/CD pipeline.

D4.

Transversal project that grows with every module. DevOps is not a one-time deliverable. Each project that needs cloud resources adds a module/stack here. The cloud split rule determines where it goes. The project scope expands organically.

MVP Scope

Formalize existing GCP Terraform. Add AWS CloudFormation stacks (DynamoDB, Lambda, API Gateway). CI/CD with GitHub Actions. 3 environments.

Inspired by

Existing GCP Terraform for Data Sync. Standard IaC practices.

Source: Existing GCP Terraform project | Depends on: none (transversal, no deps)
Project Changelog
v4 Mar 2, 2026
+ New project created: no DevOps/IaC project existed before v4
+ Terraform + CloudFormation dual IaC
+ 3 environments: dev, staging, production
~ Pending deep spec rewrite: basic spec only
#19

Go to Market & Analytics

core-platform-gtm-analytics — Pablo · External Team

NEW

Go-to-market strategy and user activity tracking. Defines the launch playbook (positioning, channels, early adopter acquisition, onboarding funnels) and the analytics infrastructure to measure product usage, retention, conversion, and growth metrics from day one. Owned by Pablo and an external GTM + analytics team, with no engineering tasks for the internal dev team in the MVP sprints.

No sprint tasks for Sergio, Andrés, or Mateo. Internal engineers may integrate analytics SDKs when specs are ready.

Detailed spec pending: the project structure is created, and content will be added when the GTM strategy is defined.

Source: New (external team) | Depends on: #1 Native Shell, #13 Billing (product + monetization must exist)
Project Changelog
v1 Mar 9, 2026
+ Initial: "Go to Market & Analytics" (#19), NEW status, Pablo + external team ownership. Placeholder spec.
Layer 6: QUALITY

What measures whether the Coach works well

#15

Feedback Loop

Quality — Sergio

REWRITE

[Phase 2 MVP] core-quality-feedback measures the real business impact of actions the Coach executes in the marketplace. When the Coach changes a product title, this project waits 7 days, compares before/after metrics, and calculates a weighted impact score. It also manages when to ask the seller for explicit feedback (anti-fatigue gate) and collects implicit signal (accepted/rejected/edited proposals). The Coach emits raw signal via conversation-api; it does not process, measure, or decide when to ask.
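
As a rough illustration of the weighted impact score, the sketch below uses the weights, clamping range, and classification thresholds listed in this project's API signatures. How each metric is normalized is not specified, so the relative-change formula and the zero-baseline handling are assumptions, and the ImpactResult shape is hypothetical.

```typescript
interface ProductMetrics {
  visits7d: number;
  sales7d: number;
  conversionRate: number; // 0-1
  searchPosition: number; // average position (lower = better)
  capturedAt: Date;
}

// Hypothetical result shape.
interface ImpactResult {
  score: number; // clamped to [-100, 100]
  impactClass: "positive" | "neutral" | "negative";
}

// ASSUMPTION: metrics are compared by relative change; a zero baseline that
// becomes non-zero counts as +100%.
function relativeChange(before: number, after: number): number {
  if (before === 0) return after === 0 ? 0 : 1;
  return (after - before) / before;
}

function calculateImpactScore(before: ProductMetrics, after: ProductMetrics): ImpactResult {
  const raw =
    100 *
    (0.2 * relativeChange(before.visits7d, after.visits7d) +
      0.4 * relativeChange(before.sales7d, after.sales7d) +
      0.3 * relativeChange(before.conversionRate, after.conversionRate) +
      // Inverted: a lower search position is an improvement.
      0.1 * -relativeChange(before.searchPosition, after.searchPosition));
  const score = Math.max(-100, Math.min(100, Math.round(raw)));
  const impactClass = score > 20 ? "positive" : score < -20 ? "negative" : "neutral";
  return { score, impactClass };
}
```

For example, a listing whose visits and sales both rise 50%, with conversion unchanged and search position halved, would score 35 under these assumptions and classify as positive.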

Beautonomous governance: every action measured by Feedback Loop first passed through Core's governance gates. Only CONFIRMED WRITE actions (via ConfirmationFlow) generate FeedbackEntry records. Impact measurement is an audit of Core-governed changes, linking accountability to outcomes.

What this project does NOT do

Capture raw signal → conversation-api
Execute marketplace actions → Marketplace Provider (#12)
Update KB directly → KB Pipeline (#9). FeedbackLearner (Phase 3) only triggers it
Decide what to suggest → Coach (LLM)
Render dashboard UI → Shell (#1)
Monitor system → Observability (#8)
FeedbackMeasurer
Cron every 6h: closes pending entries after 7 days
FeedbackGate
Anti-fatigue: decides if/when to show the feedback prompt
FeedbackAPI
6 REST endpoints: summary, history, should-prompt, explicit, implicit, implicit summary
FeedbackSummaryService
Computes impact summaries and topWins per user

Responsibility Split with conversation-api

Lives in conversation-api

FeedbackCapture: a before_tool / after_tool hook of HookLifecycle. On a WRITE tool it snapshots ProductMetrics before execution and writes a raw FeedbackEntry to DynamoDB with status: pending. It has write-only IAM permissions on the table.

Lives in this project

DynamoDB table (owned here), FeedbackMeasurer (cron), FeedbackGate (anti-fatigue), REST endpoints, explicit/implicit collection. The Shell queries GET /should-prompt, so no anti-fatigue logic lives in the Shell.
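
The anti-fatigue decision behind GET /should-prompt might look like the sketch below. It applies three of the rules listed with the IFeedbackGate contract in this project (max 1 prompt per type per session, 15-minute cooldown, suppress after 2 ignores). The cross-session backoff rule is omitted because it needs state that is not described here. The evaluateGate function, the reduced throttle shape, and the reason strings are hypothetical.

```typescript
type FeedbackTrigger = "post_write" | "post_reject" | "post_session";

interface GateResult {
  shouldPrompt: boolean;
  type?: FeedbackTrigger;
  reason?: string; // internal logging
}

// Subset of the FeedbackThrottle record needed for the decision.
interface ThrottleState {
  promptsByType: Record<string, number>; // trigger → count in this session
  lastPromptAt?: Date;
  consecutiveIgnores: number;
  suppressed: boolean;
}

const COOLDOWN_MS = 15 * 60 * 1000;
const MAX_IGNORES = 2;

// Pure decision function: no I/O, so it can be tested without DynamoDB.
function evaluateGate(state: ThrottleState, trigger: FeedbackTrigger, now: Date): GateResult {
  if (state.suppressed || state.consecutiveIgnores >= MAX_IGNORES) {
    return { shouldPrompt: false, reason: "suppressed" };
  }
  if ((state.promptsByType[trigger] ?? 0) >= 1) {
    return { shouldPrompt: false, reason: "max_per_type" };
  }
  if (state.lastPromptAt !== undefined && now.getTime() - state.lastPromptAt.getTime() < COOLDOWN_MS) {
    return { shouldPrompt: false, reason: "cooldown" };
  }
  return { shouldPrompt: true, type: trigger };
}
```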

Tech Stack

TypeScript · DynamoDB · Lambda · EventBridge · CDK
Data Models, API Signatures & Acceptance Criteria
Data Models
interface FeedbackEntry {
  id: string;                          // ULID
  userId: string;
  executionId: string;                 // AgentExecution id of the turn
  toolName: string;                    // e.g. "update_product_content"
  productId: string;
  marketplace: 'meli' | 'amazon';
  fieldChanged?: string;               // e.g. "title", "description"
  valueBefore?: string;
  valueAfter?: string;
  executedAt: Date;
  metricsBefore?: ProductMetrics;      // pre-execution snapshot
  metricsAfter?: ProductMetrics;       // filled in by FeedbackMeasurer
  impactScore?: number;                // -100 to 100
  impactClass?: 'positive' | 'neutral' | 'negative';
  measuredAt?: Date;
  retryCount: number;                  // measurement attempts (max 3)
  status: 'pending' | 'measured' | 'unmeasurable';
}

interface ProductMetrics {
  visits7d: number;
  sales7d: number;
  conversionRate: number;              // 0-1
  searchPosition: number;             // average position (lower = better)
  capturedAt: Date;
}

interface ExplicitFeedbackEntry {
  id: string;                          // ULID
  userId: string;
  executionId?: string;               // optional: may be general session feedback
  trigger: 'post_write' | 'post_reject' | 'post_session';
  sentiment?: 'positive' | 'neutral' | 'negative';
  rating?: number;                    // 1-5
  reason?: string;                    // optional free text
  createdAt: Date;
  sessionId: string;
}

interface ImplicitFeedbackEntry {
  id: string;                          // ULID
  userId: string;
  sessionId: string;
  skillProposed: string;              // e.g. "update_product_content"
  action: 'accepted' | 'rejected' | 'edited' | 'ignored';
  context: {
    category?: string;
    marketplace?: 'meli' | 'amazon';
    productId?: string;
  };
  timeToActionMs?: number;
  createdAt: Date;
}

interface FeedbackThrottle {
  userId: string;
  sessionId: string;
  promptsByType: Record<string, number>;  // trigger → count in this session
  lastPromptAt?: Date;
  consecutiveIgnores: number;
  suppressed: boolean;
  ttl: number;                            // 48h TTL in DynamoDB (epoch seconds)
}
API Signatures & Contracts
// ── IFeedbackGate ──
type FeedbackTrigger = 'post_write' | 'post_reject' | 'post_session';
interface GateResult {
  shouldPrompt: boolean;
  type?: FeedbackTrigger;
  reason?: string;                        // internal logging
}
interface IFeedbackGate {
  shouldPrompt(userId: string, sessionId: string, trigger: FeedbackTrigger): Promise<GateResult>;
  recordIgnore(userId: string, sessionId: string): Promise<void>;
}
// Rules: max 1/type/session · cooldown 15min · suppress after 2 ignores
//        backoff after 3 sessions with all-ignored

// ── Impact Score ──
function calculateImpactScore(before: ProductMetrics, after: ProductMetrics): ImpactResult;
// Weights: visits7d × 0.2, sales7d × 0.4, conversionRate × 0.3, searchPosition × 0.1 (inverted)
// Classification: > +20 → positive · -20 to +20 → neutral · < -20 → negative
// Range: clamped -100 to 100

// ── REST Endpoints ──
GET  /feedback/:userId/summary        // counts by impactClass + topWins
GET  /feedback/:userId/history        // paginated list. ?productId=
GET  /feedback/:userId/should-prompt  // { shouldPrompt, type }. ?trigger=&sessionId=
POST /feedback/explicit               // creates ExplicitFeedbackEntry
POST /feedback/implicit               // creates ImplicitFeedbackEntry
GET  /feedback/:userId/implicit/summary // { acceptanceRateBySkill, totalProposals, totalAccepted }

// ── FeedbackMeasurer (cron) ──
// EventBridge rate(6 hours) → FeedbackMeasurerHandler
// 1. Query entries with status: pending
// 2. Skip if fewer than 7 days have elapsed since executedAt
// 3. Fetch current metrics from Data Sync (#10)
// 4. If metrics unavailable: retryCount++ (max 3 → status: unmeasurable)
// 5. If metrics available: calculateImpactScore → status: measured
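The scoring rule above (weights, inversion of search position, clamping, and the ±20 thresholds) could be sketched as follows. This is a minimal sketch, not the project's implementation: the use of percentage change per metric is an assumption inferred from the worked example in this spec (+54% visits × 0.2 = +10.8), the `ImpactResult` shape and the `pctChange` helper are hypothetical, and treating a zero baseline as "no change" is an arbitrary choice made only to avoid division by zero.

```typescript
interface ProductMetrics {
  visits7d: number;
  sales7d: number;
  conversionRate: number; // 0-1
  searchPosition: number; // average position (lower = better)
  capturedAt: Date;
}

type ImpactClass = 'positive' | 'neutral' | 'negative';

// Hypothetical result shape; the spec names ImpactResult but does not define it.
interface ImpactResult {
  impactScore: number; // clamped to [-100, 100]
  impactClass: ImpactClass;
}

// Percentage change from `before` to `after`.
// Assumption: a zero baseline yields 0 rather than an undefined ratio.
function pctChange(before: number, after: number): number {
  if (before === 0) return 0;
  return ((after - before) / before) * 100;
}

function calculateImpactScore(before: ProductMetrics, after: ProductMetrics): ImpactResult {
  const raw =
    pctChange(before.visits7d, after.visits7d) * 0.2 +
    pctChange(before.sales7d, after.sales7d) * 0.4 +
    pctChange(before.conversionRate, after.conversionRate) * 0.3 +
    // Inverted: a lower search position counts as an improvement.
    -pctChange(before.searchPosition, after.searchPosition) * 0.1;

  // Clamp to the documented range, then classify with the ±20 thresholds.
  const impactScore = Math.max(-100, Math.min(100, raw));
  const impactClass: ImpactClass =
    impactScore > 20 ? 'positive' : impactScore < -20 ? 'negative' : 'neutral';
  return { impactScore, impactClass };
}
```

With visits 100 → 154, sales 10 → 20, conversion 0.10 → 0.13, and position 10 → 5.3, the weighted terms are roughly +10.8, +40.0, +9.0, and +4.7, so the score lands at approximately +64.5 and is classified as positive.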
DynamoDB Table Design
// Table: core-feedback
// ┌───────────────────┬────────────────────┬──────────────────────┐
// │ Entity            │ pk                 │ sk                   │
// ├───────────────────┼────────────────────┼──────────────────────┤
// │ FeedbackEntry     │ User#{userId}      │ Feedback#{ULID}      │
// │ FeedbackThrottle  │ User#{userId}      │ Throttle#{sessionId} │  TTL 48h
// │ ExplicitFeedback  │ User#{userId}      │ Explicit#{ULID}      │
// │ ImplicitFeedback  │ User#{userId}      │ Implicit#{ULID}      │
// └───────────────────┴────────────────────┴──────────────────────┘
//
// GSI1: pk = status, sk = executedAt
// Usage: FeedbackMeasurer queries status='pending' ordered by executedAt
//
// conversation-api has IAM write-only on this table (PutItem only)
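The key layout in the table above could be produced by a handful of small helpers. The helper names below are hypothetical (they do not appear in the spec); the key formats and the 48h throttle TTL are taken from the table design. DynamoDB TTL attributes are expressed in epoch seconds, which is why the sketch converts from milliseconds.

```typescript
// Hypothetical helper names; key formats follow the core-feedback table layout.
const userPk = (userId: string): string => `User#${userId}`;
const feedbackSk = (ulid: string): string => `Feedback#${ulid}`;
const throttleSk = (sessionId: string): string => `Throttle#${sessionId}`;
const explicitSk = (ulid: string): string => `Explicit#${ulid}`;
const implicitSk = (ulid: string): string => `Implicit#${ulid}`;

// 48 hours, expressed in seconds because DynamoDB TTL uses epoch seconds.
const THROTTLE_TTL_SECONDS = 48 * 60 * 60;

// Value to store in FeedbackThrottle.ttl for a throttle record written at `now`.
function throttleTtl(now: Date): number {
  return Math.floor(now.getTime() / 1000) + THROTTLE_TTL_SECONDS;
}
```

Because ULIDs sort lexicographically by creation time, a query on one user's partition with a sort-key prefix such as `Feedback#` should return that user's entries in chronological order without an extra index.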
CDK Components
// FeedbackAPIHandler      → Lambda   → serves REST endpoints
// FeedbackMeasurerHandler → Lambda   → entry point for measurement cron
// EventBridge Rule        → Rule     → rate(6 hours) → FeedbackMeasurerHandler
// core-feedback           → DynamoDB → table with GSI1 (status/executedAt)
// IAM Grant               → Policy   → conversation-api Lambda → PutItem on core-feedback
File Structure
core-quality-feedback/
├── src/
│   ├── domain/
│   │   ├── interfaces/
│   │   │   ├── IFeedbackRepository.ts
│   │   │   └── IFeedbackGate.ts
│   │   └── models/
│   │       ├── FeedbackEntry.ts
│   │       ├── ExplicitFeedbackEntry.ts
│   │       ├── ImplicitFeedbackEntry.ts
│   │       └── FeedbackThrottle.ts
│   ├── application/
│   │   ├── FeedbackMeasurerService.ts
│   │   ├── FeedbackGate.ts
│   │   └── FeedbackSummaryService.ts
│   └── infrastructure/
│       ├── repositories/
│       │   └── DynamoFeedbackRepository.ts
│       └── lambda/
│           ├── FeedbackAPIHandler.ts
│           └── FeedbackMeasurerHandler.ts
├── lib/
│   └── feedback-stack.ts
└── test/
Acceptance Criteria
  • [Ph2 MVP] FeedbackCapture (in conversation-api) writes raw FeedbackEntry with metricsBefore on every WRITE tool execution
  • [Ph2 MVP] FeedbackMeasurer cron closes pending entries after 7 days with <1% unmeasurable rate
  • [Ph2 MVP] GET /feedback/:userId/summary returns counts by impactClass + topWins
  • [Ph2 MVP] GET /feedback/:userId/history returns paginated FeedbackEntry list with ?productId filter
  • [Ph2 Full] FeedbackGate enforces: max 1 prompt/type/session, 15min cooldown, suppression after 2 consecutive ignores
  • [Ph2 Full] GET /feedback/:userId/should-prompt returns correct gate decision for Shell
  • [Ph2 Full] POST /feedback/explicit creates ExplicitFeedbackEntry with trigger, sentiment, rating
  • [Ph2 Full] POST /feedback/implicit creates ImplicitFeedbackEntry with action and context
  • [Ph2 Full] GET /feedback/:userId/implicit/summary returns acceptanceRateBySkill
  • [Ph2 Full] FeedbackThrottle TTL 48h — auto-expires in DynamoDB
  • [Ph2 Full] conversation-api has IAM write-only on core-feedback table — cannot read or manage

Measurement delay: 7 days · Cron: every 6h · Weights: visits 0.2, sales 0.4, conv 0.3, position 0.1 (inverted) · Thresholds: ±20 · Retry: max 3 → unmeasurable · Gate: 1/type/session, 15min cooldown, suppress after 2 ignores · Table: single-table core-feedback with GSI1
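The gate rules summarized above (1 prompt per type per session, 15-minute cooldown, suppression after 2 consecutive ignores) could be expressed as a pure decision function over the throttle state. This is only a sketch under assumptions: `evaluateGate` and the `reason` strings are hypothetical, the state is passed in rather than loaded from DynamoDB, and the cross-session backoff (3 sessions with all prompts ignored) is left out because it needs history beyond a single session record.

```typescript
type FeedbackTrigger = 'post_write' | 'post_reject' | 'post_session';

// Subset of FeedbackThrottle that the decision needs.
interface ThrottleState {
  promptsByType: Record<string, number>;
  lastPromptAt?: Date;
  consecutiveIgnores: number;
  suppressed: boolean;
}

interface GateResult {
  shouldPrompt: boolean;
  type?: FeedbackTrigger;
  reason?: string;
}

const COOLDOWN_MS = 15 * 60 * 1000;
const MAX_CONSECUTIVE_IGNORES = 2;
const MAX_PROMPTS_PER_TYPE = 1;

function evaluateGate(state: ThrottleState, trigger: FeedbackTrigger, now: Date): GateResult {
  // Rule: suppress after 2 consecutive ignores.
  if (state.suppressed || state.consecutiveIgnores >= MAX_CONSECUTIVE_IGNORES) {
    return { shouldPrompt: false, reason: 'suppressed' };
  }
  // Rule: at most 1 prompt per trigger type per session.
  if ((state.promptsByType[trigger] ?? 0) >= MAX_PROMPTS_PER_TYPE) {
    return { shouldPrompt: false, reason: 'max_per_type' };
  }
  // Rule: 15-minute cooldown since the last prompt of any type.
  if (state.lastPromptAt && now.getTime() - state.lastPromptAt.getTime() < COOLDOWN_MS) {
    return { shouldPrompt: false, reason: 'cooldown' };
  }
  return { shouldPrompt: true, type: trigger };
}
```

Keeping the decision pure makes the three rules easy to unit-test; the real `FeedbackGate` would presumably wrap it with the read and write of the throttle record.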

How It Works

Coach executes a WRITE tool (e.g. update_product_content)
        ↓
FeedbackCapture (in conversation-api)
  hook before_tool: snapshot ProductMetrics
  hook after_tool:  write FeedbackEntry raw → DynamoDB (status: pending)
  conversation-api has IAM write-only on the table
        ↓
        ... 7 days later ...
        ↓
FeedbackMeasurer (cron every 6h, in THIS project)
  1. Query entries with status: pending, age >= 7d
  2. Fetch current metrics from Data Sync (#10)
  3. If metrics unavailable → retryCount++ (max 3 → unmeasurable)
  4. Calculate impact score:
     visits7d    × 0.2  = +10.8
     sales7d     × 0.4  = +40.0
     convRate    × 0.3  = +9.0
     searchPos   × 0.1  = +4.7  (inverted: lower position = better)
     ─────────────────────────
     impactScore = +64.5 → POSITIVE (>+20)
  5. Update entry: metricsAfter, impactScore, impactClass, status: measured
        ↓
Shell queries GET /feedback/:userId/summary
  → shows seller: "+54% visits after title change on MLA123456"
        ↓
Shell queries GET /feedback/:userId/should-prompt
  → FeedbackGate checks throttle rules → { shouldPrompt: true, type: 'post_write' }
  → Shell renders inline feedback prompt
  → Seller responds → POST /feedback/explicit
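The per-entry decision that the FeedbackMeasurer cron makes in the flow above could be sketched as a small state function. The names `decideMeasurement` and `MeasureDecision` are hypothetical; the 7-day delay and the limit of 3 attempts come from this spec. The sketch assumes that an entry becomes unmeasurable once the third attempt fails.

```typescript
type MeasureStatus = 'pending' | 'measured' | 'unmeasurable';

// Subset of FeedbackEntry that the decision needs.
interface PendingEntry {
  executedAt: Date;
  retryCount: number;
  status: MeasureStatus;
}

type MeasureDecision =
  | { kind: 'skip' }                              // not pending, or younger than 7 days
  | { kind: 'measure' }                           // metrics available: score the entry
  | { kind: 'retry'; retryCount: number }         // metrics missing, attempts remain
  | { kind: 'unmeasurable'; retryCount: number }; // metrics missing on the 3rd attempt

const MEASUREMENT_DELAY_MS = 7 * 24 * 60 * 60 * 1000;
const MAX_RETRIES = 3;

function decideMeasurement(entry: PendingEntry, now: Date, metricsAvailable: boolean): MeasureDecision {
  if (entry.status !== 'pending') return { kind: 'skip' };
  // Wait until the full measurement window has elapsed.
  if (now.getTime() - entry.executedAt.getTime() < MEASUREMENT_DELAY_MS) return { kind: 'skip' };
  if (metricsAvailable) return { kind: 'measure' };
  // Metrics unavailable: count the failed attempt and give up at the limit.
  const retryCount = entry.retryCount + 1;
  return retryCount >= MAX_RETRIES
    ? { kind: 'unmeasurable', retryCount }
    : { kind: 'retry', retryCount };
}
```

With a 6-hour cron, three failed attempts span roughly 12 to 18 hours, which is shorter than the 24-48h marketplace delay mentioned in this spec, so a real implementation might want to space retries further apart.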
            

The Coach's job is to respond in real time. Measuring impact happens days later. The logic for measurement windows, marketplace delay retries (24-48h), weighted scoring, and anti-fatigue has no place in the request path. Once the Coach is in production, adding or changing feedback logic must not touch the conversation loop. They are two distinct lifecycles: the loop converses in milliseconds; feedback measures in days.

Implementation Plan

Phase 2 MVP: Scheduled Post-Core Build

This project is scheduled for Phase 2 of the MVP build. FeedbackCapture lives in conversation-api as a hook, not in this project. Full implementation begins after the 10-week core build, when there is real user data to measure against.

Phase 2 MVP: Capture + Measurement (Post-MVP Week 1-2)

FeedbackCapture hook in conversation-api (before_tool / after_tool). FeedbackMeasurerService + Lambda + EventBridge cron. GET /feedback/:userId/summary and GET /feedback/:userId/history endpoints.

Phase 2 Full: Gate + Feedback Collection (Post-MVP Week 3-4)

FeedbackGate + FeedbackThrottle in DynamoDB. GET /feedback/:userId/should-prompt endpoint. POST /feedback/explicit and POST /feedback/implicit endpoints. GET /feedback/:userId/implicit/summary endpoint. Shell integration: the Shell queries the gate, renders the prompt, and posts the response.

Phase 3: FeedbackLearner (Post-MVP Week 4+)

Reads FeedbackEntry records with impactClass: positive and triggers the KB pipeline in core-knowledge-semantic-base (#9). Does NOT write to the KB directly; it fires the external pipeline. Requires sufficient measured data volume to avoid contaminating the KB with noisy signal.

Risk Analysis

Confounding Factors

Impact: H

Mitigation: A title change may coincide with a competitor going out of stock or a seasonal surge. The system reports correlation, not causation. Users see "after changing the title, visits increased +54%", not "your change caused +54%".

7-Day Measurement Window

Impact: M

Mitigation: 7 days may be too short for SEO changes (2-4 weeks) and too long for price changes (24h). Keep 7 days as the default. Future: configurable per skill type.

False Attribution

Impact: H

Mitigation: Multiple concurrent changes on the same product make attribution impossible. Track concurrent FeedbackEntry records per productId and flag them in the impact report. Never claim causation.
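One way the concurrent-change flag could be computed is to group entries by product and mark any two whose measurement windows overlap. This is a sketch, not the project's method: `findConcurrentEntries` is a hypothetical name, and "concurrent" is assumed to mean "executed less than 7 days apart on the same productId", which the spec does not define.

```typescript
// Subset of FeedbackEntry that the check needs.
interface EntryRef {
  id: string;
  productId: string;
  executedAt: Date;
}

const WINDOW_MS = 7 * 24 * 60 * 60 * 1000;

// Returns the ids of entries whose 7-day window overlaps another entry
// on the same product. Assumption: overlap = executed < 7 days apart.
function findConcurrentEntries(entries: EntryRef[]): Set<string> {
  const byProduct = new Map<string, EntryRef[]>();
  for (const e of entries) {
    const list = byProduct.get(e.productId) ?? [];
    list.push(e);
    byProduct.set(e.productId, list);
  }

  const flagged = new Set<string>();
  byProduct.forEach((list) => {
    // After sorting, comparing neighbours is enough: if any two entries
    // overlap, every adjacent pair between them overlaps as well.
    list.sort((a, b) => a.executedAt.getTime() - b.executedAt.getTime());
    for (let i = 1; i < list.length; i++) {
      const prev = list[i - 1];
      const cur = list[i];
      if (cur.executedAt.getTime() - prev.executedAt.getTime() < WINDOW_MS) {
        flagged.add(prev.id);
        flagged.add(cur.id);
      }
    }
  });
  return flagged;
}
```

Flagged entries can still be scored; the flag only tells the impact report that the change cannot be singled out as the one that preceded the metric movement.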

Delayed Metrics Availability

Impact: M

Mitigation: Marketplace APIs report metrics with a 24-48h delay. FeedbackMeasurer retries up to 3 times (retryCount). After 3 failed attempts: status → unmeasurable. This does not block the pipeline.

Key Decisions

D1.

Separated from conversation-api: The Coach emits a raw signal and doesn't know what happens with it. Measurement, scoring, anti-fatigue, and learning logic must not be in the request path. Two distinct lifecycles: the loop converses in milliseconds; feedback measures in days.

D2.

FeedbackGate lives here, Shell queries it: The Shell has no anti-fatigue logic. The gate is this project's responsibility; it owns the complete state of prompts per session and the ignore history.

D3.

Correlation, not causation: The seller sees "After changing the title, visits increased +54%", not "your change caused the increase". The system cannot control other variables affecting the product at the same time.

D4.

FeedbackLearner deferred to Phase 3: Automating KB updates requires real data at scale. Until there are enough measured FeedbackEntry records with impactClass: positive, the learner is premature and risks contaminating the KB with noisy signals.

MVP Scope

Phase 2 MVP. FeedbackCapture hook in conversation-api writes raw entries. This project measures, scores, and exposes results via REST.

Inspired by

A/B testing frameworks, Shopilot Data Sync pipeline

Source: New | Depends on: #10 · Phase 3: #9
📝 Project Changelog
v4 Mar 2, 2026
~REACTIVATED from DEFERRED: Phase 2 MVP with deep spec
~Renamed “Feedback & Learning Loop” → “Feedback Loop” (Learning is Phase 3)
~Components: 6 mixed → 4 real (FeedbackMeasurer, FeedbackGate, FeedbackAPI, FeedbackSummaryService)
~Stack: Python asyncio + BigQuery → TypeScript, DynamoDB, Lambda, EventBridge, CDK
+“What this project does NOT do” section (6 items with where each lives)
+Responsibility split with conversation-api: FeedbackCapture as hook (IAM write-only)
+DynamoDB single-table design (core-feedback, GSI1, IAM write-only)
~Owner: Mateo → Pablo
~Dependencies: #8, #10, #9, #12 → #10 · Phase 3: #9
v3 Feb 27-28, 2026
~Exists as DEFERRED project (opacity:0.6): minimal spec, Phase 2 placeholder
v2 Feb 27, 2026
+Created as new project (split from v1 #9 Conversation Memory)
Immediately DEFERRED to Phase 2; saves ~1 engineer-week
#16

Eval Suite

Quality — Pablo

NEW

core-quality-stack-evaluation is the quality evaluation platform for every project in the stack. It runs automated suites on every PR (Coach, Shell), on schedule (Figma), and on demand. It evaluates Coach response quality via an LLM Judge, validates API contracts between projects, checks KB chunk retrievability, validates Electron builds for macOS and Windows (compilation, signing, notarization, startup, bundle size), and audits Design System Figma files against 15 quality checks. It also runs the api_monitor pipeline: daily checks against marketplace API changelogs plus canary tests against live endpoints. It never runs in production. Its role is to block merges that introduce regressions, validate that desktop builds are distributable, and ensure Figma is MCP-compatible before implementation.

Beautonomous governance: Eval Suite is the quality gate that validates Core-governed changes before they reach production. It blocks merges that introduce regressions in ConfirmationFlow enforcement, permission matrix adherence, or governance rule compliance across all projects in the stack. Desktop builds are validated as distributable artifacts; Figma is validated as MCP-compatible input for the design-to-code pipeline.

What this project does NOT do

Run in production → Never; CI/CD and on-demand only
Monitor production → Observability (#8)
Collect seller feedback → Feedback Loop (#15)
Define what’s “good” → Team defines golden datasets
Measure marketplace impact → Feedback Loop (#15)
Detect hallucinations at runtime → Orchestrator (#2)
Implement React components from Figma → Native Shell (#1); this project only validates quality
Design Figma or create variables → Design System (#18), UX/UI team; this project only audits
Distribute builds → DevOps (#14); this project only validates the artifact is correct
Run unit/integration tests → Each project runs its own tests; this project evaluates cross-cutting quality

7 Evaluation Pipelines

llm_judge Phase 1
Coach response quality vs golden dataset. Judge LLM (Haiku/Sonnet) scores relevance, accuracy, tone, actionability
contract Phase 2
API contracts between projects. Consumer-driven: Tool Registry defines what it expects from Data Sync. The provider can’t break the consumer without knowing
kb_quality Phase 3
Are KB chunks relevant and retrievable for expected queries? Detects knowledge gaps before they’re visible in production
e2e Phase 3+
Full Shell flows: proposal → confirmation → action → response. Cross-project regressions in a single report
desktop_build Phase 4
Electron builds for macOS (arm64+x64) and Windows (x64). 11 checks: compilation, code signing, notarization, app startup, bundle size, native modules, auto-updater, deep links, window rendering, IPC channels. Runs on native runners per platform
figma_quality Phase 5
15 automated checks against Design System (#18) requirements via Figma REST API: variable architecture, Code Syntax, Auto Layout, naming, states, color hardcoding, spacing, semantic aliasing, Light/Dark modes, WCAG contrast, MCP compatibility. Scheduled weekly + on-demand. Blocks implementation, not merges
api_monitor Phase 1+
Daily checks against marketplace API changelogs (MeLi, Amazon SP-API, Shopify) + canary tests against live endpoints. When a breaking change or new capability is detected, it creates a Linear issue tagged api-change with the affected adapter and recommended action. Runs on a cron schedule; it is not triggered by code changes
EvalRunner
Orchestrates pipelines: runs the configured suite against the target
LLMJudge
External evaluator (Haiku/Sonnet): scores relevance, accuracy, tone, actionability
ContractTester
Consumer-driven contract validation between projects
ReportGenerator
EvalReport with per-case scores + regression delta vs baseline
CoachEvalRunner
Specialized runner for the llm_judge pipeline
KBQualityRunner
Validates chunk retrievability for expected queries
DesktopBuildRunner
11 checks per platform: compilation, signing, startup, bundle, IPC
FigmaQualityRunner
15 checks against DS requirements: variables, Auto Layout, WCAG, MCP
FigmaRESTClient
Reads Figma files via REST API: nodes, variables, components, styles

Sandbox Isolation: How it evaluates without affecting production

Invokes the conversation-api Lambda directly (not via HTTP) in a sandboxed staging environment. Uses a fixed snapshot of KB and brand health (reproducible between runs).
Does NOT use the production DynamoDB table; it uses a separate test table with fixture data. Eval results never modify any system state.

Tech Stack (TypeScript: CI/CD Tooling)

TypeScript GitHub Actions Claude Haiku (Judge) Claude Sonnet (critical) JSON Schema (contracts) YAML (golden datasets) Figma REST API macOS/Windows runners
Data Models, Interfaces & Acceptance Criteria
Core Interfaces
interface IEvalPipeline {
  run(config: EvalConfig): Promise<EvalReport>;
}

interface EvalConfig {
  projectId: string;
  pipelineType: 'llm_judge' | 'contract' | 'kb_quality' | 'e2e' | 'desktop_build' | 'figma_quality';
  datasetId: string;
  blockOnFailure: boolean;
  thresholds: EvalThresholds;
}

interface EvalThresholds {
  minPassRate: number;           // 0-1, e.g. 0.85
  maxRegressionDelta: number;    // e.g. -0.05 = no more than 5% regression vs baseline
}

interface EvalReport {
  pipelineId: string;
  projectId: string;
  passRate: number;
  cases: EvalCase[];
  regressionDelta?: number;
  blocksDeployment: boolean;
  generatedAt: Date;
}

interface EvalCase {
  id: string;
  input: unknown;
  expectedOutput: unknown;
  actualOutput: unknown;
  score: number;                 // 0-1
  passed: boolean;
  judgeRationale?: string;       // Judge LLM explanation
}
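How `passRate`, `regressionDelta`, and the thresholds might combine into `blocksDeployment` can be sketched as below. The function names are hypothetical, and two points are assumptions rather than statements from the spec: `regressionDelta` is taken to be the current pass rate minus the baseline pass rate (so a regression is negative), and an empty suite is treated as a 0 pass rate.

```typescript
interface EvalThresholds {
  minPassRate: number;        // 0-1, e.g. 0.85
  maxRegressionDelta: number; // e.g. -0.05
}

// Fraction of cases that passed. Assumption: an empty suite counts as 0.
function passRateOf(cases: { passed: boolean }[]): number {
  if (cases.length === 0) return 0;
  return cases.filter((c) => c.passed).length / cases.length;
}

// Assumption: regressionDelta = current passRate - baseline passRate,
// so a value below maxRegressionDelta (e.g. -0.08 < -0.05) is too large a drop.
function shouldBlock(
  passRate: number,
  regressionDelta: number | undefined,
  thresholds: EvalThresholds,
  blockOnFailure: boolean,
): boolean {
  const belowPassRate = passRate < thresholds.minPassRate;
  const regressed =
    regressionDelta !== undefined && regressionDelta < thresholds.maxRegressionDelta;
  return blockOnFailure && (belowPassRate || regressed);
}
```

When no baseline exists yet, `regressionDelta` is undefined and only the absolute pass rate is checked, which seems a reasonable behavior for a first run.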
Golden Dataset & LLM Judge
interface GoldenDataset {
  id: string;
  projectId: string;
  version: string;
  cases: GoldenCase[];
}

interface GoldenCase {
  id: string;
  description: string;
  input: unknown;
  expectedOutput: unknown;
  evaluationCriteria: string[];  // passed to Judge LLM as scoring rubric
  tags: string[];                // e.g. ['write_tool', 'meli', 'high_priority']
}

interface LLMJudgeScore {
  relevance: number;      // 0-1: does it answer the question?
  accuracy: number;       // 0-1: is the information correct?
  tone: number;           // 0-1: matches Personality Engine?
  actionability: number;  // 0-1: can the seller act on this?
  overall: number;        // weighted: relevance 0.3 · accuracy 0.4 · tone 0.15 · actionability 0.15
}
// Judge uses Claude Haiku for most cases. Claude Sonnet for cases tagged 'critical'.
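The weighted `overall` score and the Haiku/Sonnet selection described above could be sketched as follows. The function names and the `'haiku'` / `'sonnet'` labels are hypothetical placeholders, not real model identifiers; the weights and the `'critical'` tag rule are taken from this spec.

```typescript
interface JudgeDimensions {
  relevance: number;     // 0-1
  accuracy: number;      // 0-1
  tone: number;          // 0-1
  actionability: number; // 0-1
}

// Weights from the spec; they sum to 1, so `overall` stays within 0-1.
const JUDGE_WEIGHTS: JudgeDimensions = {
  relevance: 0.3,
  accuracy: 0.4,
  tone: 0.15,
  actionability: 0.15,
};

function overallScore(s: JudgeDimensions): number {
  return (
    s.relevance * JUDGE_WEIGHTS.relevance +
    s.accuracy * JUDGE_WEIGHTS.accuracy +
    s.tone * JUDGE_WEIGHTS.tone +
    s.actionability * JUDGE_WEIGHTS.actionability
  );
}

// Placeholder labels only: cases tagged 'critical' go to the stronger judge.
function judgeModelFor(tags: string[]): 'haiku' | 'sonnet' {
  return tags.indexOf('critical') !== -1 ? 'sonnet' : 'haiku';
}
```

Since accuracy carries the largest weight, a response that is relevant and well-toned but factually wrong (accuracy 0) cannot score above 0.6 overall.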
Contract Testing
interface ContractTest {
  consumer: string;        // e.g. 'tool-registry'
  provider: string;        // e.g. 'data-sync'
  endpoint: string;        // e.g. 'GET /products/:id/metrics'
  requestSchema: JSONSchema;
  responseSchema: JSONSchema;
  slaMs: number;           // max expected response time
}
// Consumer-driven: the consumer defines what it expects, not the provider.
// If Data Sync changes its response schema and removes a field that
// Tool Registry uses, the contract fails BEFORE deploy.
// Contracts: Tool Registry ↔ Data Sync, Tool Registry ↔ Enrichment
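The consumer-driven idea can be illustrated with a deliberately simplified check. A real suite would validate against the full JSON Schema in `responseSchema`, typically with a schema validation library; the sketch below only verifies that the top-level fields a consumer relies on are present with the expected primitive type. `ExpectedShape` and `findContractViolations` are hypothetical names.

```typescript
type FieldType = 'string' | 'number' | 'boolean' | 'object';

// Simplified stand-in for a JSON Schema: field name → expected typeof.
interface ExpectedShape {
  [field: string]: FieldType;
}

// Returns one message per field the consumer expects but the provider
// response does not satisfy. An empty array means the contract holds.
function findContractViolations(
  expected: ExpectedShape,
  response: Record<string, unknown>,
): string[] {
  const violations: string[] = [];
  for (const field of Object.keys(expected)) {
    if (!(field in response)) {
      violations.push(`missing field: ${field}`);
    } else if (typeof response[field] !== expected[field]) {
      violations.push(
        `wrong type for ${field}: expected ${expected[field]}, got ${typeof response[field]}`,
      );
    }
  }
  return violations;
}
```

Extra fields in the response are ignored on purpose: the provider remains free to add data, and only removing or retyping something a consumer depends on fails the contract.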
Desktop Build Eval
interface DesktopBuildConfig {
  platforms: ('darwin' | 'win32')[];
  arch: ('x64' | 'arm64')[];
  checks: DesktopBuildCheck[];
  maxBundleSizeMB: number;             // e.g. 250
  maxStartupMs: number;                // e.g. 5000
}

type DesktopBuildCheck =
  | 'compilation'       // build completes without errors
  | 'code_signing'      // binary is signed (codesign / Authenticode)
  | 'notarization'      // Apple notarization passes (macOS only)
  | 'app_startup'       // app starts without crash in <5s
  | 'bundle_size'       // artifact < maxBundleSizeMB
  | 'native_modules'    // keytar, better-sqlite3, etc. load correctly
  | 'auto_updater'      // update feed URL resolves
  | 'deep_links'        // shopilot:// protocol registered
  | 'window_rendering'  // WebContentsView loads without critical errors
  | 'ipc_channels'      // registered IPC channels respond to ping

interface DesktopBuildReport extends EvalReport {
  platform: 'darwin' | 'win32';
  arch: 'x64' | 'arm64';
  checks: DesktopCheckResult[];
  bundleSizeMB: number;
  bundleSizeDeltaMB: number;       // vs baseline
  startupTimeMs: number;
}

// Blocks merge if: compilation fails, signing fails, notarization fails,
// app crashes on startup, bundle > 250MB, native modules fail to load,
// window rendering has critical errors, IPC channels don't respond.
// Warning only (release branches): auto_updater, deep_links.
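The blocking and warning rules above could be applied with a small classifier. `CheckOutcome` and `classifyFailures` are hypothetical (the spec references `DesktopCheckResult` without defining it), and the sketch assumes `auto_updater` and `deep_links` are always warnings, leaving aside any special handling for release branches.

```typescript
type DesktopBuildCheck =
  | 'compilation' | 'code_signing' | 'notarization' | 'app_startup' | 'bundle_size'
  | 'native_modules' | 'auto_updater' | 'deep_links' | 'window_rendering' | 'ipc_channels';

// Hypothetical result shape for one executed check.
interface CheckOutcome {
  check: DesktopBuildCheck;
  passed: boolean;
}

// Per the spec, failures of these checks warn instead of blocking the merge.
const WARNING_ONLY: DesktopBuildCheck[] = ['auto_updater', 'deep_links'];

function classifyFailures(results: CheckOutcome[]): {
  blocking: DesktopBuildCheck[];
  warnings: DesktopBuildCheck[];
} {
  const blocking: DesktopBuildCheck[] = [];
  const warnings: DesktopBuildCheck[] = [];
  for (const r of results) {
    if (r.passed) continue;
    if (WARNING_ONLY.indexOf(r.check) !== -1) {
      warnings.push(r.check);
    } else {
      blocking.push(r.check);
    }
  }
  return { blocking, warnings };
}
```

A report would then set `blocksDeployment` when `blocking` is non-empty and surface `warnings` in the PR comment without failing the job.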
Figma Quality Eval
interface FigmaQualityConfig {
  fileKeys: FigmaFileKey[];
  checks: FigmaQualityCheck[];
  minComplianceRate: number;           // 0-1, e.g. 0.95
}

type FigmaQualityCheck =
  | 'variable_architecture'  // 3 collections (Primitives, Semantic, Component)
  | 'code_syntax'            // all variables have Code Syntax (Web) configured
  | 'auto_layout'            // all components use Auto Layout
  | 'naming_convention'      // slash naming, no generic names (Frame 1, Group)
  | 'states_coverage'        // all interactive states present per type
  | 'color_hardcoding'       // no hardcoded hex in components
  | 'spacing_hardcoding'     // no hardcoded spacing values
  | 'semantic_aliasing'      // Semantic tokens alias Primitives
  | 'light_dark_modes'       // Semantic has Light + Dark modes
  | 'component_properties'   // Component Properties used to reduce variants
  | 'descriptions'           // published components have descriptions
  | 'cover_pages'            // each file has cover page
  | 'base_components_hidden' // . or _ prefix components are hidden
  | 'wcag_contrast'          // WCAG AA contrast verified
  | 'mcp_compatibility'      // semantic names in all layers

interface FigmaQualityReport extends EvalReport {
  files: FigmaFileReport[];
  overallComplianceRate: number;
  criticalViolations: FigmaViolation[];
  warnings: FigmaViolation[];
}

interface FigmaViolation {
  check: FigmaQualityCheck;
  severity: 'critical' | 'warning';
  componentName?: string;
  nodeName?: string;
  detail: string;
  suggestion: string;       // e.g. "Bind fill to variable color/interactive/primary"
}

// Critical (blocks implementation): variable_architecture, code_syntax,
//   auto_layout, color_hardcoding, naming_convention, states_coverage,
//   light_dark_modes, wcag_contrast, mcp_compatibility.
// Warning (no block): spacing_hardcoding, semantic_aliasing,
//   component_properties, descriptions, cover_pages, base_components_hidden.
// Reads Figma via REST API, NOT MCP. MCP is for interactive agent use.
// Scheduled weekly + on-demand. Does not block merges — blocks implementation.
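The critical/warning split described in the comment above might be turned into a pre-implementation gate along these lines. This is a sketch under stated assumptions: the check names come from the source, but `evaluateFigmaGate`, `ReportedViolation`, and the way the compliance threshold combines with critical violations are hypothetical.

```typescript
type Severity = "critical" | "warning";

// Critical check names copied from the comment above; every other check is a warning.
const CRITICAL_CHECKS: ReadonlySet<string> = new Set([
  "variable_architecture", "code_syntax", "auto_layout", "color_hardcoding",
  "naming_convention", "states_coverage", "light_dark_modes",
  "wcag_contrast", "mcp_compatibility",
]);

interface ReportedViolation {
  check: string;
  detail: string;
}

interface FigmaGateDecision {
  blocksImplementation: boolean;
  critical: ReportedViolation[];
  warnings: ReportedViolation[];
}

function severityOf(check: string): Severity {
  return CRITICAL_CHECKS.has(check) ? "critical" : "warning";
}

// Assumption: implementation is blocked by any critical violation OR by a
// compliance rate below the configured minimum. The source does not spell out
// how the two conditions combine.
function evaluateFigmaGate(
  violations: ReportedViolation[],
  overallComplianceRate: number,
  minComplianceRate: number,
): FigmaGateDecision {
  const critical = violations.filter((v) => severityOf(v.check) === "critical");
  const warnings = violations.filter((v) => severityOf(v.check) === "warning");
  const blocksImplementation =
    critical.length > 0 || overallComplianceRate < minComplianceRate;
  return { blocksImplementation, critical, warnings };
}
```

Under these assumptions, a file with only a missing cover page at 97% compliance passes the gate, while a single hardcoded hex fill blocks implementation regardless of the overall rate.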
Judge LLM Prompt Template
You are a quality evaluator for a conversational agent that serves marketplace sellers.

User query: {query}
Retrieved context (KB + tool results): {context}
Coach response: {response}

Evaluate the response against the following criteria:
{evaluationCriteria}

For each criterion, assign a score from 0 to 1 and briefly explain why.
Respond with JSON:
{
  "scores": {
    "relevance": 0.0,
    "accuracy": 0.0,
    "tone": 0.0,
    "actionability": 0.0
  },
  "judgeRationale": "..."
}
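The JSON returned by the judge has to be reduced to a single score before a pass/fail decision can be made. The sketch below parses the response and applies the weights listed in the acceptance criteria (relevance 0.3, accuracy 0.4, tone 0.15, actionability 0.15). The function names and the validation rules are assumptions, not part of the source.

```typescript
interface JudgeScores {
  relevance: number;
  accuracy: number;
  tone: number;
  actionability: number;
}

interface JudgeResult {
  scores: JudgeScores;
  judgeRationale: string;
}

const DIMENSIONS = ["relevance", "accuracy", "tone", "actionability"] as const;

// Weights as stated in the acceptance criteria; they sum to 1.
const JUDGE_WEIGHTS: JudgeScores = {
  relevance: 0.3,
  accuracy: 0.4,
  tone: 0.15,
  actionability: 0.15,
};

// Hypothetical helper: rejects responses with missing or out-of-range scores.
function parseJudgeResponse(raw: string): JudgeResult {
  const parsed = JSON.parse(raw) as {
    scores?: Record<string, unknown>;
    judgeRationale?: unknown;
  };
  const scores = {} as JudgeScores;
  for (const dim of DIMENSIONS) {
    const value = parsed.scores?.[dim];
    if (typeof value !== "number" || value < 0 || value > 1) {
      throw new Error(`Judge returned an invalid score for "${dim}"`);
    }
    scores[dim] = value;
  }
  return { scores, judgeRationale: String(parsed.judgeRationale ?? "") };
}

function weightedScore(scores: JudgeScores): number {
  return DIMENSIONS.reduce((sum, dim) => sum + JUDGE_WEIGHTS[dim] * scores[dim], 0);
}
```

Validating the range up front keeps a malformed judge reply from silently inflating or deflating a case's score.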
CI/CD Integration
# .github/workflows/eval-on-pr.yml
name: Eval Suite
on:
  pull_request:
    branches: [main, develop]
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: npm ci
      - name: Run eval suite
        run: npm run eval -- --project=coach --dataset=v2
        env:
          STAGING_ENDPOINT: ${{ secrets.STAGING_ENDPOINT }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      - name: Check threshold
        run: npm run eval:check
        # Fails job if EvalReport.blocksDeployment === true
      - name: Post report to PR
        uses: actions/github-script@v7

# Flow:
# 1. GitHub Actions triggers EvalRunner on PR
# 2. EvalRunner runs configured pipelines against new version
# 3. Compares passRate with stored baseline
# 4. If passRate < minPassRate or the regression exceeds maxRegressionDelta → blocksDeployment: true → blocks merge
# 5. Reports stored in S3 or DynamoDB for historical comparison
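Steps 3 and 4 of the flow above reduce to a small comparison against the stored baseline. The sketch below shows one way to express it. The field names `passRate`, `minPassRate`, `maxRegressionDelta`, and `blocksDeployment` come from the document; `GateInput`, `evaluateGate`, and the sign convention for `regressionDelta` (positive means the pass rate dropped) are assumptions.

```typescript
interface GateInput {
  passRate: number;           // 0-1, current run
  baselinePassRate: number;   // 0-1, stored baseline
  minPassRate: number;        // 0-1, absolute floor
  maxRegressionDelta: number; // largest tolerated drop, e.g. 0.05
}

interface GateDecision {
  regressionDelta: number;
  blocksDeployment: boolean;
  reasons: string[];
}

function evaluateGate(input: GateInput): GateDecision {
  // Assumed sign convention: a positive delta means the pass rate dropped vs baseline.
  const regressionDelta = input.baselinePassRate - input.passRate;
  const reasons: string[] = [];

  if (input.passRate < input.minPassRate) {
    reasons.push(`passRate ${input.passRate} is below minPassRate ${input.minPassRate}`);
  }
  if (regressionDelta > input.maxRegressionDelta) {
    reasons.push(`regression ${regressionDelta.toFixed(3)} exceeds maxRegressionDelta ${input.maxRegressionDelta}`);
  }
  return { regressionDelta, blocksDeployment: reasons.length > 0, reasons };
}
```

With a baseline of 0.90 and a tolerance of 0.05, a run at 0.92 passes and a run at 0.83 is blocked, which matches the +2% and -7% cases in the walkthrough later in this section.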
Desktop Build CI (native runners)
# .github/workflows/desktop-build-eval.yml
name: Desktop Build Eval
on:
  pull_request:
    paths: ['core-product-desktop-client/**']
    branches: [main, develop]
jobs:
  build-macos:
    runs-on: macos-14                    # Apple Silicon runner
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run build:mac  # arm64 + x64
        env:
          CSC_LINK: ${{ secrets.MAC_CERTIFICATE }}
          APPLE_ID: ${{ secrets.APPLE_ID }}
      - run: npm run eval -- --pipeline=desktop_build --platform=darwin
  build-windows:
    runs-on: windows-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run build:win  # x64
        env:
          WIN_CSC_LINK: ${{ secrets.WIN_CERTIFICATE }}
      - run: npm run eval -- --pipeline=desktop_build --platform=win32
  report:
    needs: [build-macos, build-windows]
    runs-on: ubuntu-latest
    # Aggregates reports from both platforms, posts to PR
Figma Quality CI (scheduled + manual)
# .github/workflows/figma-quality-eval.yml
name: Figma Quality Eval
on:
  workflow_dispatch:                     # manual trigger
  schedule:
    - cron: '0 8 * * 1'                 # every Monday 8:00 UTC
jobs:
  figma-eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run eval -- --pipeline=figma_quality
        env:
          FIGMA_ACCESS_TOKEN: ${{ secrets.FIGMA_ACCESS_TOKEN }}
      # Posts report to Slack #engineering or as GitHub issue
# Does NOT block merges — blocks implementation.
# Figma has no PRs/webhooks. Gate is pre-implementation, not pre-merge.
Acceptance Criteria
  • [Ph 1] Golden dataset has 20-30 curated cases covering fees, metrics, scope, and tool activation
  • [Ph 1] CoachEvalRunner + AnthropicLLMJudge score each response with relevance (0.3), accuracy (0.4), tone (0.15), actionability (0.15)
  • [Ph 1] CI gate blocks merge when passRate < minPassRate OR regressionDelta exceeds maxRegressionDelta
  • [Ph 1] EvalReport published as PR comment with per-case scores and judge rationale
  • [Ph 1] Baseline stored for regression comparison across runs
  • [Ph 2] ContractEvalRunner validates Tool Registry ↔ Data Sync contract (request/response schemas + SLA)
  • [Ph 2] ContractEvalRunner validates Tool Registry ↔ Enrichment contract
  • [Ph 2] Contract tests integrated in CI/CD of core-knowledge-data-synchronizator and core-knowledge-enrichment
  • [Ph 3] KBQualityRunner validates chunk retrievability for expected queries
  • [Ph 3] E2E flows: proposal → confirmation → action → response validated end-to-end
  • [Ph 3] Multi-project regression suite: cross-project regressions in a single report
  • [Ph 4] DesktopBuildRunner validates macOS (arm64+x64) and Windows (x64) builds with 11 checks per platform
  • [Ph 4] Code signing verified: macOS codesign + Apple notarization, Windows Authenticode
  • [Ph 4] App startup validated <5s, bundle size <250MB, native modules load, IPC channels respond
  • [Ph 4] Desktop build eval runs on native runners (macos-14 + windows-latest) only for PRs touching desktop-client
  • [Ph 5] FigmaQualityRunner validates 15 checks against Design System requirements via Figma REST API
  • [Ph 5] Critical checks (9) block implementation: variable_architecture, code_syntax, auto_layout, color_hardcoding, naming_convention, states_coverage, light_dark_modes, wcag_contrast, mcp_compatibility
  • [Ph 5] Figma eval runs weekly on schedule + on-demand. The report identifies component, check, node, and suggestion per violation

CI/CD only · Not runtime · 7 pipelines (llm_judge, contract, kb_quality, e2e, desktop_build, figma_quality, api_monitor) · Judge weights: relevance 0.3, accuracy 0.4, tone 0.15, actionability 0.15 · Consumer-driven contracts · 11 desktop checks (native runners) · 15 Figma checks (REST API, weekly) · Baseline regression tracking

How It Works

PIPELINE: llm_judge (Phase 1)
Developer opens a PR to conversation-api
        ↓
GitHub Actions triggers eval suite
        ↓
EvalRunner runs 30 golden cases against new Coach version
  Judge LLM scores each response: relevance · accuracy · tone · actionability
        ↓
Compares with baseline of previous version
        ↓
+2% improvement vs baseline
→ merge allowed, report published as PR comment

-7% regression vs baseline
→ merge blocked, report shows which cases failed and why

───────────────────────────────────────

PIPELINE: contract (Phase 2)
Tool Registry defines: "I expect ProductMetrics with visits7d, sales7d, conversionRate"
Data Sync changes response → contract test fails BEFORE deploy
→ Provider can't break consumer without knowing

PIPELINE: kb_quality (Phase 3)
Query: "fees for Electronics in MercadoLibre"
→ KBQualityRunner checks: are there retrievable chunks covering this?
→ If not → knowledge gap detected before production

—————————————————————

PIPELINE: desktop_build (Phase 4)
Developer opens PR to core-product-desktop-client
        ↓
Two jobs in parallel:
  → macOS runner: build arm64+x64, code sign, notarize, 11 checks
  → Windows runner: build x64, Authenticode sign, 11 checks
        ↓
Each job validates: compilation, signing, startup <5s, bundle <250MB,
  native modules, IPC channels, window rendering, deep links
        ↓
All green → merge allowed
Something red → merge blocked + report with failed check and platform

—————————————————————

PIPELINE: figma_quality (Phase 5)
UX/UI team publishes a library in Figma
        ↓
Trigger: manual or weekly cron (every Monday 8:00 UTC)
        ↓
FigmaQualityRunner reads files via Figma REST API
        ↓
15 automated checks:
  variables ✓ | Code Syntax ✓ | Auto Layout ✓ | naming ✓ | states ✓
  color hardcoding ✓ | spacing ✓ | semantic aliasing ✓ | Light/Dark ✓
  WCAG contrast ✓ | MCP compatibility ✓ | component properties
  descriptions | covers | hidden
        ↓
95%+ compliance → implementation allowed
Critical violations → agent does NOT implement until fixed
  e.g. "Button/Primary/Default: fill #3B82F6 is hardcoded hex
        → bind to variable color/interactive/primary"
            

The Eval Suite runs entirely in CI/CD and never touches production. It evaluates the Coach, validates API contracts, checks KB quality, validates desktop builds as distributable artifacts, and audits Figma for MCP compatibility. Its users are PRs, CI/CD pipelines, and internal quality reports, not sellers. It cannot live inside any project it evaluates: it has to sit above them, independent of their release cycles and internal architecture.
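The contract pipeline described above can be illustrated with a small consumer-side check. The field names `visits7d`, `sales7d`, and `conversionRate` come from the example; the field types, the `ContractSchema` format, and `checkContract` are assumptions made for this sketch, and the real schemas in datasets/contracts/ may look different.

```typescript
type FieldType = "string" | "number" | "boolean";
type ContractSchema = Record<string, FieldType>;

// Consumer (Tool Registry) expectation from the example above.
// The "number" types are an assumption; the source only lists the field names.
const productMetricsContract: ContractSchema = {
  visits7d: "number",
  sales7d: "number",
  conversionRate: "number",
};

// Returns a list of violations; an empty list means the provider response
// still satisfies the consumer. Extra fields in the response are tolerated.
function checkContract(
  schema: ContractSchema,
  response: Record<string, unknown>,
): string[] {
  const violations: string[] = [];
  for (const field of Object.keys(schema)) {
    const expected = schema[field];
    if (!(field in response)) {
      violations.push(`missing field: ${field}`);
    } else if (typeof response[field] !== expected) {
      violations.push(`${field}: expected ${expected}, got ${typeof response[field]}`);
    }
  }
  return violations;
}
```

Run in the provider's CI, a rename of `sales7d` would surface as `missing field: sales7d` before the provider deploys, which is the consumer-driven behavior the pipeline is meant to guarantee.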

Implementation Plan (5 Phases)

Phase 1: LLM Judge for the Coach

Golden dataset v1: 20-30 cases (fees, sales metrics, scope, tool activation). CoachEvalRunner + AnthropicLLMJudge. CI/CD integration in conversation-api as merge gate. Baseline stored for regression comparison.

Phase 2: Contract Testing

Contracts Tool Registry ↔ Data Sync. Contracts Tool Registry ↔ Enrichment. ContractEvalRunner + schemas in datasets/contracts/. CI/CD integrated in core-knowledge-data-synchronizator and core-knowledge-enrichment.

Phase 3: KB Quality + E2E Shell

KBQualityRunner: validates KB chunks are retrievable for expected queries. E2E Shell flows: proposal → confirmation → action → response. Multi-project regression suite: cross-project regressions in a single report.

Phase 4: Desktop Build Eval

DesktopBuildRunner for macOS (arm64+x64) and Windows (x64). GitHub Actions with native runners per platform. 11 checks: compilation, code signing, notarization, app startup, bundle size, native modules, auto-updater, deep links, window rendering, IPC channels. Merge gate for PRs to core-product-desktop-client.

Phase 5: Figma Quality Eval

FigmaQualityRunner + FigmaRESTClient. 15 checks against Design System requirements (doc 72). Scheduled weekly + on-demand + pre-implementation gate. Report published to Slack #engineering or as GitHub issue. Each violation identifies: component, check, node, and actionable suggestion.

Risk Analysis

Judge LLM inconsistent scoring

Impact: M

Mitigation: criteria are specific and verifiable, not subjective. "Response includes exact fee percentage" is more stable than "response is useful". For critical cases, Claude Sonnet provides higher consistency.

Golden dataset becomes stale

Impact: M

Mitigation: golden datasets are versioned code, so PRs to them go through review. Every feature PR includes golden cases, and every bug fix becomes a permanent regression test. Quality depends on dataset quality, not volume.

Eval suite too slow for CI

Impact: L

Mitigation: parallelism in EvalRunner (10 concurrent). Lightweight Judge (Claude Haiku). Initial dataset small (20-30 cases). Target: <5 minutes per pipeline.

Contract schema drift

Impact: M

Mitigation: consumer-driven, meaning the consumer defines expectations. If the provider changes its response, the contract fails in the provider’s CI. The provider must update the contract explicitly.

macOS runner cost

Impact: M

Mitigation: macOS runners are ~10x more expensive than Linux. Desktop build eval only triggers on PRs that touch core-product-desktop-client, not on every PR. Cost estimated at ~$5-15/month with typical PR volume.

Figma API rate limits

Impact: L

Mitigation: Figma REST API has generous rate limits for reading files. Weekly schedule + on-demand keeps request volume low. Personal Access Token stored as GitHub Actions secret.
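The "10 concurrent" mitigation for slow eval runs could be implemented with a small worker-pool helper like the one below. This is an illustrative sketch: `runWithConcurrency` is a hypothetical name, and the source does not say how EvalRunner actually schedules its cases.

```typescript
// Runs `worker` over all items with at most `limit` calls in flight,
// preserving the input order in the returned results.
async function runWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) {
    throw new Error("limit must be at least 1");
  }
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until the list is exhausted.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const laneCount = Math.min(limit, items.length);
  const lanes: Promise<void>[] = [];
  for (let i = 0; i < laneCount; i++) {
    lanes.push(lane());
  }
  await Promise.all(lanes);
  return results;
}
```

A runner could call this with the golden cases, a limit of 10, and a worker that sends one case to the Coach and then to the judge. If any worker rejects, the returned promise rejects as well, so a caller that wants per-case failures should catch inside the worker.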

Key Decisions

D1.

Does not live in conversation-api. Today it evaluates the Coach; tomorrow it evaluates the entire stack. A project that evaluates multiple projects cannot live inside one of them. It has its own lifecycle, golden dataset, and CI infrastructure.

D2.

LLM-as-judge, not rules. Scoring rules break when the Coach evolves. The Judge LLM evaluates semantic quality (relevance, accuracy, tone, actionability) like a human reviewer would. The golden dataset criteria are the rubric; the Judge applies judgment.

D3.

Golden datasets are versioned code. Evaluation cases live in the repo as JSON/YAML files. PRs to datasets go through review just like code. Eval quality depends on dataset quality, not volume.

D4.

Consumer-driven contracts. The consumer (Tool Registry) defines what it expects from the provider (Data Sync), not the other way around. If the provider changes its contract and breaks the consumer, the test fails before the provider’s deploy. This inverts responsibility: whoever changes must prove they didn’t break anyone.

D5.

Desktop builds need native runners. Code signing, notarization, and native modules (keytar, better-sqlite3) are OS-specific, so a macOS build cannot be validated on Linux. macOS runners are ~10x more expensive; this is mitigated by only triggering on PRs that touch core-product-desktop-client.

D6.

Figma eval uses REST API, not MCP. MCP is for interactive agent use (when Claude implements components). Automated evaluation in CI needs a programmatic client calling the Figma REST API directly. MCP is reserved for manual diagnosis and on-demand pre-implementation checks.

D7.

Figma eval is scheduled, not PR-triggered. Figma has no PRs or library-publish webhooks. The pipeline runs on schedule (weekly) or on demand. It does not block merges of code; it blocks implementation: the agent must not implement a component that doesn’t pass quality checks.

D8.

Figma checks come from doc 72, not invented. Each of the 15 checks is mapped to a specific requirement from Design System Internals (doc 72). If a requirement changes in doc 72, the check updates. The Eval Framework does not define what Figma should have: the Design System defines it, and the Eval Framework verifies it.
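Decision D6 implies a small programmatic client in CI. The sketch below reads a file through the Figma REST API (`GET /v1/files/:key` with the `X-Figma-Token` header) and walks the node tree looking for generic layer names such as "Frame 1" or "Group", as in the naming_convention check. The function names, the injected `fetchImpl` parameter, and the regular expression are assumptions for illustration, not the project's FigmaRESTClient.

```typescript
// Only the response fields used below are typed; the real payload has more.
interface FigmaNode {
  id: string;
  name: string;
  type: string;
  children?: FigmaNode[];
}

interface FigmaFileResponse {
  name: string;
  document: FigmaNode;
}

// Minimal fetch-like signature so the client can be stubbed in tests.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function fetchFigmaFile(
  fileKey: string,
  token: string,
  fetchImpl: FetchLike,
): Promise<FigmaFileResponse> {
  const url = `https://api.figma.com/v1/files/${encodeURIComponent(fileKey)}`;
  const res = await fetchImpl(url, { headers: { "X-Figma-Token": token } });
  if (!res.ok) {
    throw new Error(`Figma API returned ${res.status}`);
  }
  return (await res.json()) as FigmaFileResponse;
}

// Hypothetical heuristic for generic names; the actual rule set lives in doc 72.
const GENERIC_LAYER_NAME = /^(Frame|Group|Rectangle|Ellipse|Vector)( \d+)?$/;

// Returns the full path of every node whose name looks auto-generated.
function findGenericNames(node: FigmaNode, path: string[] = []): string[] {
  const here = [...path, node.name];
  const hits: string[] = GENERIC_LAYER_NAME.test(node.name) ? [here.join(" > ")] : [];
  for (const child of node.children ?? []) {
    hits.push(...findGenericNames(child, here));
  }
  return hits;
}
```

In a Node 18+ CI job the global `fetch` could be passed as `fetchImpl`, with the token read from the `FIGMA_ACCESS_TOKEN` secret used by the workflow.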

File Structure

core-quality-stack-evaluation/
├── src/
│   ├── domain/
│   │   ├── interfaces/
│   │   │   ├── IEvalPipeline.ts
│   │   │   ├── ILLMJudge.ts
│   │   │   ├── IFigmaAPIClient.ts
│   │   │   └── IGoldenDatasetManager.ts
│   │   └── models/
│   │       ├── EvalConfig.ts
│   │       ├── EvalReport.ts
│   │       ├── GoldenDataset.ts
│   │       ├── LLMJudgeScore.ts
│   │       ├── DesktopBuildReport.ts
│   │       └── FigmaQualityReport.ts
│   ├── application/
│   │   ├── EvalRunner.ts
│   │   ├── LLMJudge.ts
│   │   ├── ContractTester.ts
│   │   └── ReportGenerator.ts
│   └── infrastructure/
│       ├── runners/
│       │   ├── CoachEvalRunner.ts
│       │   ├── ContractEvalRunner.ts
│       │   ├── KBQualityRunner.ts
│       │   ├── DesktopBuildRunner.ts
│       │   └── FigmaQualityRunner.ts
│       ├── judge/
│       │   └── AnthropicLLMJudge.ts
│       └── figma/
│           └── FigmaRESTClient.ts
├── datasets/
│   ├── coach/           ← golden cases (JSON/YAML, versioned)
│   ├── kb/              ← KB quality cases
│   ├── contracts/       ← contract definitions between projects
│   ├── desktop/         ← build check config + thresholds per platform
│   └── figma/           ← file keys, checks enabled, thresholds per file
├── cli/
│   └── eval.ts          ← npm run eval -- --pipeline=<type> [--platform=<os>]
└── .github/
    └── workflows/
        ├── eval-on-pr.yml
        ├── desktop-build-eval.yml
        └── figma-quality-eval.yml
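The CLI entry point in the tree takes `--pipeline=<type>` and an optional `--platform=<os>`. A minimal argument parser for that shape might look like the sketch below. The pipeline names are the seven listed in the project summary and the platforms are those of `DesktopBuildReport`; `parseEvalArgs` and `EvalCliOptions` are hypothetical names, and flags such as `--project` or `--dataset` used in eval-on-pr.yml are not handled here.

```typescript
const PIPELINES = [
  "llm_judge", "contract", "kb_quality", "e2e",
  "desktop_build", "figma_quality", "api_monitor",
] as const;
const PLATFORMS = ["darwin", "win32"] as const;

type Pipeline = (typeof PIPELINES)[number];
type Platform = (typeof PLATFORMS)[number];

interface EvalCliOptions {
  pipeline: Pipeline;
  platform?: Platform;
}

// Parses `--key=value` flags; anything else in argv is ignored.
function parseEvalArgs(argv: string[]): EvalCliOptions {
  const flags = new Map<string, string>();
  for (const arg of argv) {
    const match = /^--([a-z-]+)=(.+)$/.exec(arg);
    if (match) {
      flags.set(match[1], match[2]);
    }
  }

  const rawPipeline = flags.get("pipeline");
  const pipeline = PIPELINES.find((p) => p === rawPipeline);
  if (pipeline === undefined) {
    throw new Error(`--pipeline must be one of: ${PIPELINES.join(", ")}`);
  }

  const rawPlatform = flags.get("platform");
  const platform = PLATFORMS.find((p) => p === rawPlatform);
  if (rawPlatform !== undefined && platform === undefined) {
    throw new Error(`--platform must be one of: ${PLATFORMS.join(", ")}`);
  }

  const options: EvalCliOptions = { pipeline };
  if (platform !== undefined) {
    options.platform = platform;
  }
  return options;
}
```

In `cli/eval.ts` this would typically be called with the arguments that follow the script name, and the returned options used to select the runner.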

MVP Scope

Phase 1: Golden dataset (20-30 cases) + CoachEvalRunner + AnthropicLLMJudge + CI gate. Not runtime: CI/CD only. Expands across 5 phases: Coach quality, API contracts, KB+E2E, desktop builds (macOS+Windows), and Figma quality (15 checks via REST API).

Inspired by

Pact (consumer-driven contracts), DeepEval, Anthropic eval best practices

Source: New project | Depends on: #2 (ILLMClient, staging) · Ph2: #10, #3, #11 · Ph3: #9, #1 · Ph4: #1 (desktop builds) · Ph5: #18 (Figma files)
📝 Project Changelog
v5 Mar 10, 2026
+2 new evaluation pipelines: desktop_build (Electron macOS+Windows, 11 checks) and figma_quality (15 checks via Figma REST API)
~Pipelines: 5 → 7 (added desktop_build, figma_quality)
~Implementation phases: 3 → 5 (added Phase 4 Desktop Build, Phase 5 Figma Quality)
+Components: +3 (DesktopBuildRunner, FigmaQualityRunner, FigmaRESTClient)
+Desktop build: 11 checks per platform (compilation, signing, notarization, startup, bundle, native modules, IPC, deep links)
+Figma quality: 15 checks (variable architecture, Code Syntax, Auto Layout, naming, states, color/spacing hardcoding, WCAG, MCP compatibility). 9 critical + 6 warnings
+CI workflows: +2 (desktop-build-eval.yml with native macOS/Windows runners, figma-quality-eval.yml with weekly cron)
+Key decisions: +4 (D5 native runners, D6 REST API not MCP, D7 scheduled not PR-triggered, D8 checks from doc 72)
~“What this project does NOT do”: 6 → 10 items (added: implement React from Figma, design Figma, distribute builds, run unit tests)
v4 Mar 2, 2026
+New project created as “Eval Framework / LLM-as-Judge”
~Renamed “Eval Framework / LLM-as-Judge” → “Eval Suite” (reflects full scope)
+4 evaluation pipelines: llm_judge (Coach), contract (API contracts), kb_quality (chunks), e2e (Shell flows)
+Sandbox isolation: direct Lambda invocation, fixed KB snapshot, test DynamoDB
+LLM Judge: 4 dimensions (relevance 0.3, accuracy 0.4, tone 0.15, actionability 0.15)
+Consumer-driven contract testing + CI/CD workflow (eval-on-pr.yml)
~Components: 3 generic → 6 specific (EvalRunner, LLMJudge, ContractTester, ReportGenerator, CoachEvalRunner, KBQualityRunner)
~Implementation: 4 phases → 3 phases (LLM Judge, Contracts, KB+E2E)
🤖

Layer 7: INTERNAL

How the team works

#17

Beautonomous

Internal · Pablo · Zero code. Zero infrastructure. Config only.

CONFIG

The internal operating agent of the Shopilot team. It lives in OpenClaw UI: the team opens the core-internal-team-workflow project and works from there. Slack receives proactive notifications and pipeline approvals directly, without opening any other tool.

4 engineers operating as 10–15. The problem is not technical capacity but operational fragmentation: to know what’s happening you have to check Linear, GitHub, and Slack separately; simple changes require interrupting someone; and there’s no central place to approve changes or trigger reviews. Beautonomous solves this from OpenClaw UI (main interface: full conversation, context, history, all tools, automatic role auth) and Slack (second native channel: direct conversation, proactive notifications, pipeline approvals). It combines 4 native OAuth connectors (GitHub · Linear · Code · Slack), 3 governance roles (El Capitán / El Mago / El Artesano), and a Quality Gate that runs automatically on every PR across all 11 repos. The only code to write is the script that calls Claude Code via API inside quality-gate.yml; it is written once and replicated from a template in core-internal-team-workflow/templates/.

OpenClaw UI
Main interface: full context + history
Slack
Second native channel: alerts + approvals
GitHub Connector
Repos, PRs, Issues, Actions (10 tools)
Linear Connector
Tasks, sprints, assignments (9 tools)
Governance Engine
3 roles + risk taxonomy + audit log
Quality Gate
lint + tests + architecture review per PR
Code Connector
Read + propose changes via PR (7 tools)
Bootstrap Templates
CLAUDE.md + .claudeignore + settings.json + MEMORY.md + specs/ + skills/ + quality-gate.yml

Configuration Stack (the only “code” is the quality-gate.yml script, written once and replicated from a template)

OpenClaw (platform) · GitHub OAuth · Linear OAuth · Slack OAuth · GitHub Actions · CLAUDE.md × 11 repos · MEMORY.md × 11 repos · Zero AWS / Zero CDK · Zero TypeScript

Beautonomous depends on: OpenClaw account + GitHub org beautonomous + Linear workspace AUT + Slack workspace beautonomous. All other projects (#1–#16) depend on Beautonomous being operational first.

Architecture, Quality Gate, Bootstrap, Governance & System Prompt
Architecture
┌──────────────────────────────┐   ┌──────────────────────────────┐
│  OPENCLAW UI                 │   │  SLACK                       │
│  Main interface              │   │  Second native channel       │
│  Conversation + history      │   │  Proactive notifications     │
│  All tools                   │   │  Pipeline approvals          │
│  Automatic role-based auth   │   │  CI/CD alerts                │
└──────────────┬───────────────┘   └───────────────┬──────────────┘
               │                                   │
               └─────────────────┬─────────────────┘
                                 │
                      Terminal / Claude Code
                   (direct technical operations)
                                 │
┌────────────────────────────────▼────────────────────────────────┐
│  OPENCLAW: agent engine                                         │
│  ReAct Loop · Governance Guard · Audit Log                      │
│  Auth: identifies role automatically from the logged-in user    │
│  Connectors: GitHub · Linear · Code · Slack                     │
└────────────────────────────────┬────────────────────────────────┘
                                 │ invoked via API / GitHub Actions
┌────────────────────────────────▼────────────────────────────────┐
│  BASE QUALITY STRUCTURE: per repository in the stack            │
│  ├── CLAUDE.md          repo instructions + conventions         │
│  ├── .claude/memory/    persistent context                      │
│  └── quality-gate.yml   GitHub Action: lint + tests + review    │
└─────────────────────────────────────────────────────────────────┘
4 Capabilities (v1)4 Capacidades (v1)

1. See status from SlackVer status desde Slack

Any member asks in Slack and gets a synthesized response from GitHub, Linear and Slack — without opening other tools. Daily summary auto-published in #team at 9:00 AM: pending PRs, failing workflows, tasks in progress per person, active blockers.Cualquier miembro pregunta en Slack y obtiene una respuesta sintetizada desde GitHub, Linear y Slack — sin abrir otras herramientas. Resumen diario automático en #team a las 9:00 AM: PRs pendientes, workflows fallando, tareas en progreso por persona, bloqueos activos.

2. Create and manage tasks from SlackCrear y gestionar tareas desde Slack

Create tasks, assign them, change status and add comments in Linear — from Slack, without opening Linear.Crear tareas, asignarlas, cambiar estado y agregar comentarios en Linear — desde Slack, sin abrir Linear.

3. Approve PRsAprobar PRs

When a PR passes the quality gate, Beautonomous notifies Mateo (El Mago) with the PR summary, diff and automatic review result. Mateo can respond from OpenClaw UI or directly from the Slack DM — wherever he is at that moment. If the PR goes to production, the same flow reaches Pablo after Mateo approves. The team doesn’t need to enter GitHub to approve — the decision happens where the approver is, the merge and deploy happen automatically.Cuando un PR pasa el quality gate, Beautonomous notifica a Mateo con el resumen del PR, el diff y el resultado de la revisión automática. Mateo puede responder desde OpenClaw UI o directamente desde el DM en Slack — donde esté en ese momento. Si el PR va a producción, el mismo flujo llega a Pablo después de que Mateo aprueba. El equipo no necesita entrar a GitHub para aprobar — la decisión ocurre donde el aprobador esté, el merge y el deploy ocurren automáticamente.

4. Activate the quality agentActivar el quality agent

The quality gate runs automatically on every PR. Also activatable manually from OpenClaw UI, terminal or Slack to review any repo at any time. Review includes cross-repo contract validation: if a PR breaks an interface another project consumes, the quality gate detects it and fails with the specific reason. Contracts live in the CLAUDE.md of each repo.El quality gate corre automáticamente en cada PR. También puede activarse manualmente desde OpenClaw UI, terminal o Slack para revisar cualquier repo. La revisión incluye validación de contratos entre repos: si un PR rompe una interfaz que otro proyecto consume, el quality gate lo detecta y falla con la razón específica. Los contratos viven en el CLAUDE.md de cada repo.

Quality Gate — 5 Sequential StepsQuality Gate — 5 Pasos Secuenciales

Runs automatically on every PR to develop or main, and manually from Slack. Steps are sequential — if any fails, the PR does not advance. If step 0 fails, Beautonomous notifies in #deploys with missing files and bootstrap instructions.Corre automáticamente en cada PR hacia develop o main, y manualmente desde Slack. Los pasos son secuenciales — si cualquiera falla, el PR no avanza. Si el paso 0 falla, Beautonomous notifica en #deploys con los archivos faltantes e instrucciones de bootstrap.

Step Paso | Tool Herramienta | What it detects Qué detecta
0. Base structure | Shell script | Required files present (CLAUDE.md, .claudeignore, settings.json, MEMORY.md, specs/, skills/) Archivos requeridos presentes (CLAUDE.md, .claudeignore, settings.json, MEMORY.md, specs/, skills/)
1. Lint + types | ESLint + tsc / ruff | Syntax errors, incorrect types Errores de sintaxis, tipos incorrectos
2. Tests | Jest / pytest | Broken tests, coverage below minimum defined in CLAUDE.md Tests rotos, cobertura bajo el mínimo definido en CLAUDE.md
3. Architecture review | Claude Code via API | Clean Architecture boundary violations, broken contracts between repos Violaciones de boundaries de Clean Architecture, contratos rotos entre repos
4. Convention check | Claude Code via API | Naming, folder structure, repo-specific patterns Naming, estructura de carpetas, patrones específicos del repo

Steps 3 and 4 receive full context: CLAUDE.md + MEMORY.md + .claude/specs/architecture.md + .claude/specs/contracts.md + PR diff + repo skills. Output: structured JSON with passed/failed checks and actionable issues per file/line.Los pasos 3 y 4 reciben contexto completo: CLAUDE.md + MEMORY.md + .claude/specs/architecture.md + .claude/specs/contracts.md + diff del PR + skills del repo. Output: JSON estructurado con checks aprobados/fallidos e issues accionables por archivo/línea.
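
The fail-fast ordering described above can be sketched as follows. This is a minimal illustration, not the actual quality-gate.yml logic: the `StepResult` type, the `run_gate` function, the stub steps, and the exact JSON field names are all assumptions made for the example.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StepResult:
    name: str
    passed: bool
    # Each issue is a dict such as {"file": ..., "line": ..., "message": ...}
    issues: List[dict] = field(default_factory=list)


def run_gate(steps: List[Callable[[], StepResult]]) -> dict:
    """Run steps in order and stop at the first failure."""
    results: List[StepResult] = []
    for step in steps:
        result = step()
        results.append(result)
        if not result.passed:
            break  # later steps are skipped: the PR does not advance
    return {
        "passed": len(results) == len(steps) and all(r.passed for r in results),
        "checks": [
            {"step": r.name, "passed": r.passed, "issues": r.issues}
            for r in results
        ],
    }


# Hypothetical stubs standing in for the real tools (shell script, ESLint, Jest...).
def base_structure() -> StepResult:
    return StepResult("0. Base structure", True)


def lint_and_types() -> StepResult:
    issue = {"file": "src/app.ts", "line": 12, "message": "incorrect type"}
    return StepResult("1. Lint + types", False, [issue])


def unit_tests() -> StepResult:
    return StepResult("2. Tests", True)


report = run_gate([base_structure, lint_and_types, unit_tests])
print(json.dumps(report, indent=2))
```

Because step 1 fails in this example, step 2 never runs and the report lists only two checks, each with its file/line issues.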

Bootstrap — Complete .claude/ per RepoBootstrap — .claude/ completo por repo

El Mago runs the bootstrap by copying templates from core-internal-team-workflow/templates/ and filling in the repo-specific context. Without bootstrap, the agent operates without context and the quality gate fails at step 0.El Mago ejecuta el bootstrap copiando los templates de core-internal-team-workflow/templates/ y rellenando el contexto específico del repo. Sin bootstrap el agente opera sin contexto y el quality gate falla en el paso 0.

repo/
├── CLAUDE.md                        # instrucciones + convenciones del repo
├── .claudeignore                    # archivos que Claude no debe leer
└── .claude/
    ├── settings.json                # permisos + hook PostToolUse (build:check)
    ├── memory/
    │   └── MEMORY.md                # contexto persistente del repo
    ├── specs/
    │   ├── architecture.md          # decisiones + boundaries
    │   ├── contracts.md             # contratos con otros repos
    │   └── testing.md               # qué testear y cómo
    └── skills/ (symlinks)
        ├── clean-ddd-hexagonal      # todos los repos
        ├── solid                    # todos los repos
        └── clean-architecture       # todos los repos
+ .github/workflows/quality-gate.yml    # GitHub Action: lint + tests + Claude Code review

PostToolUse hook in settings.json runs build:check automatically after every edit — the agent sees TypeScript errors immediately without being asked. core-intelligence-conversation-api has 11 skills installed and serves as the reference repo for bootstrap.El hook PostToolUse en settings.json corre build:check automáticamente después de cada edición — el agente ve los errores de TypeScript de inmediato. core-intelligence-conversation-api tiene 11 skills instalados y sirve como repo de referencia para el bootstrap.
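
As a rough sketch of what the step 0 check could look like, the snippet below lists which required entries are missing from a repo. The document says step 0 is a shell script; Python is used here only for illustration. The paths follow the tree above, while the `REQUIRED` list and the `missing_base_files` helper are hypothetical names.

```python
import tempfile
from pathlib import Path
from typing import List

# Required entries, taken from the bootstrap tree above.
REQUIRED = [
    "CLAUDE.md",
    ".claudeignore",
    ".claude/settings.json",
    ".claude/memory/MEMORY.md",
    ".claude/specs",
    ".claude/skills",
]


def missing_base_files(repo_root: Path) -> List[str]:
    """Return the required files/folders that do not exist under repo_root."""
    return [rel for rel in REQUIRED if not (repo_root / rel).exists()]


# Demo on a throwaway directory that is only partially bootstrapped.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "CLAUDE.md").write_text("# repo instructions\n")
    (root / ".claude" / "memory").mkdir(parents=True)
    (root / ".claude" / "memory" / "MEMORY.md").write_text("")
    missing = missing_base_files(root)

print(missing)
```

A non-empty list is what would be reported in #deploys together with the bootstrap instructions.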

Cross-repo ContractsContratos Entre Repos

A contract is any interface or agreement between two projects that, if changed in one, breaks the other. Contracts live in CLAUDE.md under a standard “Contratos con otros repos” section. The quality gate reads them in every PR to detect breaks. El Mago updates them when an integration is designed or changed — not automatic, it’s an architecture decision.Un contrato es cualquier interfaz o acuerdo entre dos proyectos que, si cambia en uno, rompe el otro. Los contratos viven en CLAUDE.md bajo una sección estándar. El quality gate los lee en cada PR para detectar rupturas. El Mago los actualiza cuando se diseña o cambia una integración — no es automático, es una decisión de arquitectura.

## Contratos con otros repos

### Expone (otros repos dependen de esto)
- ICreditsGate.canProceed({ userId, toolCategory }) → { allowed, reason }
  Consumidor: core-intelligence-conversation-api
  Rompe si: cambia la firma, cambia el significado de `allowed`, se elimina

### Consume (este repo depende de esto)
- POST /internal/gate (core-platform-billing)
  Rompe si: cambia el path, cambia el body schema, cambia los status codes
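
To illustrate how the section above could be read mechanically, here is a small parser sketch. In the blueprint the contracts are interpreted by Claude Code during the architecture review, so this parser is only an assumption about one possible pre-processing step; `parse_contracts` and the output shape are hypothetical.

```python
import re
from typing import Dict, List

SAMPLE = """\
# Repo notes

## Contratos con otros repos

### Expone (otros repos dependen de esto)
- ICreditsGate.canProceed({ userId, toolCategory }) → { allowed, reason }
  Consumidor: core-intelligence-conversation-api
  Rompe si: cambia la firma, cambia el significado de `allowed`, se elimina

### Consume (este repo depende de esto)
- POST /internal/gate (core-platform-billing)
  Rompe si: cambia el path, cambia el body schema, cambia los status codes

## Otra sección
- not a contract
"""


def parse_contracts(markdown: str) -> Dict[str, List[dict]]:
    """Collect exposed/consumed interfaces and their break conditions."""
    contracts: Dict[str, List[dict]] = {"expone": [], "consume": []}
    in_section = False
    kind = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            # A new level-2 heading opens or closes the contracts section.
            in_section = line.strip() == "## Contratos con otros repos"
            kind = None
            continue
        if not in_section:
            continue
        heading = re.match(r"###\s+(Expone|Consume)\b", line)
        if heading:
            kind = heading.group(1).lower()
        elif kind and line.startswith("- "):
            contracts[kind].append({"interface": line[2:].strip(), "breaks_if": []})
        elif kind and contracts[kind] and line.strip().startswith("Rompe si:"):
            conditions = line.split(":", 1)[1]
            contracts[kind][-1]["breaks_if"] = [c.strip() for c in conditions.split(",")]
    return contracts


parsed = parse_contracts(SAMPLE)
print(parsed["consume"])
```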
Permission MatrixMatriz de Permisos
ActionAcción El Capitán El Mago El Artesano
View team status (GitHub / Linear / Slack)Ver estado del equipo (GitHub / Linear / Slack)
Read code (all repos)Consultar código (lectura total)
Create tasks in LinearCrear tareas en Linear
Assign tasks to anyoneAsignar tareas a cualquier personaOwn onlySolo propias
Send messages to Slack channelsEnviar mensajes a canales de Slack✅ (conf.)
Trigger staging workflowDisparar workflow (staging)
Trigger production workflowDisparar workflow (producción)✅ + conf.
Propose UI code changes (generates PR)Proponer cambios de código UI (genera PR)
Propose backend logic changes (generates PR)Proponer cambios de lógica backend (genera PR)
Infra / critical config changesCambios de infra / configuración crítica✅ + conf.
Approve agent-generated PRsAprobar PRs generados por el agente
Approve deploy to productionAprobar deploy a producción✅ (final)✅ (técn.)
Manage roles in BeautonomousGestionar roles en Beautonomous
System Prompt Base (OpenClaw) — ready to pasteSystem Prompt Base (OpenClaw) — listo para pegar
# Beautonomous — Agente Operativo Interno de Shopilot

Eres el agente operativo del equipo. Tu función: dar visibilidad completa
del proyecto y ejecutar acciones en GitHub, Linear, Slack y el código.
Operas desde OpenClaw UI (interfaz principal), Slack (notificaciones y
aprobaciones) y terminal. El rol del usuario ya viene determinado por
OpenClaw — nunca lo asumas ni lo pidas explícitamente.

## Usuario actual
{USER_NAME} | {USER_EMAIL} | Rol: {USER_ROLE}

## Roles
El Capitán (pablo@shopilot.ai):
  - Lectura total de GitHub, Linear y Slack
  - Crear y asignar tareas en Linear
  - Solicitar cambios de UI (genera PR, El Mago aprueba)
  - Aprobación final de negocio para deploys a producción
  - NO puede disparar workflows ni tocar código backend/infra

El Mago (mateo@shopilot.ai):
  - Acceso completo a todos los sistemas
  - Aprobar y rechazar PRs del agente
  - Disparar cualquier workflow (siempre con confirmación previa)
  - Enviar mensajes a Slack en nombre del equipo
  - Modificar infra y config crítica (con confirmación)
  - Gestionar permisos del equipo en Beautonomous
  - Firma técnica en el pipeline de aprobación

El Artesano (andres@shopilot.ai, sergio@shopilot.ai):
  - Lectura total de todos los repos y Slack
  - Proponer cambios de código via PR (El Mago los aprueba)
  - Disparar workflows de staging
  - Crear y gestionar tareas propias en Linear
  - Enviar mensajes a Slack (con confirmación previa)

## Gobernanza — NUNCA omitas estas reglas
1. Antes de cualquier escritura: muestra exactamente qué vas a hacer.
2. Para código: muestra el diff completo antes de crear el PR.
3. Para Slack: muestra la vista previa antes de publicar.
4. Si el rol no tiene permiso: explica por qué y ofrece escalar a El Mago.
5. Acciones de alto riesgo requieren confirmación de El Mago, siempre.
6. Confirma el resultado: qué cambió, dónde, cuándo.

## Repositorios del stack (11)
core-intelligence-conversation-api   (Coach — Node.js 18 TypeScript)
core-knowledge-semantic-base     (KB — Go + Vertex AI + BigQuery)
core-knowledge-data-synchronizator          (Data Sync — Airflow + GCS)
core-product-desktop-client                   (App — Electron + React)
core-platform-infrastructure                  (Infra — CDK TypeScript + Terraform GCP)
core-action-marketplace-provider
core-platform-billing
core-knowledge-enrichment
core-quality-feedback
core-quality-stack-evaluation
core-internal-team-workflow                   (este proyecto — solo configuración)

## Canales Slack autorizados
#engineering · #deploys · #general · #team
Connectors (OpenClaw native — OAuth only, no custom code)Conectores (OpenClaw nativos — solo OAuth, sin código propio)
Connector Conector | Read Lectura | Write Escritura | Total
GitHub | repos, PRs, issues, workflows, logs | issues, comments, propose PR, trigger/re-run workflows issues, comentarios, proponer PR, disparar/re-ejecutar workflows | 10
Linear | tasks, sprints, team metrics tareas, sprints, métricas del equipo | create/assign/comment tasks, change status/priority, create sprint crear/asignar/comentar tareas, cambiar estado/prioridad, crear sprint | 9
Code | read file, search in code leer archivo, buscar en código | low-risk changes via PR, propose logic changes via PR cambios de bajo riesgo via PR, proponer cambios de lógica via PR | 7
Slack | channels, threads, search canales, hilos, búsqueda | messages (with prior confirmation), approval notifications mensajes (con confirmación previa), notificaciones de aprobación | 5
Acceptance CriteriaCriterios de Aceptación
  • “Beautonomous” project created in OpenClaw with system prompt configured
  • GitHub OAuth connected and all 11 repos authorized
  • Linear OAuth connected with Shopilot workspace (AUT team)
  • Slack OAuth connected with 4 authorized channels (#engineering, #deploys, #team, #general)
  • 3 roles correctly assigned by email across the 4 team members (pablo=Capitán, mateo=Mago, andres=Artesano, sergio=Artesano)
  • All 4 team members make 3 read queries each without error
  • 5 tasks created in Linear from Beautonomous without incidents
  • 1 PR passes quality gate automatically → Mateo receives DM with summary, diff and result, approves from Slack
  • quality-gate.yml deployed and running in at least 3 repos (bootstrap complete)
  • Beautonomous detects a GitHub Actions failure and notifies in #deploys in <5 minutes
  • Daily sprint summary published in #team at 9:00 AM for 3 consecutive days
  • Proyecto “Beautonomous” creado en OpenClaw con system prompt configurado
  • GitHub OAuth conectado y los 11 repos autorizados
  • Linear OAuth conectado con el workspace Shopilot (equipo AUT)
  • Slack OAuth conectado con 4 canales autorizados (#engineering, #deploys, #team, #general)
  • 3 roles asignados correctamente por email a los 4 miembros (pablo=Capitán, mateo=Mago, andres=Artesano, sergio=Artesano)
  • Los 4 miembros hacen 3 consultas de lectura cada uno sin error
  • 5 tareas creadas en Linear desde Beautonomous sin incidentes
  • 1 PR pasa el quality gate automáticamente → Mateo recibe DM con resumen, diff y resultado, aprueba desde Slack
  • quality-gate.yml desplegado y corriendo en al menos 3 repos (bootstrap completo)
  • Beautonomous detecta un fallo de GitHub Actions y notifica en #deploys en <5 minutos
  • Resumen diario publicado en #team a las 9:00 AM durante 3 días consecutivos

Sequential Approval PipelinePipeline de Aprobación Secuencial

PR opened
     │
     ▼
Quality Gate (automatic, Claude Code)
     ├── FAILS → #deploys + DM to El Artesano → back to El Artesano. End.
     │
     └── PASSES → DM to Mateo in Slack
                      │
                      ├── REJECTS → comment on PR + DM to El Artesano. End.
                      │
                      └── APPROVES
                             ├── target staging → automatic merge
                             └── target prod → DM to Pablo in Slack
                                                   ├── REJECTS → End.
                                                   └── APPROVES → merge → deploy prod

Mateo and Pablo approve from Slack: Beautonomous sends PR summary + diff + quality gate result to Slack and the approver responds in that thread. Zero context switch.Mateo y Pablo aprueban desde Slack: Beautonomous envía el resumen del PR + diff + resultado del quality gate, y el aprobador responde en ese hilo. Cero context switch.
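
The pipeline above is effectively a small decision function. The sketch below encodes it under stated assumptions: the `next_action` name, the `"approve"`/`"reject"` strings, and the returned action labels are illustrative only and do not correspond to any OpenClaw API.

```python
from typing import Optional


def next_action(gate_passed: bool, target: str,
                mateo: Optional[str] = None,
                pablo: Optional[str] = None) -> str:
    """Return the next pipeline action for a PR.

    target is "staging" or "prod"; mateo/pablo are None (no answer yet),
    "approve" or "reject".
    """
    if not gate_passed:
        return "notify #deploys + DM author; end"
    if mateo is None:
        return "DM Mateo"
    if mateo == "reject":
        return "comment on PR + DM author; end"
    if target == "staging":
        return "auto-merge"
    # Production PRs need the second, business-level approval.
    if pablo is None:
        return "DM Pablo"
    if pablo == "reject":
        return "end"
    return "merge + deploy prod"


print(next_action(True, "prod", mateo="approve"))
print(next_action(True, "staging", mateo="approve"))
```

Note that Pablo is only consulted after Mateo approves and only for production targets, which matches the order in the diagram.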

Proactivity — Beautonomous doesn’t wait to be askedProactividad — Beautonomous no espera que le pregunten

Trigger Disparador | Automatic action Acción automática
GitHub Action fails (any repo) GitHub Action falla (cualquier repo) | Message in #deploys: workflow, repo, branch, link to log Mensaje en #deploys: workflow, repo, rama, link al log
GitHub Action fails on main or prod GitHub Action falla en main o prod | Message in #deploys + direct DM to El Mago Mensaje en #deploys + DM directo a El Mago
PR unreviewed >4 hours PR sin revisar >4 horas | Ping in #engineering with link and author Ping en #engineering con enlace y autor
Linear task blocked >2 days Tarea Linear bloqueada >2 días | Alert to El Mago with block context Alerta a El Mago con contexto del bloqueo
9:00 AM daily 9:00 AM diario | Summary in #team: pending PRs, failing CI, tasks in progress per person Resumen en #team: PRs pendientes, CI fallando, tareas en progreso por persona
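
The threshold rules in the table above can be expressed as a few predicates. This is a sketch under assumptions: the function names, the `"DM: El Mago"` label, and treating `main` and `prod` as literal branch names are all hypothetical; only the 4-hour and 2-day limits and the routing targets come from the table.

```python
from datetime import datetime, timedelta
from typing import List

PR_REVIEW_LIMIT = timedelta(hours=4)
TASK_BLOCKED_LIMIT = timedelta(days=2)
CRITICAL_BRANCHES = {"main", "prod"}  # assumption: branch names match the table wording


def ci_failure_targets(branch: str) -> List[str]:
    """Every failure goes to #deploys; critical branches also DM El Mago."""
    targets = ["#deploys"]
    if branch in CRITICAL_BRANCHES:
        targets.append("DM: El Mago")
    return targets


def pr_needs_ping(opened_at: datetime, reviewed: bool, now: datetime) -> bool:
    """True when a PR has waited more than 4 hours without a review."""
    return (not reviewed) and (now - opened_at) > PR_REVIEW_LIMIT


def task_needs_alert(blocked_since: datetime, now: datetime) -> bool:
    """True when a Linear task has been blocked for more than 2 days."""
    return (now - blocked_since) > TASK_BLOCKED_LIMIT


now = datetime(2026, 3, 10, 15, 0)
print(ci_failure_targets("main"))
print(pr_needs_ping(datetime(2026, 3, 10, 10, 0), False, now))  # waited 5 hours
print(task_needs_alert(datetime(2026, 3, 9, 15, 0), now))       # blocked 1 day
```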

What it does NOT doQué NO hace

×

Not the Shopilot product interfaceNo es la interfaz del producto ShopilotBeautonomous is the team’s agent, not the seller’s. Zero relation with the seller Coach or projects #1–#16 at runtime.Beautonomous es el agente del equipo, no del vendedor. No tiene ninguna relación con el Coach de los vendedores ni con los proyectos #1–#16 en tiempo de ejecución.

×

Does not self-mergeNo hace self-mergePRs generated by the agent can only be approved by El Mago. No exceptions — never self-merge.Los PRs que genera el agente solo los puede aprobar El Mago. Sin excepción — nunca self-merge.

×

Does not manage production credentialsNo gestiona credenciales de producciónAWS/GCP secrets, external API tokens, prod env vars — out of scope. El Mago manages them directly.Secrets de AWS/GCP, tokens de APIs externas, variables de entorno de prod — fuera del scope. Los maneja El Mago directamente.

×

Does not make technical decisionsNo toma decisiones técnicasDetects convention violations in the quality gate but doesn’t decide if an architecture change is correct. Escalates to El Mago with context.Detecta violaciones de convenciones en el quality gate, pero no decide si un cambio de arquitectura es correcto. Escala a El Mago con contexto.

×

Does not auto-sync .memory between reposNo sincroniza automáticamente los .memory entre reposThe general MEMORY.md is not auto-generated from individual ones. Requires El Mago to update it when there are cross-repo relevant decisions.El MEMORY.md general no se genera automáticamente desde los individuales. Requiere que El Mago lo actualice cuando hay decisiones cross-repo relevantes.

5-Phase Implementation Plan — everything is OpenClaw config, the only code is quality-gate.ymlPlan de Implementación en 5 Fases — todo es config OpenClaw, el único código es quality-gate.yml

Phase 1 — Connect (Day 1–2)Fase 1 — Conectar (Día 1–2)

Create Beautonomous project in OpenClaw → connect GitHub OAuth → authorize 11 repos → connect Linear OAuth → paste system prompt. Agent operational for read queries. Owner: Pablo.Crear proyecto Beautonomous en OpenClaw → conectar GitHub OAuth → autorizar 11 repos → conectar Linear OAuth → pegar system prompt. Agente operacional para consultas de lectura. Owner: Pablo.

Phase 2 — Roles & Slack (Day 2–3)Fase 2 — Roles y Slack (Día 2–3)

Assign 3 roles by email in OpenClaw Team Settings → connect Slack OAuth → authorize 4 channels → configure proactivity alerts → validate: each team member makes 3 read queries. Owner: Mateo.Asignar 3 roles por email en OpenClaw Team Settings → conectar Slack OAuth → autorizar 4 canales → configurar alertas de proactividad → validar: cada miembro hace 3 consultas. Owner: Mateo.

Phase 3 — Quality Gate Bootstrap (Week 1–2)Fase 3 — Bootstrap Quality Gate (Semana 1–2)

Copy templates from core-internal-team-workflow/templates/ to each repo: CLAUDE.md + MEMORY.md + .claude/specs/ + skills symlinks + quality-gate.yml → configure branch protection rules (develop: quality gate + 1 review; main: quality gate + 2 reviews + no direct push). Owner: Mateo.Copiar templates de core-internal-team-workflow/templates/ a cada repo: CLAUDE.md + MEMORY.md + .claude/specs/ + symlinks de skills + quality-gate.yml → configurar branch protection rules (develop: quality gate + 1 review; main: quality gate + 2 reviews + no direct push). Owner: Mateo.
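
The two branch protection rules named above could be captured as data before applying them. The sketch below builds request bodies shaped after GitHub's REST "update branch protection" endpoint. The status check name `quality-gate`, the `strict` and `enforce_admins` choices, and the `protection_payload` helper are assumptions for illustration, not settings stated in this blueprint.

```python
import json


def protection_payload(required_reviews: int, check_name: str = "quality-gate") -> dict:
    """Build a branch protection body: one required check plus N approving reviews."""
    return {
        # The quality gate must be green before merging.
        "required_status_checks": {"strict": True, "contexts": [check_name]},
        # Assumption: admins are bound by the same rules.
        "enforce_admins": True,
        # Requiring reviewed pull requests is what prevents direct pushes.
        "required_pull_request_reviews": {
            "required_approving_review_count": required_reviews,
        },
        "restrictions": None,
    }


RULES = {
    "develop": protection_payload(required_reviews=1),
    "main": protection_payload(required_reviews=2),
}

print(json.dumps(RULES["main"], indent=2))
```

Keeping the rules in one mapping makes it straightforward to apply the same pair to all 11 repos and to review changes to them in git.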

Phase 4 — Progressive Write Access (Week 2)Fase 4 — Escritura Progresiva (Semana 2)

Enable write categories most-reversible first: Linear tasks → GitHub issues → re-run workflows → propose code changes via PR → staging workflows. Each step validates before advancing. Owner: Mateo.Habilitar escritura por categoría, lo más reversible primero: tareas Linear → issues GitHub → re-run workflows → proponer cambios via PR → workflows staging. Cada paso valida antes de avanzar. Owner: Mateo.

Phase 5 — PR Approval Pipeline Validation (Week 2–3)Fase 5 — Validación Pipeline de Aprobación (Semana 2–3)

Validate end-to-end pipeline: PR → quality gate → Mateo DM → Pablo DM (prod only) → auto merge. Validate that the agent does not self-merge its own PRs. Validate daily summary in #team at 9:00 AM. Owner: Mateo + Pablo.Validar el pipeline end-to-end: PR → quality gate → DM Mateo → DM Pablo (solo prod) → merge automático. Validar que el agente no hace self-merge de sus propios PRs. Validar resumen diario en #team a las 9:00 AM. Owner: Mateo + Pablo.

Risk AnalysisAnálisis de Riesgos

Governance jailbreakJailbreak de gobernanza

Impact: HighImpacto: Alto

Mitigation: Double layer — rules in system prompt (LLM understands why) AND platform-level permissions in OpenClaw (LLM cannot do X regardless of prompt). Both layers required: one for reasoning quality, one for operational safety.Mitigación: Doble capa — reglas en system prompt (el LLM entiende por qué) Y permisos a nivel de plataforma OpenClaw (el LLM no puede hacer X). Ambas capas requeridas: una para calidad de razonamiento, otra para seguridad operativa.
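
The second layer of this mitigation, a permission check that runs regardless of what the prompt says, can be sketched as a lookup. The action names and the `authorize` function are hypothetical, and the sets below are a simplified reading of the role descriptions in the system prompt, not OpenClaw's actual permission model.

```python
ALLOWED = {
    "El Capitán": {
        "read", "create_task", "assign_task",
        "propose_ui_change", "approve_prod_deploy",
    },
    "El Mago": {
        "read", "create_task", "assign_task", "send_slack_message",
        "propose_ui_change", "propose_backend_change", "approve_pr",
        "trigger_staging_workflow", "trigger_production_workflow",
        "change_infra", "approve_prod_deploy", "manage_roles",
    },
    "El Artesano": {
        "read", "create_task", "send_slack_message",
        "propose_ui_change", "propose_backend_change",
        "trigger_staging_workflow",
    },
}

# (role, action) pairs that are allowed only after an explicit confirmation.
NEEDS_CONFIRMATION = {
    ("El Mago", "trigger_staging_workflow"),
    ("El Mago", "trigger_production_workflow"),
    ("El Mago", "change_infra"),
    ("El Artesano", "send_slack_message"),
}


def authorize(role: str, action: str) -> str:
    """Return "allow", "confirm" or "deny" independently of any prompt text."""
    if action not in ALLOWED.get(role, set()):
        return "deny"  # the agent should explain why and offer to escalate
    if (role, action) in NEEDS_CONFIRMATION:
        return "confirm"
    return "allow"


print(authorize("El Capitán", "trigger_production_workflow"))
print(authorize("El Mago", "trigger_production_workflow"))
```

Because the check is plain data plus a lookup, a prompt injection can change what the model says but not what `authorize` returns.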

Quality gate without context (step 0 fails)Quality gate sin contexto (falla paso 0)

Impact: MediumImpacto: Medio

Mitigation: Step 0 verifies required files before running the agent. Beautonomous notifies in #deploys with missing files and bootstrap instructions. No repo is unblocked without the complete base structure.Mitigación: El paso 0 verifica los archivos requeridos antes de correr el agente. Beautonomous notifica en #deploys con los archivos faltantes e instrucciones de bootstrap. Ningún repo queda desbloqueado sin la estructura base completa.

Broken cross-repo contracts not detectedContratos entre repos rotos sin detectar

Impact: HighImpacto: Alto

Mitigation: Contracts in CLAUDE.md + the quality gate reads them on every PR. El Mago updates contracts when an integration is designed or changed — not optional.Mitigación: Contratos en CLAUDE.md + el quality gate los lee en cada PR. El Mago actualiza los contratos cuando se diseña o cambia una integración — no es opcional.

System prompt stalenessSystem prompt obsoleto

Impact: Low–MediumImpacto: Bajo–Medio

Mitigation: Bi-weekly review owned by Pablo. As the stack evolves (new repos, tools, governance rules), the system prompt must reflect it. Version the prompt in git alongside technical specs.Mitigación: Revisión bimensual propiedad de Pablo. A medida que el stack evoluciona, el system prompt debe reflejarlo. Versionar el prompt en git junto a los specs técnicos.

Key DecisionsDecisiones Clave

D1.

OpenClaw vs. custom agent infrastructureOpenClaw vs. infraestructura de agente propiaBuilding a custom operational agent would require: Lambda, DynamoDB, GitHub App, Linear webhook, Slack bot — 3–4 weeks of engineering. OpenClaw provides all of this from Day 1. The Shopilot team builds for sellers, not for itself.Construir un agente operativo propio requeriría: Lambda, DynamoDB, GitHub App, webhook Linear, bot Slack — 3–4 semanas de ingeniería. OpenClaw provee todo esto desde el Día 1. El equipo Shopilot construye para vendedores, no para sí mismo.

D2.

Quality Gate via Claude Code API vs. static linters onlyQuality Gate via Claude Code API vs. solo linters estáticosStatic linters detect syntax and types but miss architecture boundaries and cross-repo contracts. Claude Code with repo context (CLAUDE.md + specs) detects what linters cannot. The script is written once and replicated — maintenance cost is O(1).Los linters estáticos detectan sintaxis y tipos pero no boundaries de arquitectura ni contratos entre repos. Claude Code con contexto del repo (CLAUDE.md + specs) detecta lo que los linters no pueden. El script se escribe una vez y se replica — costo de mantenimiento O(1).

D3.

Contracts in CLAUDE.md (not a separate service)Contratos en CLAUDE.md (no un servicio separado)A contract registry as a separate service creates yet another thing to maintain. CLAUDE.md already lives in every repo, is versioned with the code, and the quality gate already reads it. Contracts are plain text in an existing file — zero overhead.Un registro de contratos como servicio separado crea otra cosa más que mantener. CLAUDE.md ya vive en cada repo, se versiona con el código, y el quality gate ya lo lee. Los contratos son texto plano en un archivo existente — cero overhead.

D4.

Slack as second native channel (not just notifications)Slack como segundo canal nativo (no solo notificaciones)The team is in Slack all day. Requiring them to open OpenClaw to approve a PR creates friction. Beautonomous sends diff + quality gate result to Slack and the approver responds in the same thread — zero context switch.El equipo está en Slack todo el día. Obligarlos a abrir OpenClaw para aprobar un PR genera fricción. Beautonomous envía el diff + resultado del quality gate a Slack y el aprobador responde en el mismo hilo — cero context switch.

Current StateEstado Actual

View status from Slack Ver status desde Slack | Pending: OpenClaw + connectors Pendiente: OpenClaw + conectores
Create tasks from Slack Crear tareas desde Slack | Pending: Linear OAuth Pendiente: Linear OAuth
Approve PRs from Slack Aprobar PRs desde Slack | Pending: quality gate + branch protection Pendiente: quality gate + branch protection
Activate quality agent from Slack Activar quality agent desde Slack | Pending: quality-gate.yml in 11 repos Pendiente: quality-gate.yml en 11 repos
Proactivity (alerts + daily summary) Proactividad (alertas + resumen diario) | Pending: OpenClaw configured Pendiente: OpenClaw configurado
Base structure per repo (bootstrap) Estructura base por repo (bootstrap) | 🔨 Partial: only core-intelligence-conversation-api (incomplete: no specs/, skills/, full .claudeignore) Parcial: solo core-intelligence-conversation-api (incompleto: sin specs/, skills/, .claudeignore completo)
Source:Fuente: OpenClaw platform | Depends on:Depende de: OpenClaw account · GitHub org beautonomous (11 repos) · Linear workspace AUT · Slack workspace beautonomousCuenta OpenClaw · org GitHub beautonomous (11 repos) · workspace Linear AUT · workspace Slack beautonomous | Depended on by:Del que dependen: Entire team workflow — unblocks faster development of projects #12–19Todo el workflow del equipo — desbloquea el desarrollo más rápido de los proyectos #12–19
📋 Project ChangelogChangelog del Proyecto
v5 Mar 3, 2026
+Complete rewrite using source docs 54 (Beautonomous Internals) and 55 (Beautonomous Layer)Reescritura completa usando documentos fuente 54 (Beautonomous Internals) y 55 (Beautonomous Layer)
+Architecture diagram: OpenClaw UI + Slack + Terminal → OpenClaw motor → quality base structure per repoDiagrama de arquitectura: OpenClaw UI + Slack + Terminal → motor OpenClaw → estructura base de calidad por repo
+4 capabilities specified (status from Slack, tasks from Slack, PR approval pipeline, quality agent)4 capacidades especificadas (status desde Slack, tareas desde Slack, pipeline de aprobación de PRs, quality agent)
+Quality gate: 5 sequential steps with step 0 (base structure check), tools per step, Slack routing for pass/failQuality gate: 5 pasos secuenciales con paso 0 (verificación de estructura base), herramientas por paso, routing Slack
+Bootstrap structure: complete .claude/ per repo (settings.json with PostToolUse hook, memory/, specs/, skills/)Estructura bootstrap: .claude/ completo por repo (settings.json con hook PostToolUse, memory/, specs/, skills/)
+Cross-repo contracts: standard format in CLAUDE.md + what the quality gate verifies for Expone/ConsumeContratos entre repos: formato estándar en CLAUDE.md + qué verifica el quality gate sobre Expone/Consume
+Sequential approval pipeline diagram (quality gate → Mateo DM → Pablo DM for prod → auto merge)Diagrama del pipeline de aprobación secuencial (quality gate → DM Mateo → DM Pablo para prod → merge automático)
+Proactivity triggers table (5 triggers: CI fails, main CI fails, PR >4h unreviewed, task blocked >2d, daily 9AM)Tabla de disparadores de proactividad (5 disparadores: CI falla, CI falla en main, PR >4h sin revisar, tarea bloqueada >2d, 9AM diario)
+What it does NOT do section (5 explicit limits: no seller interface, no self-merge, no prod secrets, no arch decisions, no auto memory sync)Sección qué NO hace (5 límites explícitos: no interfaz vendedor, no self-merge, no secrets prod, no decisiones arquitectura, no auto sync memory)
+System prompt base: full text ready to paste in OpenClaw (roles, governance rules, 11 repos, Slack channels)System prompt base: texto completo listo para pegar en OpenClaw (roles, reglas de gobernanza, 11 repos, canales Slack)
+Connectors table: GitHub (10) + Linear (9) + Code (7) + Slack (5) with read/write breakdownTabla de conectores: GitHub (10) + Linear (9) + Code (7) + Slack (5) con desglose lectura/escritura
+Current state table updated from source doc 55 (6 capabilities, partial bootstrap noted)Tabla de estado actual actualizada desde documento fuente 55 (6 capacidades, bootstrap parcial anotado)
+Permission matrix corrected: El Capitán cannot trigger production workflows (was wrong in previous version)Matriz de permisos corregida: El Capitán no puede disparar workflows de producción (estaba incorrecto en versión anterior)
Removed: OpenClaw KB component (replaced by quality base structure per repo with CLAUDE.md + MEMORY.md + quality-gate.yml)Eliminado: componente OpenClaw KB (reemplazado por estructura base de calidad por repo con CLAUDE.md + MEMORY.md + quality-gate.yml)
Removed: reference to “Section 7” implementation guide (all content now contained in this card)Eliminado: referencia a guía de implementación “Sección 7” (todo el contenido ahora contenido en esta tarjeta)
Removed: generic permission matrix (replaced with detailed, accurate matrix from source doc 54)Eliminado: matriz de permisos genérica (reemplazada con matriz detallada y correcta desde documento fuente 54)
v4 Mar 2, 2026
+Full implementation guide written as Section 7 (7.1–7.13): architecture, governance model, 31-tool catalog, system prompt base, 5-phase plan, acceptance criteria, risk analysis — 1,558 linesGuía completa de implementación escrita como Sección 7 (7.1–7.13): arquitectura, gobernanza, catálogo de 31 herramientas, system prompt base, plan 5 fases, criterios de aceptación, análisis de riesgos — 1.558 líneas

9. MVP: 10+2 Week Execution Plan MVP: Plan de Ejecución 10+2 Semanas

9.1 Non-Technical Overview Resumen No Técnico

For investors, advisors, and non-technical stakeholders Para inversores, advisors y stakeholders no técnicos

What is Shopilot? ¿Qué es Shopilot?

Shopilot is an AI assistant that lives inside your online store. Think of it as a smart co-worker who knows your products, watches your competitors, and helps you make better decisions, all from a single app where you also browse your marketplace normally. Shopilot es un asistente de IA que vive dentro de tu tienda online. Piensa en un compañero de trabajo inteligente que conoce tus productos, vigila a tus competidores y te ayuda a tomar mejores decisiones, todo desde una sola app donde también navegas tu marketplace normalmente.

It works with MercadoLibre, Amazon, and Shopify, the three biggest e-commerce platforms in Latin America. One assistant for all your stores. Funciona con MercadoLibre, Amazon y Shopify, las tres plataformas de e-commerce más grandes de Latinoamérica. Un solo asistente para todas tus tiendas.

The Problem

Online sellers spend 3-5 hours every day on repetitive tasks that don't directly grow their business:

• Checking competitor prices across dozens of products

• Manually optimizing listing titles and descriptions

• Monitoring inventory levels across marketplaces

• Analyzing sales data to spot trends and problems

• Switching between marketplace dashboards and spreadsheets

• Reacting to competitor changes hours or days too late

Result: sellers are reactive instead of strategic. They spend their time on operations, not growth.

Our Solution

An AI copilot that understands your store, answers your questions, takes actions with your permission, and proactively alerts you when something needs attention. It's not a dashboard; it's a conversation.

You say: "How are my sales this week?"

Shopilot: "Sales are up 12% vs last week. Your top product is X with 45 units. However, product Y dropped 30%; your competitor lowered their price by 15%. Want me to adjust your price?"

You say: "Yes, match their price minus 5%"

Shopilot: "Done. Price updated from $89 to $76. Next time you ask about this product, I'll show you how the competition reacted."

How It Works (User Journey)

1. Download & Install

Download Shopilot.app for Mac. Install in seconds, with no technical setup required.

2. Connect Your Store

Link your MercadoLibre, Amazon, or Shopify account with one click. Shopilot syncs your products, sales, and metrics automatically.

3. Browse & Chat

Browse your marketplace normally. Shopilot's sidebar is always available, so you can ask anything about your store.

4. Act With Permission

Shopilot can update titles, adjust prices, and manage listings, but always asks for your confirmation first. You stay in control.

5. Smart Suggestions

While you chat, Shopilot detects opportunities: "Your competitor dropped prices on 3 products. Want me to adjust yours?" Act on them instantly.

What Makes Us Different

Native App, Not Extension

A real desktop application, with no browser extensions that slow down your store, break with updates, or leak your data.

AI That Reasons, Not Rules

Powered by Claude, it understands context, nuance, and your business. Not a rigid rule-based system that gives the same advice to everyone.

3 Marketplaces, 1 Tool

MercadoLibre + Amazon + Shopify in one assistant. Most tools only cover one platform. We cover where LatAm sellers actually sell.

Business Model

Free: $0/mo

50 actions/month, read-only. Try before you buy.

Pro: $49/mo

500 actions/month, read + write + proactive alerts. The real product.

Credit Packs

Need more? Buy packs: $5/100, $20/500, $35/1000 credits. Pro users only.

Unit economics: Our AI cost per user is ~$4/month. At $49/month Pro pricing, that's a 91% gross margin. The business works from user #12.

The 10+2 Week Plan

4 engineers building in parallel for 12 weeks (10 core + 2 buffer). Each engineer owns a vertical: one builds the AI brain, one builds the data pipes, one builds the app, and the CEO (also a product engineer) owns product quality and launch. Every 2 weeks there's a clear deliverable. By week 10, real sellers are using the product. Weeks 11-12 absorb beta fixes, deferred scope, and hardening.

Mateo

CTO

AI + Orchestration

Andres

Data + BE

APIs + Data

Sergio

Full-Stack

App + UI

Pablo

CEO / PE

Product + QA

Success Metrics

1+

Action in First Session

Activation: user gets value immediately

48h

Return Within 48 Hours

Retention: product is worth coming back to

60%

Time Saved vs Manual

Value: Shopilot is measurably faster

9.2 Execution Philosophy: SV/YC Methodology

4 engineers × AI: ship in 10+2 weeks. (aspiration: leverage AI to operate above headcount)

This plan fuses YC Build Sprint (12-week cycles, weekly accountability, launch early) with Shape Up (L/M/S task classification, appetite-based scoping, circuit breakers) and amplifies it with Beautonomous (#17 CORE), the AI operational agent that eliminates coordination overhead.

A. Three Founding Pillars

Do Things That Don't Scale

Onboard every beta user personally. Write every KB doc manually. Review every PR. Automate later; earn trust first.

Default Alive

Every spending decision: does this help us reach revenue before runway ends? Free tier is acquisition, Pro tier is survival. Frugal by design.

OMTM: Tools Executed / Week

One Metric That Matters. Not signups, not MRR: tools executed per week per active user. That's the proof the copilot is delivering value.

YC Principles

Weekly goals within 2-week cycles

Launch early, launch often

Risk-first: address uncertainty early

Maker's schedule: 4h uninterrupted blocks

Shape Up Patterns

Appetite (not estimate): 2 weeks per scope

Circuit breaker: not done at deadline = cut

Hill chart: "figuring out" → "making it happen"

Scopes (not tasks): group by user outcome

Ceremonies

Async standup daily 9:30 AM (Linear+Slack)

Cycle planning biweekly (60 min sync)

Friday demo (30 min, each engineer demos)

Retro biweekly (30 min, Lean Coffee)

B. Sprint Contract: Success Criteria per Sprint

| Sprint | Label | Success Criteria | Gate |
| --- | --- | --- | --- |
| S0 | Pre-Sprint | CORE operational. All 4 engineers aligned. Beautonomous managing Linear + GitHub + Slack. Zero ambiguity before W1. | T0.8 ✓ |
| S1–2 | Foundation | Walking skeleton E2E: Electron loads marketplace → sidebar sends message → ReAct loop processes → response returns. Ugly is OK; architecture proven. | W2 |
| S3–4 | Core Engines | 10 READ tools registered as stubs (mock data, T2.5). IContextAssembler + Health summary working. Eval runner executes 15+ golden cases. Tool Registry + HookLifecycle deployed. | Gate 1 |
| S5–6 | WRITE Tools | First WRITE tools (update_product_content, update_price, pause_product, activate_product) execute on all 3 marketplaces. Confirmation flow works. Billing Free tier live. Enrichment returns competitor data. Eval CI integration blocks PRs on regression. | W6 |
| S7–8 | Hardening | 4+ WRITE tools operational (more per circuit breaker capacity). WebSocket streaming live. Proactive suggestions via afterTool LLM hook (max 2/turn). Load test: 50 concurrent users passes. Eval score ≥0.70. Staging deployed. | Gate 2 |
| S9–10 | Launch | Beta: 10+ real sellers onboarded. 0 P0/P1 bugs. .dmg signed + notarized. Production deployed. OMTM: ≥1 tool/user/week. Eval score ≥0.70. | Gate 3 |
| S11–12 | Buffer | Beta bug fixes (P1/P2). Performance hardening (p95, RAM). Deferred scope from circuit breaker (remaining WRITE tools, ProactiveSuggestions v2). Eval score target 0.80. System prompt v3 with real beta data. | |

C. Decision Gates (Go / No-Go)

Gate 1: "It Talks" (W4)

Owner: Pablo (CEO). Held: Friday W4 demo.

Coach responds coherently in Spanish to seller questions

ReAct loop calls ≥1 tool per relevant query

KB docs indexed, context injection working

Electron app loads MeLi URL without crashes

Unit test coverage ≥70%

Beautonomous used for all task management (Linear + GitHub via CORE)

No-Go: loop doesn't use tools OR response incoherent

Gate 2: "It Acts" (W8)

Owner: Pablo (CEO). Held: Friday W8 demo.

WRITE tools execute real changes on MeLi + Amazon + Shopify

Confirmation flow: diff shown, Accept/Reject works

Billing: Free tier limits enforced, Pro upgrade works

Load test 50 concurrent users passes

WebSocket streaming live (T4.1)

Proactive suggestions active via afterTool hook

Eval score ≥0.70

CI/CD pipeline auto-deploys to staging on merge

E2E tests ≥30 passing

No-Go: WRITE tool fails OR confirmation flow broken

Gate 3: "It Ships" (W10)

Owner: All 4 engineers. Held: Final Go/No-Go sync.

Beta cohort: ≥10 real sellers onboarded

0 high-severity bugs (P0/P1)

.dmg signed + notarized, installs without Gatekeeper warning

OMTM baseline: ≥1 tool executed per active user per week

Billing Stripe live (production)

Eval score ≥0.70

API p95 <3s

Guardrails active (ToolPolicyFilter enforced)

OWASP review approved

No-Go: P0 bug open OR <5 users onboarded

D. Linear Structure (Exportable)

Team

Shopilot (AUT)

Cycles

6 × 2-week cycles (incl. buffer)

Projects

19 active (1 per project)

Labels

L/M/S (size) • Track-Mateo/Andres/Sergio/Pablo • Risk-high/medium/low • Spike

Workflow: Backlog → Todo → In Progress → In Review → Done. Relations: blocks / is-blocked-by for dependencies.

E. Task Decomposition Pattern

Epic

5-10 days

Story

1-3 days

Task

2-8 hours

Sub-task

1-4 hours

Rule: if a Task takes >8h, break it down. Single-threaded ownership: 1 owner per task, no committees.

6 Phases, ~150 Tasks, 17 Projects, 4 Engineers

Phase 0

Pre-Sprint

8 tasks • W0 • #17 CORE

Phase 1

Foundation

62 tasks • S1-4

Phase 2

Full Features

60 tasks • S5-8

Phase 3

Polish & Launch

17 tasks • S9-10

9.3 Pre-Sprint 0: Technical Alignment Session

Project Parameters

12

weeks (10+2)

4

engineers

183

tasks

383

story points

OMTM: Tools Executed / Week / Active User, the proof that the copilot delivers real value.
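As a rough illustration of how the OMTM could be computed, the sketch below divides the tool executions in each ISO week by the number of active users in that week. The event shape, the seller IDs, and the definition of "active" (at least one session in the week) are assumptions made for the example; the blueprint does not define them.

```python
from collections import defaultdict
from datetime import date


def iso_week(day: date) -> tuple:
    """Return (ISO year, ISO week number) for a calendar date."""
    iso = day.isocalendar()
    return (iso[0], iso[1])


def omtm_by_week(sessions, tool_runs):
    """Tools executed per active user, per ISO week.

    sessions and tool_runs are iterables of (user_id, date) pairs.
    Assumption: a user is "active" in a week if they had at least one session.
    """
    active = defaultdict(set)
    for user_id, day in sessions:
        active[iso_week(day)].add(user_id)

    executed = defaultdict(int)
    for _user_id, day in tool_runs:
        executed[iso_week(day)] += 1

    return {week: executed[week] / len(users) for week, users in active.items()}


# Hypothetical data: three sellers opened the app, two of them ran tools.
sessions = [
    ("seller-1", date(2026, 3, 2)),
    ("seller-2", date(2026, 3, 3)),
    ("seller-3", date(2026, 3, 4)),
]
tool_runs = [
    ("seller-1", date(2026, 3, 2)),
    ("seller-1", date(2026, 3, 3)),
    ("seller-1", date(2026, 3, 5)),
    ("seller-2", date(2026, 3, 3)),
]

weekly = omtm_by_week(sessions, tool_runs)
for week, value in sorted(weekly.items()):
    status = "meets the >=1 target" if value >= 1 else "below the >=1 target"
    print(week, round(value, 2), status)
```

Counting active users from sessions rather than from tool runs keeps the metric honest: a seller who opens the app and never runs a tool still lowers the average.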

Methodology: Shape Up + Scrum + Kanban (Hybrid L/M/S)

| Size | Time | Model | Ceremony | Example |
| --- | --- | --- | --- | --- |
| L | >3 days | Shape Up bet | Discovery + 2-week appetite. Circuit breaker if unfinished. | AgentLoopOrchestrator, MeLiAdapter, Electron Shell |
| M | 1–3 days | Scrum story | Sprint planning + clear ACs + PR review. | IContextWindowManager, TokenRefreshCron, BillingView |
| S | <1 day | Kanban card | Pull from backlog, execute, merge. WIP limit: 2 per engineer. | GSI projection fix, ESLint config, OAuth Slack connect |

Assignment rule: L = bet at cycle start (circuit breaker if not done). M = sprint-planned + estimated. S = pull Kanban, no standup. Distribution target: ~1% L + ~81% M + ~18% S.
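The sizing rule above can be sketched as a small classifier. The thresholds and the WIP limit come from the table; the function names are hypothetical, and treating an estimate of exactly 3 days as M is an assumption about how the boundary is read.

```python
WIP_LIMIT_S = 2  # from the table: at most 2 S cards in progress per engineer

MODEL_BY_SIZE = {"L": "Shape Up bet", "M": "Scrum story", "S": "Kanban card"}


def classify_size(estimate_days: float) -> str:
    """Map an estimate in days to L (>3 days), M (1-3 days) or S (<1 day)."""
    if estimate_days > 3:
        return "L"
    if estimate_days >= 1:
        return "M"
    return "S"


def can_pull_s_card(cards_in_progress: int) -> bool:
    """An engineer may pull another S card only while under the WIP limit."""
    return cards_in_progress < WIP_LIMIT_S


for estimate in (4.5, 1.65, 0.8):
    size = classify_size(estimate)
    print(f"{estimate}d -> {size} ({MODEL_BY_SIZE[size]})")
```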

1. Do Things That Don't Scale

Personal onboarding, manual KB docs, review every PR via Beautonomous

2. Default Alive

Every expense justified against runway. Free = acquisition, Pro = survival

3. OMTM Focus

One metric: tools/week/user. Proves real value being delivered every sprint

Appetite: 2 weeks/scope (not estimation)
Circuit breaker: Not done = CUT, not extended
Hill chart: Discovering → Doing
Scopes: By user outcome, not tasks

Ceremonies

| Ceremony | Freq. | Dur. |
| --- | --- | --- |
| Async standup (Linear+Slack) | Daily 9:30 AM | Async |
| Cycle planning | Bi-weekly | 60 min |
| Friday demo | Weekly | 30 min |
| Retro (Lean Coffee) | Bi-weekly | 30 min |

Critical Tech Debt: Before Sprint 1 (T1.0)

  • SK Message/Trace not time-sortable: UUID v4 → ULID
  • findByMessageId O(n) scan → SK Trace#{messageId}
  • GSI2 defined but never used → repurpose as sparse index
  • queryEmbedding (6KB) in Trace → eliminate
  • ProjectionType.ALL on GSIs → change to INCLUDE
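To illustrate why the first item helps, here is a minimal sketch of a ULID-style identifier: a 48-bit millisecond timestamp followed by 80 random bits, encoded in Crockford Base32, so that plain string ordering of the sort key matches creation order. It is a hand-rolled illustration and not the project's implementation; the `Message#` key prefix and the helper names are hypothetical.

```python
import os
import time

# Crockford Base32 alphabet (no I, L, O, U). It is in ascending ASCII order,
# so comparing encoded strings compares the underlying numbers.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_base32(value: int, length: int) -> str:
    """Encode a non-negative integer as a fixed-width Crockford Base32 string."""
    chars = []
    for _ in range(length):
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def new_ulid(timestamp_ms=None) -> str:
    """Return a 26-character ULID-style id: 10 timestamp chars + 16 random chars."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    randomness = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    return encode_base32(timestamp_ms, 10) + encode_base32(randomness, 16)


# Hypothetical sort keys created out of order: sorting the strings restores time order.
timestamps = (3000, 1000, 2000)
keys = {ts: f"Message#{new_ulid(ts)}" for ts in timestamps}
in_time_order = [keys[ts] for ts in sorted(timestamps)]
print(sorted(keys.values()) == in_time_order)
```

A UUID v4 has no time component, so the same sort over UUID-based keys would return items in effectively random order.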

Sprint Contract

| Sprint | Label | Success Criterion | Gate |
| --- | --- | --- | --- |
| S0 | Pre-Sprint | CORE operational. Zero ambiguity. All 11 repos created. | T0.8 |
| S1-2 | Foundation | Walking skeleton E2E: Electron → sidebar → ReAct → real MeLi data → response. | |
| S3-4 | Core Engines | 10 READ tools in 3 marketplaces. Context injection. Playground usable. | Gate 1 |
| S5-6 | WRITE Tools | 4 WRITE tools execute in 3 marketplaces. ConfirmationFlow. Billing Free tier active. | |
| S7-8 | Hardening | Proactive suggestions live. Onboarding wizard E2E. Load test 50 users passes. Staging deployed. | Gate 2 |
| S9-10 | Launch | Beta 10+ sellers. 0 P0 bugs. Signed .dmg. Production deployed. OMTM ≥1 tool/user/week. | Gate 3 |
| S11-12 | Buffer | Beta bug fixes (P1/P2). Performance hardening. Deferred scope from circuit breaker. Eval score target 0.80. | |

Definition of Done: Per Sprint

| Aspect | S4 | S7 | S10 |
| --- | --- | --- | --- |
| Unit tests | ≥70% | ≥80% | ≥80% |
| E2E tests | ≥10 | ≥30 | ≥50 |
| API p95 | <5s | <3s | <3s |
| RAM Electron | <600MB | <500MB | <500MB |
| First token (streaming) | | <1s | <1s |
| Error rate | <5% | <1% | <1% |
| OAuth refresh | 100% | 100% | 100% |

Per task DoD: code reviewed via Beautonomous (El Mago) • unit tests for new logic • no blocking linter warnings • PR merged to main • task marked Done in Linear.
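The per-sprint thresholds above lend themselves to an automated check. The sketch below encodes the rows that have a value at all three checkpoints and reports which aspects miss their target. The metric names and the `failing_aspects` helper are hypothetical, and first-token latency is left out because the table gives it for only two checkpoints.

```python
import operator

# Each entry: (comparison, S4 target, S7 target, S10 target), as listed in the table.
DOD = {
    "unit_test_coverage_pct": (operator.ge, 70, 80, 80),
    "e2e_tests_passing": (operator.ge, 10, 30, 50),
    "api_p95_seconds": (operator.lt, 5, 3, 3),
    "electron_ram_mb": (operator.lt, 600, 500, 500),
    "error_rate_pct": (operator.lt, 5, 1, 1),
    "oauth_refresh_pct": (operator.ge, 100, 100, 100),
}
CHECKPOINTS = {"S4": 0, "S7": 1, "S10": 2}


def failing_aspects(checkpoint: str, measured: dict) -> list:
    """Return the aspects that miss their target (or were not measured)."""
    index = CHECKPOINTS[checkpoint]
    failures = []
    for aspect, (compare, *targets) in DOD.items():
        value = measured.get(aspect)
        if value is None or not compare(value, targets[index]):
            failures.append(aspect)
    return failures


# Hypothetical measurements taken at the S4 demo.
measured = {
    "unit_test_coverage_pct": 72,
    "e2e_tests_passing": 12,
    "api_p95_seconds": 4.2,
    "electron_ram_mb": 550,
    "error_rate_pct": 3,
    "oauth_refresh_pct": 100,
}
print("S4 failures:", failing_aspects("S4", measured))
print("S7 failures:", failing_aspects("S7", measured))
```

Treating an unmeasured aspect as a failure is a deliberate choice here: a gate should not pass because a number was never collected.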

Assumptions

4 engineers full-time for 12 weeks (10 core + 2 buffer S11-12)

MeLi, Amazon SP-API, Shopify Admin API: test accounts ready

Anthropic account: Claude Sonnet 4 + prompt caching

Apple Developer Program active (code signing + notarization)

Stripe configured (test + live modes)

AWS + GCP provisioned with IAM/GCP roles

Beautonomous (#17) operational before Sprint 1 (absolute prerequisite)

Real Sellerfy MeLi data available for testing

⚠ Capacity Analysis: The Plan Is Aggressive

237.5

engineer-days est.

240

engineer-days avail.

0.99x

ratio (with buffer)

43d

S11-12 margin

L tasks (~1%): 2 × 4.5d = 9 days • M tasks (~80%): 124 × 1.65d ≈ 204.5 days • S tasks (~19%): 31 × 0.8d ≈ 24 days. Total: 237.5 engineer-days. Available: 4 engineers × 12 weeks × 5 days = 240 engineer-days.

The 0.99x ratio means the plan is near capacity, so the buffer is essential. Without the buffer (S1-S10 only, 200 engineer-days), the ratio is 1.19x: aggressive, but feasible with the buffer. S11-12 provide 2.5d slack + 40d buffer ≈ 43d for circuit breaker overflow and beta fixes.
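The ratios quoted above can be reproduced with a few lines of arithmetic. This sketch uses the per-size subtotals exactly as stated in the text and assumes a 5-day working week, which is what the 240-day availability figure implies.

```python
ENGINEERS = 4
WORKING_DAYS_PER_WEEK = 5  # assumption: implied by 4 engineers x 12 weeks = 240 days

# Engineer-day subtotals per size class, taken as stated in the text.
ESTIMATED_DAYS = {"L": 9.0, "M": 204.5, "S": 24.0}


def available_days(weeks: int) -> int:
    """Engineer-days available over the given number of weeks."""
    return ENGINEERS * weeks * WORKING_DAYS_PER_WEEK


estimated = sum(ESTIMATED_DAYS.values())
ratio_with_buffer = estimated / available_days(12)
ratio_core_only = estimated / available_days(10)
slack = available_days(12) - estimated
buffer_days = available_days(12) - available_days(10)

print(f"estimated:         {estimated} engineer-days")
print(f"ratio with buffer: {ratio_with_buffer:.2f}x")
print(f"ratio core only:   {ratio_core_only:.2f}x")
print(f"slack: {slack}d, buffer weeks: {buffer_days}d")
```

A ratio above 1.0 for the core weeks alone is what makes the circuit breaker mandatory: unfinished scope has to be cut or pushed into S11-12 rather than extended in place.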

9.4 Sprint-by-Sprint Visual Timeline

6 two-week sprints (10 core weeks + 2 buffer weeks). 4 parallel tracks. 3 integration gates. Each cell shows the primary deliverable.

Sprint 1-2

Foundation

Sprint 3-4

Core Engines

Sprint 5-6

WRITE Tools + Auth

Sprint 7-8

Proactive + Polish

Sprint 9-10

Beta + Ship

Sprint 11-12

Buffer

Mateo

ReAct Loop

#2 + multi-turn history + REST API + DynamoDB fix (ULID, GSI) + UserProfile + SystemPromptComposer L1+L2

Tools + Context + Caching

Tool Registry + IContextAssembler + prompt caching + WRITE stubs + update_user_profile + contextSummary

WRITE Tools + Enrichment

#3 WRITE tools + #7 Guardrails + #11 Enrichment + HttpCreditGate

Proactive + Streaming + Feedback

#6 ProactiveSugg + WS streaming + FeedbackCapture + ActionLog + OutputGuard + SystemPromptComposer L3

Bug Fix + QA

Monitoring + Observability

Hardening + Deferred WRITE

P1/P2 + advertising tools + p95 + ProactiveSugg v2

Andres

Adapters + OAuth + Infra

#12 MeLi + Amazon scaffold + OAuth2 + SellerConnection + MarketplaceAction + Terraform GCP verify + WRITE API docs + user mgmt research

Shopify + Data + CI

#12 Shopify + AmazonAds OAuth + ISKUResolver + TokenRefreshCron + #10 Clean Arch + DAGs verify + #14 CDK base + CI multi-repo

Fast Data + Rate Limit + CI

#10 Fast Data 11 endpoints + GCS snapshots + DAG Amazon + #12 IRateLimiter + onboarding trigger + CI/CD 11 repos

Staging + Load Test + WebSocket

#14 Staging deploy + load test + CloudWatch + WebSocket CDK + #10 Silver/Gold

Prod Deploy + Data Pipeline

#14 CDK + Terraform prod + rollback testing + #10 OpenMetadata + embeddings DAGs

Prod Hardening

CloudWatch + adapter fixes + Silver→Gold DAG

Sergio

Electron Shell + MK1

#1 + WebContentsView + Tabs (with T0.BB tokens) + Mockup shell container

Chat UI + Mockups

Chat UI (T1.BB) + WebSocket + OnboardingWizard (T1.BB) + MK1 ChatView + MK2 Onboarding

Billing + Views + Mockups

#13 Stripe + Confirmations (T2.BB) + Cards + ProfileView + MK1 Billing + MK2 Profile + MK3 ConfirmDialog

Enrollment + Feedback + Mockups

#1 WS client (T3.BB) + EnrollmentView + #15 FeedbackLoop + MK1 Enrollment + MK2 WRITE flow

Ship .dmg + MK Dashboard

Code signing + Security + Bug fixes (T4.BB) + MK1 Dashboard view

Beta Fixes + Windows

P1/P2 UI + auto-updater S3 + FeedbackThrottle + Windows build

UX/UI

Foundations + Atoms

T0.BB Brand book + Foundations + Icons + T1.BB Atoms + AI-native + Molecules + Chat organisms

Molecules + Organisms

T2.BB remaining Molecules + ConfirmDialog + ToolAccordion + MarketplaceKPI + CreditEconomy + EnrollmentCard

Advanced Organisms

T3.BB ReActStream + DataTable + AuditLog + RollbackPanel + FraudAlert + ErrorRecovery. Publish [LIB] Pattern Components

Quality Audit

T4.BB All frames “Ready for development”, zero generic names, variables verified, annotations

Pipeline Closed

Point queries only

Pablo

KB + Beautonomous + UX/UI

#17 bootstrap + Eval Setup + brand reg + Apple/Win auth + #18 approves T0.BB + T1.BB

Eval + Quality + UX/UI

#16 LLM Judge + EvalRunner + E2E testing + #17 Linear + Quality gate + #18 approves T2.BB

QA + Eval + UX/UI

#16 LLM-as-Judge + Real data QA + #18 approves T3.BB

Eval + Beta + UX/UI

#16 Eval CI + proactive-suggestion testing + beta selection + contract testing + #18 approves T4.BB

Launch + E2E Eval

Beta + Feedback + Security + #16 E2E eval pipeline + #17 Beautonomous prompt v2 + Go/No-Go

Eval 0.80 + KB v3

Golden cases from beta + KB from gaps + 2nd feedback round

Gate 1
Gate 2
LAUNCH
BUFFER

S4 Demo: "It Talks"

User asks a question in the Electron sidebar → ReAct loop processes → 10 READ tool stubs respond → streamed answer with KB context in chat.

S8 Demo: "It Acts"

User says "change this price" → Shopilot shows preview → user confirms → price updated on MeLi → billing deducted → proactive suggestion appears in conversation.
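The S8 flow (preview, confirmation, write, billing) can be sketched as a single guarded function. Everything here is illustrative: `PriceChange`, `FakeMarketplace`, `run_write_tool`, and the rule that one write costs one credit are assumptions made for the example, with only the `update_price` tool name taken from the sprint contract.

```python
from dataclasses import dataclass


@dataclass
class PriceChange:
    product_id: str
    old_price: float
    new_price: float

    def preview(self) -> str:
        """Human-readable diff shown to the seller before anything is written."""
        return f"{self.product_id}: ${self.old_price:.2f} -> ${self.new_price:.2f}"


class FakeMarketplace:
    """In-memory stand-in for a marketplace adapter (hypothetical)."""

    def __init__(self, prices):
        self.prices = dict(prices)

    def update_price(self, product_id, price):
        self.prices[product_id] = price


def run_write_tool(change, marketplace, credits, confirm):
    """Apply a price change only after explicit confirmation.

    Returns the remaining credit balance. Assumption: one write costs one credit.
    """
    if credits < 1:
        raise RuntimeError("no credits left")
    if not confirm(change.preview()):
        return credits  # rejected: nothing written, nothing billed
    marketplace.update_price(change.product_id, change.new_price)
    return credits - 1


market = FakeMarketplace({"sku-1": 89.0})
change = PriceChange("sku-1", 89.0, 76.0)

after_reject = run_write_tool(change, market, 50, confirm=lambda preview: False)
price_after_reject = market.prices["sku-1"]

after_accept = run_write_tool(change, market, 50, confirm=lambda preview: True)
price_after_accept = market.prices["sku-1"]

print(change.preview(), after_reject, after_accept)
```

Passing `confirm` in as a callable keeps the write path testable: the UI supplies the Accept/Reject dialog, while a test supplies a lambda.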

S10 Demo: "It Ships"

Seller downloads .dmg → installs → connects 3 marketplaces → asks, acts, receives proactive suggestions during the conversation → billing works → production-ready.

9.5 Week-by-Week Deliverables Matrix

Each cell is a concrete, testable deliverable. Bold = demo day deliverable. Color-coded by engineer.

| Week | Mateo (CTO) | Andres (Data+BE) | Sergio (Full-Stack) | Pablo (CEO/PE) |
| --- | --- | --- | --- | --- |
| 0 | Pre-Sprint: Technical alignment session (2h, all 4). Pablo: Project #17 CORE bootstrap (OpenClaw + roles + system prompt) | | | |
| 1 | DynamoDB fix ULID + UserProfile + ILLMClient | Scaffold Marketplace Provider + IMarketplaceAdapter + AES256GCMCipher + SellerConnection + IOAuth2Flow + WRITE API docs (MeLi 3, AmazonAds 5, Amazon 2, Shopify 9) + user mgmt provider research | Electron scaffold + WebContentsView + MarketplaceDetector + Auth Memberstack + canary build (wk 1, no Figma) | Eval Setup + golden dataset 15-20 cases + brand registration + Apple/Win auth. #18 UX/UI: T0.BB Brand book + Foundations Figma (approves end wk1) |
| 2 | AgentLoopOrchestrator ReAct + RestResponseEventEmitter + verify Observability | Amazon scaffold + MeLiOAuth2Flow + Terraform GCP verify + external deps + MarketplaceAction entity | Tabs + Sidebar 2.5d (with T0.BB tokens) + Mockup shell container (T1.MK1) | KB: 15-20 docs + 10 READ tool specs. #18 UX/UI: T1.BB Atoms + Molecules + Chat organisms (approves end wk2) |
| 3 | Tool definitions (ToolDefinition class + HookLifecycle) | ShopifyOAuth2Flow + ShopifyAdapter + Data Sync Clean Arch refactor | Chat UI 2.5d (T1.BB components) + WebSocket client + URL context injection | KB incremental processing + batch embeddings + Eval LLM Judge + EvalRunner. #18 UX/UI: T2.BB Molecules + Organisms (delivery S3-4) |
| 4 | 10 READ stubs + WRITE stubs + SYSTEM tool + IContextAssembler + Health summary + prompt caching | AmazonAdapter complete (if E1) + TokenRefreshCron + CDK base AWS + CI multi-repo + AmazonAdsOAuth + ISKUResolver | OnboardingWizard 2.5d (T1.BB) + react-router views + MK1 ChatView + MK2 OnboardingWizard + Gate 1 signed build | E2E testing Playground + bootstrap ~150 tasks Linear + Quality gate 5-step Beautonomous. #18 approves T2.BB |
| | GATE 1: "It Talks" | | | |
| 5 | 10 real READ handlers + ConfirmationFlow + InputGuard + HttpCreditGate | Fast Data Layer 11 endpoints + GCS snapshots + DAG Amazon | BillingView 2.5d (T2.BB) + ProfileView (T2.BB) + Stripe Checkout + billing backend | KB BigQuery indexing + Eval CI + golden dataset 50. #18 UX/UI: T3.BB Advanced Organisms (delivery S5-6) |
6 4 WRITE tools + ProactiveSuggestionService + Enrichment scaffold + MeLi market intelligence + VisionLLM + 8 ANALYSIS handlers4 tools WRITE + ProactiveSuggestionService + scaffold Enrichment + MeLi market intelligence + VisionLLM + 8 handlers ANALYSIS IRateLimiter per marketplace + onboarding trigger + CI/CD 11 reposIRateLimiter por marketplace + onboarding trigger + CI/CD 11 repos Confirmation dialogs (T2.BB) + suggestion cards 1.5d (T2.BB) + MK1 BillingView + MK2 ProfileView + MK3 ConfirmDialogDiálogos confirmación (T2.BB) + cards sugerencias 1.5d (T2.BB) + MK1 BillingView + MK2 ProfileView + MK3 ConfirmDialog QA conversation flows 3 marketplaces + golden dataset edge cases. #18 approves T3.BBQA flujos conversación 3 marketplaces + golden dataset edge cases. #18 aprueba T3.BB
7 WebSocket streaming + SystemPromptComposer L3 + OutputGuard + FeedbackCapture in HookLifecycleWebSocket streaming + SystemPromptComposer L3 + OutputGuard + FeedbackCapture en HookLifecycle Load testing (50 users) + staging deployLoad testing (50 usuarios) + deploy staging WS client 2.5d (T3.BB: ReActStream + RollbackPanel) + EnrollmentView + Sentry + Feedback Loop scaffoldWS client 2.5d (T3.BB: ReActStream + RollbackPanel) + EnrollmentView + Sentry + scaffold Feedback Loop Proactive suggestions testing + KB batch v2 + Eval automated CI. #18 UX/UI: T4.BB Quality Audit (delivery S7-8)Testing sugerencias proactivas + KB batch v2 + Eval automatizado CI. #18 UX/UI: T4.BB Auditoría calidad (entrega S7-8)
8 Remaining WRITE tools (circuit breaker) + ActionLog entity + p95 optimizationWRITE tools restantes (circuit breaker) + entidad ActionLog + optimización p95 #12 + #10 integration tested + CloudWatch dashboard + PagerDuty alerts + WebSocket CDK + Silver/Gold circuit breaker#12 + #10 integración testeada + dashboard CloudWatch + alertas PagerDuty + WebSocket CDK + circuit breaker Silver/Gold FeedbackMeasurer + FeedbackGate + explicit/implicit + grace 7d + MK1 EnrollmentView + MK2 flujo WRITE + Gate 2 buildFeedbackMeasurer + FeedbackGate + explicit/implicit + grace 7d + MK1 EnrollmentView + MK2 flujo WRITE + build Gate 2 Contract testing + KB quality eval + beta selection + onboarding prep. #18 approves T4.BBContract testing + eval calidad KB + selección beta + prep onboarding. #18 aprueba T4.BB
GATE 2 — "It Acts"GATE 2 — "Actúa"
9 Bug fixes + agent quality tuningFix bugs + tuning calidad agente Prod deploy (Lambda CDK) + RDS backupsDeploy prod (Lambda CDK) + backups RDS Code signing + .dmg + bug fixes 3.5d (post T4.BB audit) + Billing Stripe liveCode signing + .dmg + bug fixes 3.5d (post auditoría T4.BB) + Billing Stripe live Beta onboarding (10-15 sellers)Onboarding beta (10-15 vendedores)
10 System Prompt v3 final + P1/P2 intelligence bug fixesSystem Prompt v3 final + bug fixes P1/P2 inteligencia #14 CDK + Terraform + SSL + domain + rollback testing + Data Sync OpenMetadata#14 CDK + Terraform + SSL + dominio + rollback testing + Data Sync OpenMetadata Security hardening + telemetry + MK1 Dashboard viewHardening seguridad + telemetría + MK1 Dashboard view Feedback calls + security review + E2E eval pipeline + Go/No-Go checklistCalls feedback + review seguridad + pipeline E2E eval + checklist Go/No-Go
LAUNCH GATE — "It Ships"GATE LANZAMIENTO — "Se Lanza"

9.6 Engineer Deep Dive — 4 Tracks

9.6.1 — Mateo Quintero — CTO

Orchestration + Tools + Intelligence + Knowledge Base + Enrichment + Observability

Projects #2, #3, #4, #5, #6, #7, #8, #9, #11, #18

Mateo owns the AI brain of Shopilot. He builds the ReAct orchestrator (#2), the tool registry (#3), the personality engine (#4), the context aggregator (#5), the proactive suggestion engine (#6), the guardrails layer (#7), the observability system (#8), the Cerebro Knowledge Base (#9, Go 1.24 + Vertex AI + BigQuery vectors), and the enrichment layer (#11) for competitive analysis tools.

Sprint 1-2 — ReAct Loop + REST + DynamoDB Fix

Goal: A working orchestrator that receives a user message, calls Claude with tools, executes tool calls, and returns the full response via REST (WebSocket upgrade is T4.1 in S7-8).

T1.1 — DynamoDB schema fix: SK UUID → ULID (time-sortable), findByMessageId O(n) → SK Trace#{messageId}, remove queryEmbedding (6KB), fix GSIs

T1.2 — UserProfile entity: pk User#{userId}, sk Profile

T1.3 — Conversation history in prompt: last N messages, findWindowForPrompt, token budget 200K

T1.4 — ILLMClient update: chat() accepts toolDefinitions, returns ContentBlock[]

T1.5 — SystemPromptComposer L1+L2: base identity (cached) + session (UserProfile + alerts)

T1.6 — Implement AgentLoop (ReAct): user_message → LLM (with tools) → tool_use? → execute → observe → repeat. MAX_ROUNDS=10, cost guard 50K tokens

T1.7 — RestResponseEventEmitter: full response post-rounds, no streaming

T1.8 — Verify Observability with ReAct: ConversationTrace + AgentTracking compatible with multi-step loop

Unit tests for the loop (mock Claude responses, MAX_ROUNDS cutoff)

T1.21 — KB Phase 0 Fix duplicates: TRUNCATE before embed, embedded_at timestamp, CI Go 1.21→1.24

T1.22 — KB Phase 1 Contextual Retrieval: contextual prefix per chunk, Markdown section chunking, 150-char overlap

T1.23 — KB content: 15-20 curated docs — MeLi best practices, Amazon policies, Shopify guidelines, pricing, photos, metrics, seller FAQ

T1.25 — 10 READ tool specs: name, LLM description, input_schema JSON Schema, risk level, credit cost per tool

Dependencies: None — Mateo starts first. Sergio depends on HTTP endpoint being stable by end of W2.

Deliverable: POST /conversation → send message → ReAct loop processes with multi-turn history → full REST response.
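
The T1.6 loop can be sketched roughly as follows. This is a minimal illustration, not the planned implementation: the ContentBlock fields, the LLMCall and ToolCall function types, and the per-call tokensUsed field are assumptions made for the example, and the loop is synchronous for brevity. Only MAX_ROUNDS=10 and the 50K-token cost guard come from the plan.

```typescript
// Hypothetical shapes: the plan names ContentBlock[] but does not define its fields.
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type ContentBlock = { type: "text"; text: string } | ToolUseBlock;

interface LLMResponse {
  content: ContentBlock[];
  tokensUsed: number; // assumed: the client reports token usage per call
}

type Message = { role: "user" | "assistant" | "tool"; content: string };
type LLMCall = (history: Message[]) => LLMResponse;
type ToolCall = (name: string, input: Record<string, unknown>) => string;

const MAX_ROUNDS = 10; // from T1.6
const TOKEN_BUDGET = 50000; // cost guard from T1.6

interface LoopResult {
  text: string;
  rounds: number;
  stopReason: "done" | "max_rounds" | "cost_guard";
}

function runAgentLoop(userMessage: string, llm: LLMCall, execTool: ToolCall): LoopResult {
  const history: Message[] = [{ role: "user", content: userMessage }];
  let tokens = 0;
  let lastText = "";

  for (let round = 1; round <= MAX_ROUNDS; round++) {
    const response = llm(history);
    tokens += response.tokensUsed;

    // Split the response into plain text and requested tool calls.
    const texts: string[] = [];
    const toolUses: ToolUseBlock[] = [];
    for (const block of response.content) {
      if (block.type === "text") texts.push(block.text);
      else toolUses.push(block);
    }
    if (texts.length > 0) lastText = texts.join("\n");
    history.push({ role: "assistant", content: texts.join("\n") });

    // No tool_use block: the model has produced its final answer.
    if (toolUses.length === 0) return { text: lastText, rounds: round, stopReason: "done" };

    // Cost guard: stop before executing more tools once the budget is spent.
    if (tokens >= TOKEN_BUDGET) return { text: lastText, rounds: round, stopReason: "cost_guard" };

    // Execute each tool and feed the observation back into the history.
    for (const use of toolUses) {
      history.push({ role: "tool", content: execTool(use.name, use.input) });
    }
  }
  return { text: lastText, rounds: MAX_ROUNDS, stopReason: "max_rounds" };
}

// Usage with a mocked model: one tool call, then a final answer.
let calls = 0;
const mockLlm: LLMCall = (): LLMResponse => {
  calls += 1;
  return calls === 1
    ? { content: [{ type: "tool_use", id: "t1", name: "get_product_metrics", input: {} }], tokensUsed: 1200 }
    : { content: [{ type: "text", text: "Your metrics look healthy." }], tokensUsed: 800 };
};
const result = runAgentLoop("How are my metrics?", mockLlm, () => '{"visits": 120}');
```

The mocked model is the same device the unit-test item above relies on: it makes the MAX_ROUNDS cutoff and the cost guard testable without calling Claude.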

Sprint 3-4 — Tool Registry + Context + Stubs

Goal: ToolRegistry with 10 READ stubs + 17 WRITE stubs + 1 SYSTEM tool registered. Policy filtering. HookLifecycle.

Define tools as Anthropic tool_use — name, description, input_schema (JSON Schema)

T2.2 — IToolExecutor + ToolExecutor: execute(toolName, args, context) → ToolResult. T2.3 — ToolPolicyFilter: risk gate + marketplace gate. T2.4 — HookLifecycle: before_tool → execute → after_tool

Policy filtering: Free users — READ only; Pro users — READ + WRITE + ANALYSIS tools

READ tools (10): get_product, get_product_metrics, get_orders, get_buyer_questions, get_product_reviews, get_category_requirements, get_campaigns, get_campaign_metrics, get_store, get_store_metrics

T2.5a — ToolResult domain model. T2.5b — update_user_profile SYSTEM tool. T2.5c — contextSummary. T2.5d — 17 WRITE tool stubs registered (no real execution in S3-4)

T2.6 — IContextAssembler: KB + Brand Health RAG in parallel. T2.7 — structured health summary injected in system prompt

T2.8 — Anthropic prompt caching. T2.9 — Tool result in-memory caching

Integration tests: question → tool call → stub handler → result → response

T2.22 — KB Phase 2 Incremental processing: content hash SHA-256, is_current flag, only re-embed docs that changed

T2.23 — KB Phase 3 Batch embeddings: up to 250 texts per Vertex AI call, goroutine pool max 5

Dependencies: T1.25 tool specs (own task). Handlers are stubs in S3-4, so they do not depend on real adapters.

Deliverable: "How are my metrics?" → Claude calls get_product_metrics → stub responds with mock data → response with KB context.
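
A rough sketch of how registration, policy filtering, and the before_tool → execute → after_tool lifecycle could fit together. The ToolKind and Plan types, the Hooks interface, and the visibleTo method are hypothetical names introduced for illustration. The plan specifies only register(def, handler), the Free/Pro rule, and the hook order. Making the SYSTEM tool visible to every plan is an assumption of this sketch.

```typescript
type ToolKind = "READ" | "WRITE" | "ANALYSIS" | "SYSTEM";
type Plan = "free" | "pro";

interface ToolDefinition {
  name: string;
  description: string;
  input_schema: Record<string, unknown>; // JSON Schema, as in Anthropic tool_use
  kind: ToolKind; // assumed field used by the policy filter
}

interface ToolResult {
  ok: boolean;
  data: unknown;
}
type ToolHandler = (args: Record<string, unknown>) => ToolResult;

interface Hooks {
  before_tool?: (name: string) => void;
  after_tool?: (name: string, result: ToolResult) => void;
}

class ToolRegistry {
  private tools: Record<string, { def: ToolDefinition; handler: ToolHandler }> = {};

  register(def: ToolDefinition, handler: ToolHandler): void {
    this.tools[def.name] = { def, handler };
  }

  // Free users see READ only; Pro users see READ + WRITE + ANALYSIS.
  // SYSTEM tools being visible to both plans is an assumption.
  visibleTo(plan: Plan): ToolDefinition[] {
    const allowed: ToolKind[] = plan === "pro" ? ["READ", "WRITE", "ANALYSIS", "SYSTEM"] : ["READ", "SYSTEM"];
    return Object.keys(this.tools)
      .map((name) => this.tools[name].def)
      .filter((def) => allowed.indexOf(def.kind) !== -1);
  }

  execute(name: string, args: Record<string, unknown>, plan: Plan, hooks: Hooks = {}): ToolResult {
    const permitted = this.visibleTo(plan).some((def) => def.name === name);
    if (!permitted) return { ok: false, data: `tool not available: ${name}` };

    hooks.before_tool?.(name);
    const result = this.tools[name].handler(args);
    hooks.after_tool?.(name, result);
    return result;
  }
}

// Usage: one READ stub and one WRITE stub, as in S3-4.
const registry = new ToolRegistry();
registry.register(
  { name: "get_product_metrics", description: "Read product metrics", input_schema: { type: "object" }, kind: "READ" },
  () => ({ ok: true, data: { visits: 120 } })
);
registry.register(
  { name: "update_price", description: "Update a product price", input_schema: { type: "object" }, kind: "WRITE" },
  () => ({ ok: true, data: "stub: no real execution" })
);

const trace: string[] = [];
const hooks: Hooks = {
  before_tool: (name) => { trace.push(`before:${name}`); },
  after_tool: (name) => { trace.push(`after:${name}`); },
};
const priced = registry.execute("update_price", { price: 10 }, "pro", hooks);
```

Filtering before execution means a Free user's request for a WRITE tool is rejected without running any hook or handler.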

Sprint 5-6 — WRITE Tools + Guardrails + Credits Gate + Enrichment + Billing + Token Pipeline

Goal: First WRITE tools working E2E with confirmation flow, InputGuard pre-LLM, HttpCreditGate, ProactiveEvaluator afterTool hook, and full Enrichment layer (8 ANALYSIS tools).

T3.1 — 10 READ handlers connected to real Fast Data Layer or Marketplace Provider (replaces stubs from S3-4)

T3.2 — ConfirmationFlow: when risk > read-only → pause loop → send preview → persist OrchestrationSession (DynamoDB, TTL 35min) → resume on confirm

WRITE tools (phase 1): update_product_content, update_price, pause_product, activate_product — for all 3 marketplaces

T3.5 — IGuardService + InputGuard: pattern matching + out-of-scope filtering pre-LLM

T3.5a — HttpCreditGate: POST /internal/gate before each tool call

T3.14 — GCS pre-write snapshots for ConfirmationFlow (Andres provides endpoint)

T3.4 — ProactiveSuggestionService via afterTool hook: LLM inference post-tool — output: { hasSuggestion, message, suggestionType, priority, productId }

T3.25 — KB BigQuery indexing: index 15-20 docs via Go pipeline, verify top-5 semantic search for 5 test queries

T3.6–T3.11 — Enrichment complete: scaffold + MeLi market intelligence + Vision LLM + Redis cache + CDK + 8 ANALYSIS handlers (search_market_products, get_competitor_product, get_market_pricing, get_keyword_data, analyze_product_image, enhance_product_image, analyze_product_video, get_product_fee_estimate)

#18 Design System

T3.32 — Token pipeline + Style Dictionary: Figma Variables → design-tokens.json → Style Dictionary build → CSS :root + tailwind.config.ts. CI validates token file on each PR

Dependencies: Sergio's confirmation UI (S5-6) for the confirmation flow UX. Andres's Fast Data Layer for T3.1 real handlers.
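
The pause and resume behaviour of T3.2 might look like the sketch below. An in-memory record stands in for the DynamoDB-backed OrchestrationSession. The Risk type, the PendingAction shape, and the request and confirm method names are hypothetical. The 35-minute TTL comes from the plan, while treating a confirmed session as one-shot is an assumption.

```typescript
type Risk = "read_only" | "write"; // assumed risk levels; the plan only says "risk > read-only"

interface PendingAction {
  tool: string;
  args: Record<string, unknown>;
  preview: string; // what the seller sees before confirming
}

interface OrchestrationSession {
  id: string;
  action: PendingAction;
  expiresAt: number; // epoch milliseconds
}

const SESSION_TTL_MS = 35 * 60 * 1000; // TTL 35min from T3.2

class ConfirmationFlow {
  // In-memory stand-in for the DynamoDB table.
  private sessions: Record<string, OrchestrationSession | undefined> = {};
  private nextId = 1;

  // Read-only calls need no pause and return null.
  // Anything riskier is persisted and the loop waits for confirmation.
  request(risk: Risk, action: PendingAction, now: number): OrchestrationSession | null {
    if (risk === "read_only") return null;
    const session: OrchestrationSession = {
      id: `sess-${this.nextId++}`,
      action,
      expiresAt: now + SESSION_TTL_MS,
    };
    this.sessions[session.id] = session;
    return session;
  }

  // Returns the action to execute, or null if the session is unknown or expired.
  // A session can be confirmed at most once (assumption).
  confirm(sessionId: string, now: number): PendingAction | null {
    const session = this.sessions[sessionId];
    if (!session) return null;
    delete this.sessions[sessionId];
    return now < session.expiresAt ? session.action : null;
  }
}
```

Passing the clock in as a parameter keeps expiry testable without waiting 35 minutes; a real implementation would more likely rely on the table's own TTL attribute.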

Sprint 7-8 — Streaming + Proactive + OutputGuard + ActionLog + FeedbackCapture

Goal: WebSocket streaming (T4.1), SystemPromptComposer L3 with WRITE guardrails, OutputGuard, ActionLog, FeedbackCapture, remaining WRITE tools, and p95 <3s optimization.

T4.1 — WebSocket streaming: 8 server events, 4 client events (API Gateway WebSocket → Lambda → Electron WS client)

T4.2 — SystemPromptComposer L3: conditional WRITE guardrails. Hard cap 1200 tokens

T4.3 — OutputGuard: cross-user data leak prevention + dangerous content filtering

T4.4 — Remaining WRITE tools (circuit breaker): update_product_images, update_stock, publish_product, answer_question, etc.

T4.16 — KB batch + v2: batch Vertex AI embeddings 250/call, target >80% hit rate on 20 eval queries

T4.5 — Performance optimization: target p95 <3s

T4.5a — FeedbackCapture in HookLifecycle: after_tool writes FeedbackEntry via HTTP to #15. Fire-and-forget.

T4.5b — ActionLog entity + DynamoActionLogRepository: record of every WRITE executed

Dependencies: T4.1 WS server must be ready before Sergio builds T4.19 WS Electron client. T4.9a — API Gateway WebSocket CDK from Andres must be ready. All WRITE handlers stable from S5-6.

Sprint 9-10 — Bug Fixes + Monitoring + Security Support

Goal: Production-stable agent. Security review support. System prompt v3 from real conversations. All P1/P2 intelligence bugs resolved.

LLMGuardChecker: Claude Haiku as classifier for ambiguous inputs (Phase 2 of InputGuard)

Bug fixes P1/P2 across the intelligence stack from beta feedback

System prompt v3 final: iteration based on real beta conversations, adjusted few-shot examples

Security review support: OWASP top 10, injection path review, OutputGuard validation

Sprint 11-12 — Buffer: Intelligence Hardening + Deferred WRITE Tools

Goal: Clear intelligence P1/P2 backlog. Ship any WRITE tools cut by circuit breaker (advertising campaigns). p95 optimization if >3s. ProactiveSuggestions v2 if deferred.

Bug fixes P1/P2 for intelligence reported by beta users

WRITE tools cut by circuit breaker (advertising campaigns if not in S7-S8)

p95 optimization if >3s: profiling hot paths, DynamoDB query optimization, prompt size reduction

ProactiveSuggestions v2 (if deferred): afterToolWithContext() parallel to streaming, gate <40% turns

Circuit breaker output: Advertising WRITE tools + ProactiveSuggestions v2 are the most likely candidates to be cut from S7-8.

Key Technical Decisions

Claude Sonnet 4 as primary LLM — native tool_use, fast, cost-effective

ToolRegistry with register(def, handler) / registerRemote(def, dispatcher). HookLifecycle: before_tool → execute → after_tool.

All 3 marketplace adapters (MeLi, Amazon, Shopify) owned by Andres (#12) — single owner policy [CORRECTED: Shopify back to Andrés]

9.6.2 — Andrés León — Data + Backend

APIs + Data Sync + Auth + Infrastructure

Projects #10, #12, #14

Andres owns the data backbone. He builds the marketplace adapters for MeLi + Amazon + Shopify (#12), the data sync pipelines (#10), auth/token management — including SellerConnection (5-state machine), MarketplaceAction (action log), and IOAuth2Flow (generic OAuth2 port) — and the DevOps infrastructure (#14).

Sprint 1-2 — Marketplace Adapters + OAuth2 + Domain Entities

Goal: MeLi and Amazon scaffold adapters returning data via IMarketplaceAdapter, OAuth2 flows for MeLi + Amazon LWA, AES-256-GCM token encryption, domain entities SellerConnection and MarketplaceAction.

T1.9 — Scaffold Marketplace Provider: Clean Architecture + DDD, Value Objects, Error types, DI container

T1.10 — IMarketplaceAdapter: 23 methods, 4 domains (Catalog, Engagement, Advertising, Enrollment). ISKUResolver: SKU → native marketplace ID

T1.11 — AES256GCMCipher + ITokenManager: encrypt tokens at rest, DynamoDB marketplace-credentials table

T1.12 — MeLiOAuth2Flow + MeLiAdapter: OAuth2 code flow, REST API, standardized error mapping

T1.13 — AmazonLWAFlow + AmazonAdapter scaffold: OAuth2 LWA, SP-API SDK. Scaffold only — full impl in S3-4

T1.14 — Verify existing Terraform GCP: GCS, Cloud Run, Airflow, BigQuery operational

T1.15 — Request external dependencies: Amazon SP-API, MeLi dev portal, Shopify Partners, Apple Developer

T1.15a — SellerConnection aggregate: 5-state machine (disconnected → pending → active → expired → revoked)

T1.15b — MarketplaceAction entity + IMarketplaceActionRepository

T1.15c — IOAuth2Flow interface: generic OAuth2 port (authorize, exchangeCode, refreshToken)

T1.28 — Collect missing WRITE API docs: MeLi 3, Amazon Ads 5, Amazon 2, Shopify 9 — required for #3 Tool Registry WRITE action mapping

T1.29 — Collect user management provider docs: evaluate external auth provider (Auth0, Clerk, Memberstack), document service methods for consumer layers

Build Pipeline

T1.33 — GitHub Actions CI: electron-builder on release/* branch. Upload .dmg + .exe artifacts. Notify #deploys Slack

Dependencies: None — Andres starts in parallel with Mateo. T1.33 depends on Sergio’s T1.32 canary build.

Deliverable: MeLiAdapter returns real data via IMarketplaceAdapter. AmazonAdapter scaffold ready. Tokens encrypted AES-256-GCM with auto-refresh.
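
The SellerConnection aggregate from T1.15a could be modelled along these lines. The five state names come from the plan. The transition table is a guess: it contains the forward chain disconnected → pending → active → expired → revoked, plus a few extra edges (cancelled OAuth, refresh, reconnect) that are assumptions and are marked as such in the comments.

```typescript
type ConnectionState = "disconnected" | "pending" | "active" | "expired" | "revoked";

// Allowed transitions. The forward chain is from T1.15a; the other edges are assumed.
const TRANSITIONS: Record<ConnectionState, ConnectionState[]> = {
  disconnected: ["pending"],
  pending: ["active", "disconnected"], // back to disconnected: assumed (OAuth cancelled or failed)
  active: ["expired", "revoked"], // direct revoke: assumed
  expired: ["revoked", "active"], // back to active: assumed (successful token refresh)
  revoked: ["pending"], // reconnect: assumed
};

class SellerConnection {
  private current: ConnectionState;

  constructor(public readonly sellerId: string, initial: ConnectionState = "disconnected") {
    this.current = initial;
  }

  state(): ConnectionState {
    return this.current;
  }

  // Throws on any transition that is not listed in the table above.
  transitionTo(next: ConnectionState): void {
    if (TRANSITIONS[this.current].indexOf(next) === -1) {
      throw new Error(`invalid transition: ${this.current} -> ${next}`);
    }
    this.current = next;
  }
}
```

Keeping the transitions in a single table makes the aggregate easy to review against the final spec once the real edges are decided.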

Sprint 3-4 — Shopify + Amazon + TokenRefreshCron + Data Sync + CDK + CI

Goal: Shopify adapter complete, Amazon adapter complete (if E1 approved), TokenRefreshCron, Data Sync Clean Architecture, CDK base AWS, CI multi-repo.

T2.10 — ShopifyOAuth2Flow + ShopifyAdapter: OAuth2, GraphQL Admin API, cost-based rate limiting

T2.11 — AmazonAdapter complete (if E1 approved): SP-API SDK, Reports, Catalog Items, Orders

T2.12 — TokenRefreshCron: EventBridge every 5min, pre-refresh 30min, DynamoDB mutex, 3 failures → Slack alert

T2.13 — Data Sync Phase 0.5: refactor Clean Architecture in services/api/ — no behavior change

T2.14 — Verify existing DAGs: MeLi + Shopify @hourly. Fix if needed

T2.15 — CDK base AWS: DynamoDB conversation-api (corrected GSI) + Lambda + API Gateway v2 HTTP + VPC + NAT. Marketplace Provider: DynamoDB marketplace-credentials, Secrets Manager, EventBridge

T2.16 — GitHub Actions CI multi-repo: lint + type-check + tests on each PR, 4 active repos

T2.16a — marketplace-actions DynamoDB table in CDK. T2.16b — AmazonAdsOAuth2Flow: separate OAuth2 for Amazon Ads API. T2.16c — ISKUResolver implementations: MeLi (ML prefix), Amazon (ASIN), Shopify (numeric ID)

Dependencies: MeLi adapter from S1-2 stable. E1 Amazon approval determines if T2.11 executes or defers to S5.
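
One tick of the TokenRefreshCron from T2.12 might be sketched as follows. The StoredToken shape, the runRefreshTick name, and the refresh callback are hypothetical. The 30-minute pre-refresh window and the three-failure alert threshold come from the plan. The DynamoDB mutex and the Slack call are left out, and alerting once, when the failure count reaches exactly 3, is an assumption.

```typescript
interface StoredToken {
  sellerId: string;
  expiresAt: number; // epoch milliseconds
  failures: number; // consecutive refresh failures
}

const PRE_REFRESH_MS = 30 * 60 * 1000; // refresh tokens expiring within 30 minutes
const MAX_FAILURES = 3; // third consecutive failure raises an alert

interface RefreshOutcome {
  refreshed: string[]; // seller IDs whose tokens were renewed
  alerts: string[]; // seller IDs that should trigger an alert
}

// One cron tick (the plan schedules it every 5 minutes via EventBridge).
// `refresh` returns the new expiry time or throws on failure.
function runRefreshTick(
  tokens: StoredToken[],
  now: number,
  refresh: (token: StoredToken) => number
): RefreshOutcome {
  const outcome: RefreshOutcome = { refreshed: [], alerts: [] };
  for (const token of tokens) {
    // Skip tokens that are still comfortably valid.
    if (token.expiresAt - now > PRE_REFRESH_MS) continue;
    try {
      token.expiresAt = refresh(token);
      token.failures = 0;
      outcome.refreshed.push(token.sellerId);
    } catch {
      token.failures += 1;
      if (token.failures === MAX_FAILURES) outcome.alerts.push(token.sellerId);
    }
  }
  return outcome;
}
```

With a 5-minute tick and a 30-minute window, a token gets several refresh attempts before it actually expires, which is what makes a three-failure threshold workable.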

Sprint 5-6 — Fast Data Layer + GCS Snapshots + DAG Amazon + Rate Limiting + CI/CD

Goal: Fast Data Layer with 11 operational endpoints, GCS snapshots for ConfirmationFlow, Amazon DAG, IRateLimiter per marketplace, onboarding trigger, CI/CD for 11 repos.

T3.13 — Fast Data Layer 11 endpoints: FastAPI 1:1 with Tool Registry, GCS Parquet via pyarrow, <500ms

T3.14 — GCS pre-write snapshots for ConfirmationFlow + snapshot_cleanup_dag

T3.15 — DAG Amazon: IExtractor + ILoader + AmazonAuthManager + AmazonExtractor + AmazonLoader

T3.16 — IRateLimiter per marketplace: MeLi token bucket 1500/min, Amazon burst/restore, Shopify leaky bucket. Redis counter

T3.17 — Onboarding trigger: first sync post-onboarding when user connects marketplace

T3.18 — CI/CD multi-repo complete: 11 repos with GitHub Actions, auto-deploy to staging
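
The MeLi limiter in T3.16 is described as a token bucket at 1500 requests per minute. A minimal in-process sketch is shown below. It keeps its counter in memory rather than in Redis, and the TokenBucket class name, the tryAcquire method, and the choice of a 1500-token burst capacity are assumptions for illustration.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number, // maximum burst size
    private readonly refillPerMs: number, // tokens added per millisecond
    now: number
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Returns true and consumes one token if a request may proceed at time `now`.
  tryAcquire(now: number): boolean {
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.lastRefill = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// 1500 requests per minute for MeLi; burst capacity of 1500 is an assumption.
const meliBucket = new TokenBucket(1500, 1500 / 60000, 0);
```

A shared Redis counter, as the plan calls for, would be needed as soon as more than one Lambda instance talks to the same marketplace, since each in-memory bucket only sees its own traffic.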

Sprint 7-8 — Staging Deploy + Load Test + CloudWatch + WebSocket CDK + Silver/Gold

T4.6 — Staging deploy full stack: CDK AWS + Terraform GCP. Health-check green

T4.7 — Load testing 50 users: Artillery/k6, target p95 <2s

T4.8 — CloudWatch dashboard + alerts: PagerDuty p95 >2s, Slack cost >$50/day

T4.9 — Data Sync Silver + Gold (circuit breaker): INormalizer, SilverNormalizer, IAggregator, Brand Health spike

T4.9a — API Gateway v2 WebSocket CDK: routes $connect/$disconnect/$default, DynamoDB connection-ids

#12 + #10 integration testing: marketplace adapters + data sync E2E

Sprint 9-10 — Production Deploy + IaC + Rollback + OpenMetadata

T5.4 — Production deploy: CDK + Terraform prod, SSL + domain api.shopilot.ai

T5.5 — IaC production complete: DynamoDB PITR 35d, PostgreSQL RDS backups, GCS lifecycle policies

T5.6 — Rollback testing: Lambda <1min, Cloud Run <1min, document runbook

T5.6a — Data Sync Phase 4: OpenMetadata FQNs + embedding DAGs → Cerebro KB

Sprint 11-12 — Buffer: Prod Hardening + Adapter Fixes + Monitoring

Goal: Harden production based on real traffic data. Fix adapter edge cases found in beta. Expand monitoring dashboards.

Production hardening: refine CloudWatch alerts (based on real S9-S10 data), update runbooks, rollback drills

Fix marketplace adapter bugs from beta: rate limit edge cases, OAuth unexpected states, marketplace API quirks

Monitoring dashboards expanded: cost breakdown per tool, latency per marketplace, error rate per adapter

DAG Silver→Gold (if cut in S7-8 by circuit breaker): cross-marketplace normalization complete

Rate limiter optimization with real production data: adjust thresholds, backoff policies

Circuit breaker output: DAG Silver→Gold + advanced monitoring were candidates for cut if staging slipped.

Key Technical Decisions

AWS Secrets Manager for backend (#2,#3,#12,#13) / GCP Secret Manager for data services (#9,#10,#11). AES-256-GCM for marketplace tokens in DynamoDB

Redis (ElastiCache) from S5-6 for rate limiting (#12) and enrichment cache (#11). Cache TTL: 15min-24h by data type

Andres owns #10 Data Sync, #12 Marketplace Provider (MeLi + Amazon + Shopify adapters), and #14 DevOps IaC, under a single-owner policy for all marketplace adapters

9.6.3 — Sergio Murillo — Full-Stack

Native Shell + UI + Billing + Ship + Mockups

Projects #1, #13, #15

Sergio owns everything the user sees and touches. He builds the Electron desktop app with WebContentsView for marketplace browsing, the React sidebar with chat, the Stripe billing integration, and the Feedback Loop (#15) that measures the impact of Coach actions after 7 days. He also ships the final .dmg and creates integration Mockups that validate UX/UI's Figma components in a real React context. He is the single point of failure for the native shell; as mitigation, Pablo cross-trains on React/Electron basics by S4.

Sprint 1-2 — Electron Shell + WebContentsView + Auth

Goal: Working Electron app with WebContentsView loading marketplace URLs, tab system, marketplace detector, sidebar container, and Memberstack auth.

T1.16 — Scaffold Electron + electron-builder: Electron 28+, preload scripts with contextBridge, hot reload dev

T1.18 — MarketplaceDetector: URL patterns MeLi/Amazon/Shopify, detect page type, extract IDs, remote config JSON with local fallback

T1.19 — Tab system + Sidebar container (+0.5d token setup): marketplace tabs + React sidebar 360px with design tokens from T0.BB, IPC main↔renderer, toggle Cmd+B

T1.20 — Auth Memberstack: JWT in electron-store encrypted with OS key, login/logout flow, AuthService in main process
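
The MarketplaceDetector in T1.18 could look roughly like the sketch below. The URL patterns, the `DetectedPage` shape, and the `detectMarketplace` name are assumptions for illustration; the blueprint says the real patterns come from a remote config JSON with a local fallback, and the host checks here are deliberately loose.

```typescript
type Marketplace = "meli" | "amazon" | "shopify";

interface DetectedPage {
  marketplace: Marketplace;
  pageType: "product" | "other";
  id: string | null;
}

// Hypothetical local fallback patterns; a real build would prefer the remote config.
const DETECTOR_PATTERNS: { marketplace: Marketplace; host: RegExp; productId: RegExp }[] = [
  { marketplace: "meli", host: /(^|\.)mercadolibre\.[a-z.]+$/, productId: /\/(ML[A-Z]-?\d+)/ },
  { marketplace: "amazon", host: /(^|\.)amazon\.[a-z.]+$/, productId: /\/dp\/([A-Z0-9]{10})/ },
  { marketplace: "shopify", host: /(^|\.)myshopify\.com$/, productId: /\/products\/([^\/?#]+)/ },
];

function detectMarketplace(url: string): DetectedPage | null {
  // Split the URL into host and path without relying on a DOM or Node URL type.
  const match = /^https?:\/\/([^\/?#]+)([^?#]*)/i.exec(url);
  if (match === null) return null;
  const host = (match[1] ?? "").toLowerCase();
  const path = match[2] ?? "";

  for (const pattern of DETECTOR_PATTERNS) {
    if (!pattern.host.test(host)) continue;
    const idMatch = pattern.productId.exec(path);
    const id = idMatch?.[1] ?? null;
    return { marketplace: pattern.marketplace, pageType: id === null ? "other" : "product", id };
  }
  return null; // not a known marketplace
}
```

The returned object is the kind of metadata T2.19 later attaches to each chat message.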

Internal Beta Build

T1.32 — First .dmg + .exe canary build (unsigned): run electron-builder, verify packaging, team install test

Mockups

T1.MK1 — Mockup shell container: assemble sidebar + tabs with tokens from T0.BB (0.5d)

Dependencies: None to start; Sergio begins in parallel. The REST endpoint from Mateo (T1.7) is needed by end of S2, and T0.BB (Figma foundations) is needed for T1.19 in week 2.

Week 1 (no Figma): T1.16, T1.17, T1.18, T1.20, T1.32. Week 2 (with T0.BB): T1.19 + T1.MK1.

Sprint 3-4 — Chat UI + WebSocket + Context Injection + Onboarding + Mockups

T2.17 — Chat UI + Markdown rendering (+0.5d integration of T1.BB components; total 2.5d): user/assistant bubbles, thinking/executing/done indicators, syntax highlighting

T2.18 — CoachWebSocketService: WebSocket client in main process, exponential backoff reconnect, heartbeat 30s, REST polling fallback

T2.19 — URL context injection: MarketplaceDetector → extract marketplace, page type, IDs → metadata with each message

T2.20 — react-router views: /chat, /profile, /billing, /enrollment, /onboarding. Bottom tab bar

T2.21 — OnboardingWizard (+0.5d T1.BB components; total 2.5d): 5 steps: (1) Welcome, (2) Connect marketplace (OAuth inline), (3) Setup profile, (4) Guided first query, (5) Success + next steps. Shown on first launch only (localStorage flag). Skippable from step 3
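
The exponential backoff reconnect in T2.18 can be illustrated with a small delay function. This is a sketch under assumptions: the blueprint does not specify the base delay, the cap, or whether jitter is used, so `baseMs`, `maxMs`, and the full-jitter strategy below are hypothetical choices.

```typescript
interface BackoffOptions {
  baseMs: number; // delay ceiling for the first retry (assumed value, not from the blueprint)
  maxMs: number;  // upper bound on any delay ceiling (assumed value, not from the blueprint)
}

// Returns the wait before reconnect attempt number `attempt` (0-based).
// The ceiling doubles per attempt up to maxMs; "full jitter" then picks a random
// point below the ceiling so many clients do not reconnect at the same instant.
function reconnectDelayMs(
  attempt: number,
  opts: BackoffOptions,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(opts.maxMs, opts.baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}
```

With `baseMs: 1000` and `maxMs: 30000`, the ceilings run 1s, 2s, 4s, 8s and then flatten at 30s; a caller could switch to the REST polling fallback after some number of failed attempts.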

Mockups

T2.MK1 — Mockup ChatView: MessageBubbles + ContextBar + ChatInputBar + AgentStatusBar assembled in React (1d, depends T1.BB)

T2.MK2 — Mockup OnboardingWizard: 5 navigable steps with OnboardingStep (0.5d, depends T1.BB)

Internal Beta Build

T2.40 — Gate 1 signed build: Apple codesign + notarytool .dmg, Windows signed .exe, distribute to team. Gate 1 build milestone

Sprint 5-6 — Billing + Lifecycle + Confirmations + Cards + Mockups

T3.19 — BillingView (+0.5d T2.BB components; total 2.5d): current plan, remaining credits, usage stats. Uses CreditEconomy + MarketplaceKPI + CreditDisplay. Buttons → Stripe Checkout in system browser

T3.20 — WRITE confirmation dialogs: depends T2.BB, uses ConfirmDialog REVERSIBLE + IRREVERSIBLE. Red/green diff, 35min timeout with 5min reminder

T3.21 — Suggestion cards + tool progress (+0.5d T2.BB components; total 1.5d): uses ProactiveCard. Clickable cards, click opens pre-contextualized conversation, spinner with tool name

T3.22 — ProfileView: depends T2.BB, uses EnrollmentCard + Toggle labeled. Connected marketplaces, stats, preferences, settings

T3.23 — Stripe Checkout + Customer Portal: Pro $49/mo checkout, webhooks checkout.session.completed

T3.24 — ICreditsGate + backend credits: POST /internal/gate, credit matrix READ=1 / ANALYSIS=2 / WRITE=3

T3.24a — Billing schema migration: ALTER TABLE clients + tables credit_packs, subscription_events, credit_transactions

T3.24b — SubscriptionLifecycleService: activate, cancel, upgrade, downgrade, grace period 7d

T3.24c — Monthly credit reset cron: EventBridge + Lambda, reset plan credits monthly
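
The credit matrix in T3.24 (READ=1 / ANALYSIS=2 / WRITE=3) could be applied by a gate like the one sketched here. Only the three costs come from the blueprint; the `checkCredits` function and the `GateDecision` shape are hypothetical and do not describe the actual POST /internal/gate contract.

```typescript
type ActionKind = "READ" | "ANALYSIS" | "WRITE";

// Costs taken from the T3.24 credit matrix.
const CREDIT_COST: Record<ActionKind, number> = { READ: 1, ANALYSIS: 2, WRITE: 3 };

// Hypothetical response shape for a gate decision.
interface GateDecision {
  allowed: boolean;
  cost: number;
  remaining: number;
}

function checkCredits(balance: number, kind: ActionKind): GateDecision {
  const cost = CREDIT_COST[kind];
  if (balance < cost) {
    // Not enough credits: reject and leave the balance untouched.
    return { allowed: false, cost, remaining: balance };
  }
  return { allowed: true, cost, remaining: balance - cost };
}
```

For example, a seller with 2 credits could still run a READ or an ANALYSIS action but would be stopped before a WRITE.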

Mockups

T3.MK1 — Mockup BillingView: current plan + CreditEconomy + ProgressBar labeled + CreditDisplay + Stripe buttons (0.5d, depends T2.BB)

T3.MK2 — Mockup ProfileView: EnrollmentCard list + Toggles labeled + InputFields (0.5d, depends T2.BB)

T3.MK3 — Mockup ConfirmDialog in chat context: ChatView + ConfirmDialog overlay (slide up + backdrop) × REVERSIBLE and IRREVERSIBLE (0.5d, depends T2.BB + T3.20)

Sprint 7-8 — WebSocket Client + EnrollmentView + Sentry + Feedback Loop + Mockups

T4.10 — Progressive WebSocket client (+0.5d T3.BB components for ReActStream + RollbackPanel; total 2.5d): consume 8 server→client events (tool_start, tool_end, token_stream, suggestion_ready, confirmation_required, credits_updated, feedback_request, session_expired), update UI state machine accordingly

T4.11 — EnrollmentView standalone: dedicated BrowserWindow for OAuth redirect flows per marketplace. Reuses OAuth tokens from T2.21 OnboardingWizard

T4.12 — Sentry crash reporting: init() in main + renderer, source maps, unhandledRejection + uncaughtException hooks

T4.13 — Feedback Loop scaffold (#15): FeedbackEvent model, IPC channel feedback:record, POST /feedback/explicit + /feedback/implicit endpoints wired

T4.14 — calculateImpactScore: metric delta (listing_views, conversion_rate, revenue_7d) between action_date and +7d snapshot, normalized score 0–100

T4.15 — FeedbackMeasurerService: cron job +7d after each recorded action, fetch snapshot, compute score, persist FeedbackResult

T4.15a — FeedbackGate anti-fatigue: max 1 explicit feedback request per session, suppress if user dismissed in last 3 days

T4.15b — Explicit feedback endpoint: POST /feedback/explicit: thumbs up/down + optional note, triggers FeedbackEvent immediately

T4.15c — Implicit feedback endpoint: POST /feedback/implicit: captures re-visits to changed listing, time-on-page events, re-run of same tool

T4.15d — Grace period billing UI: banner when subscription expired but within 7d grace: "Your plan expired, actions paused. Renew to continue."
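
T4.14 names the inputs and the 0–100 output range of calculateImpactScore but not the normalization. The sketch below fills that gap with an assumed rule: take the relative change of each metric, clamp it to [-1, 1], average the three, and map the result so that 50 means "no change". The equal weighting, the clamping, and the zero-baseline handling are all assumptions, not the blueprint's formula.

```typescript
// Metric names come from T4.14; the snapshot shape is hypothetical.
interface MetricSnapshot {
  listing_views: number;
  conversion_rate: number;
  revenue_7d: number;
}

const IMPACT_METRICS: (keyof MetricSnapshot)[] = ["listing_views", "conversion_rate", "revenue_7d"];

function calculateImpactScore(before: MetricSnapshot, after: MetricSnapshot): number {
  let total = 0;
  for (const metric of IMPACT_METRICS) {
    const base = before[metric];
    // Relative change; a zero baseline contributes 0 to avoid dividing by zero (assumption).
    const relative = base === 0 ? 0 : (after[metric] - base) / base;
    total += Math.max(-1, Math.min(1, relative));
  }
  const mean = total / IMPACT_METRICS.length; // in [-1, 1]
  return Math.round((mean + 1) * 50);         // 0..100, 50 = no change
}
```

Under this rule, doubling all three metrics gives 100 and halving all three gives 25.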

Mockups

T4.MK1 — Mockup EnrollmentView standalone: full marketplace list × all states Connected/Syncing/Error/Disconnected (0.5d, depends T3.BB)

T4.MK2 — Mockup full WRITE flow: ChatView with AgentStatusBar in ToolUse → ToolAccordion expanded → ReActStream (3 phases) → ConfirmDialog → RollbackPanel post-execution (1d, depends T3.BB)

Internal Beta Build

T4.24 — Gate 2 signed build: full .dmg notarized + .exe signed with ALL S7-8 features. Team smoke test. Gate 2 build milestone and candidate for beta distribution

Sprint 9-10 — Code Signing + .dmg + Auto-updater + Stripe Live + Mockups

T5.7 — Code signing + .dmg + auto-updater: Apple Developer certificate, notarization, electron-updater pointing to releases.shopilot.ai, silent update flow

T5.8 — Electron security hardening: CSP headers, sandbox: true, nodeIntegration: false, contextIsolation: true, allowRunningInsecureContent: false

T5.9 — Beta bug fixes + RAM profiling (+0.5d post-audit T4.BB alignment; total 3.5d): fix P1/P2 bugs from beta cohort, Chrome DevTools memory snapshots, lazy-load views, target RAM <500MB

T5.10 — Billing Stripe live: switch from test keys to live keys, verify webhooks prod, smoke test checkout + portal + cancellation flows
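
The four flags listed in T5.8 are Electron `webPreferences` options. One way to keep them from regressing is a small check that compares any window's preferences against the hardened baseline, sketched below. The `SecurityPrefs` interface and `findInsecurePrefs` helper are hypothetical; only the flag names and values come from the task.

```typescript
// Mirrors the four Electron webPreferences flags named in T5.8.
interface SecurityPrefs {
  sandbox: boolean;
  nodeIntegration: boolean;
  contextIsolation: boolean;
  allowRunningInsecureContent: boolean;
}

const HARDENED_PREFS: SecurityPrefs = {
  sandbox: true,
  nodeIntegration: false,
  contextIsolation: true,
  allowRunningInsecureContent: false,
};

// Returns the flags that are missing or deviate from the hardened baseline.
// A missing flag counts as a deviation so the settings stay explicit.
function findInsecurePrefs(prefs: Partial<SecurityPrefs>): string[] {
  const keys = Object.keys(HARDENED_PREFS) as (keyof SecurityPrefs)[];
  return keys.filter((key) => prefs[key] !== HARDENED_PREFS[key]);
}
```

A unit test or startup assertion could call `findInsecurePrefs` on the options passed to each window and fail when the returned list is not empty.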

Mockups

T5.MK1 — Mockup Dashboard view: MarketplaceKPI grid + FraudAlert + AuditLog of recent actions + quick access to chat (1d, depends T4.BB)

Sprint 11-12 — Buffer: Beta Bug Fixes + Auto-updater + Windows

Goal: Clear the P1/P2 backlog from beta. Ship auto-updater pipeline. Windows build if deferred. Feedback UI with impact visualization.

Bug fixes UI/UX reported by beta users (P1/P2 priority)

Auto-updater S3 pipeline: push .dmg → S3 bucket → app detects update → downloads + installs silently

Windows build (if deferred): electron-builder Windows exe, code signing, E2E tests

Feedback UI: visualize impact of past actions based on FeedbackSummary (ImpactScore per action)

RAM optimization if >500MB: profiling with Chrome DevTools, lazy loading views, cleanup WebContentsView

FeedbackThrottle anti-fatigue refinement: tune suppression window from 3d to optimal based on beta engagement data, add per-action-type caps
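
The anti-fatigue rule from T4.15a (at most 1 explicit request per session, none within 3 days of a dismissal) can be sketched as a pure function with the suppression window as a parameter, which is the value this buffer sprint proposes to tune. The `FeedbackState` shape and the `shouldRequestFeedback` name are assumptions for illustration.

```typescript
// Hypothetical per-user state tracked by the gate.
interface FeedbackState {
  requestsThisSession: number;
  lastDismissedAtMs: number | null; // null if the user never dismissed a request
}

const DEFAULT_SUPPRESSION_MS = 3 * 24 * 60 * 60 * 1000; // 3 days, per T4.15a

function shouldRequestFeedback(
  state: FeedbackState,
  nowMs: number,
  suppressionMs: number = DEFAULT_SUPPRESSION_MS,
): boolean {
  // Rule 1: at most one explicit request per session.
  if (state.requestsThisSession >= 1) return false;
  // Rule 2: stay quiet while inside the suppression window after a dismissal.
  if (state.lastDismissedAtMs !== null && nowMs - state.lastDismissedAtMs < suppressionMs) {
    return false;
  }
  return true;
}
```

Per-action-type caps, also mentioned above, would need an extra counter keyed by action type; that part is left out of this sketch.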

Circuit breaker output: Windows build + Feedback UI were candidates for cut in S7-10.

Risk: Single Point of Failure

Sergio is the only Electron/React engineer. Mitigation: Pablo cross-trains on basics by S4. If Sergio is blocked, Pablo covers UI fixes.

9.6.4 — Pablo Estrada — CEO / Product Engineer

Product + QA + Eval + UX/UI Approval + Launch + GTM

Projects #16, #17, #18, #19

Pablo wears three hats: CEO (strategy, beta users, launch, GTM), Product Engineer (system prompt, QA with real data), and Project Manager (sprint gates, decisions, team coordination). He also owns Project #17 CORE (Beautonomous), the operational agent intended to let the 4-person team operate like 10-15 engineers; his first task is to bootstrap CORE before any product code is written. In addition, he serves as the approval gate for UX/UI's Design System (#18) deliverables, reviewing and signing off Figma components every 2 sprints.

Sprint 0 (Pre-Sprint) — Project #17 CORE Bootstrap

Goal: Beautonomous operational agent running in OpenClaw, with all 4 engineers using it for task management, code review, and workflow orchestration before writing product code.

Create OpenClaw project + authorize GitHub, Linear, Slack connectors

Write system prompt: role mapping (El Capitan, El Mago, El Artesano), governance rules, repos, Slack channels, risk taxonomy

Configure 3 roles: Pablo=El Capitan, Mateo=El Mago, Andres+Sergio=El Artesano

Validation: each team member runs 3 test queries successfully

Code Signing Certificates

T0.9 — Apple Developer Program enrollment ($99/yr) + Developer ID Application certificate for .dmg code signing + notarization

T0.10 — Windows code signing certificate procurement (EV/OV) for SmartScreen trust on .exe builds

Brand Book

T0.11 — Brand Book delivery from external design team. Request following guidelines in core-product-design-system repo. Deliverable: complete visual identity (logo, colors, typography, usage rules). Required before T0.BB (Figma foundations delivery)

Dependencies: None. This is the first thing that happens, before any product code.

CORE Governance: All subsequent projects must pass through Beautonomous for task tracking, PR review, and workflow execution.

Sprint 1-2 — Eval Scaffold + Brand Registration

Goal: Eval Suite scaffold with initial golden dataset, brand registrations started, Apple/Windows store authorization.

T1.24 — Eval Phase 0 Setup + Golden Dataset: interfaces (IEvalPipeline, ILLMJudge, IGoldenDatasetManager), domain models, golden dataset 15-20 cases YAML

T1.26 — Brand registration in marketplaces: Amazon Brand Registry, Amazon Ads, MercadoLibre, Shopify. Weekly tracking of approval status. Coordinate with Andrés for API account alignment

T1.27 — Authorize app in Apple & Windows stores: Apple Developer Program enrollment ($99/yr) + code signing certificate. Microsoft Partner Center registration. Goal: verified publisher on both platforms

#18 Design System (UX/UI approves)

T0.BB — Approve Brand book + Foundations Figma delivery (end of week 1): HEX/RGB/HSL palette, typography .woff2, logo SVG, [LIB] Foundations & Tokens, [LIB] Iconography, [LIB] Core Components partial (Button, Icon, StatusDot, Spinner, Divider, TabBar)

T1.BB — Approve Atoms + AI-native Atoms + Molecules base + Chat Organisms delivery (end of week 2): Input, Badge, Toggle, Tooltip, AvatarInitials, CreditBadge, ProgressBar, KbdShortcut + StreamingCursor, ThinkingPulse, ToolBadge, AgentStatusBar, RiskBadge, TTLCountdown + MessageBubble, ChatInputBar, ContextBar, OnboardingStep

Sprint 3-4 — LLM Judge + Linear Bootstrap + E2E

T2.24 — Eval Phase 1 LLM Judge + EvalRunner: AnthropicLLMJudge (Haiku standard, Sonnet critical), YamlDatasetLoader, CLI eval.ts + check-threshold.ts, 20 golden cases minimum

T2.25 — E2E testing via Playground: full flows with real Sellerfy data, document QA → Linear via Beautonomous

T2.26 — Bootstrap ~150 tasks in Linear: 6 cycles, L/M/S labels, critical path dependencies

T2.26a — Quality gate 5-step Beautonomous: structure → lint → tests → architecture review → convention check

#18 Design System (UX/UI approves)

T2.BB — Approve remaining Molecules + Data & Flow Organisms (end of S4): Select, Dropdown, Toggle labeled, Tooltip rich, ConfirmDialog REVERSIBLE/IRREVERSIBLE, ToolAccordion, ProactiveCard, MarketplaceKPI, CreditEconomy, EnrollmentCard. Publish [LIB] Core Components complete

Sprint 5-6 — Eval CI + Golden Dataset 50 + QA

T3.26 — Eval Phase 2 CI integration: eval-on-pr.yml in GitHub Actions, Coach staging → LLM Judge → EvalReport, PR blocked if !passed

T3.27 — Golden dataset 50 cases: 15 product, 10 pricing, 8 WRITE, 7 proactive, 10 edge cases

T3.28 — QA conversation flows 3 marketplaces: test all flows with Sellerfy data, document issues → Linear
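
The T3.27 breakdown (15 + 10 + 8 + 7 + 10 = 50) can be enforced with a small coverage check, sketched below. The category keys and the `datasetShortfall` helper are hypothetical names; only the per-category targets come from the task.

```typescript
type GoldenCategory = "product" | "pricing" | "write" | "proactive" | "edge_case";

// Targets from T3.27; they sum to 50.
const GOLDEN_TARGETS: Record<GoldenCategory, number> = {
  product: 15,
  pricing: 10,
  write: 8,
  proactive: 7,
  edge_case: 10,
};

// Returns how many cases each category still needs to reach its target.
function datasetShortfall(cases: { category: GoldenCategory }[]): Record<GoldenCategory, number> {
  const counts: Record<GoldenCategory, number> = {
    product: 0, pricing: 0, write: 0, proactive: 0, edge_case: 0,
  };
  for (const c of cases) counts[c.category] += 1;

  const shortfall: Record<GoldenCategory, number> = { ...counts };
  for (const key of Object.keys(GOLDEN_TARGETS) as GoldenCategory[]) {
    shortfall[key] = Math.max(0, GOLDEN_TARGETS[key] - counts[key]);
  }
  return shortfall;
}
```

A dataset loader could call this after parsing the YAML files and report any category with a non-zero shortfall.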

Eval Extension: Figma Quality Pipeline (7.5d)

T3.40: Extend EvalConfig + CLI: add desktop_build and figma_quality as pipelineType values. Models: DesktopBuildReport, FigmaQualityReport. CLI flags --pipeline=desktop_build / --pipeline=figma_quality (1d)

T3.41: FigmaRESTClient: IFigmaAPIClient with getFile, getFileVariables, getFileComponents, getFileStyles. Auth via FIGMA_ACCESS_TOKEN (1.5d)

T3.42: FigmaQualityRunner + variable checks: variable_architecture (3 collections), code_syntax (Code Syntax Web), semantic_aliasing (Semantic aliases Primitives), light_dark_modes (2 modes in Semantic) (2d)

T3.43: Component checks: auto_layout, naming_convention (slash naming), states_coverage (minimum states per type), color_hardcoding (no direct hex), spacing_hardcoding (no raw numeric values) (2d)

T3.44: Quality checks + report: wcag_contrast (4.5:1 text, 3:1 UI), descriptions (published components), mcp_compatibility (semantic layer names). Generate compliance report per file (1d)
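As an illustration of the wcag_contrast check in T3.44, the sketch below computes a contrast ratio with the WCAG 2.x relative-luminance formula and compares it against the 4.5:1 (text) and 3:1 (UI) thresholds. The function names and the RGB tuple type are assumptions made for this example; they are not taken from FigmaQualityRunner.

```typescript
// Hypothetical helper types and names; only the thresholds come from T3.44.
type RGB = [number, number, number]; // 8-bit sRGB channels, 0-255

// Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition).
function channelToLinear(c8: number): number {
  const c = c8 / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance(color: RGB): number {
  return (
    0.2126 * channelToLinear(color[0]) +
    0.7152 * channelToLinear(color[1]) +
    0.0722 * channelToLinear(color[2])
  );
}

// Ratio of the lighter luminance to the darker one, each offset by 0.05.
function contrastRatio(fg: RGB, bg: RGB): number {
  const l1 = relativeLuminance(fg);
  const l2 = relativeLuminance(bg);
  return (Math.max(l1, l2) + 0.05) / (Math.min(l1, l2) + 0.05);
}

// 4.5:1 for text, 3:1 for UI elements.
function passesWcagContrast(fg: RGB, bg: RGB, kind: "text" | "ui"): boolean {
  const threshold = kind === "text" ? 4.5 : 3.0;
  return contrastRatio(fg, bg) >= threshold;
}
```

A real check would first resolve each Figma fill to its final sRGB value (after variable aliasing and opacity), which this sketch does not attempt.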

#18 Design System (UX/UI approves)

T3.BB: Approve advanced Organisms + close Pattern Components (end of S6): ReActStream, DataTable, AuditLog, RollbackPanel, FraudAlert, ErrorRecovery A/B/C. Publish [LIB] Pattern Components complete.

Sprint 7-8: Eval Automated + Beta Prep + Contract Testing

T4.17: Automated Eval in CI: 50 golden cases on every push to main, fails if score <0.70

T4.18: Proactive suggestions testing with real data: verify triggers, message quality, dedup, max 2/turn

T4.19: Beta user selection + onboarding prep: 10-15 Sellerfy sellers, 2-min video, setup doc, 1-on-1 calls

T4.19a: Eval contract testing pipeline: consumer-driven contracts between repos

T4.19b: KB quality eval pipeline: precision@5, recall, hit rate, CI fails if <80%
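The two CI thresholds above (eval score below 0.70 in T4.17, KB quality below 80% in T4.19b) could be enforced with a small gate like the following sketch. The function names are hypothetical, and applying the 80% bar to precision@5 alone is an assumption: the task text does not say whether recall and hit rate share the same bar.

```typescript
// Thresholds from T4.17 and T4.19b; everything else is illustrative.
const MIN_EVAL_SCORE = 0.7;
const MIN_KB_PRECISION_AT_5 = 0.8;

// precision@k: fraction of the top-k retrieved ids that are relevant.
function precisionAtK(retrieved: string[], relevant: string[], k: number): number {
  if (k <= 0) return 0;
  const hits = retrieved
    .slice(0, k)
    .filter((id) => relevant.indexOf(id) !== -1).length;
  return hits / k;
}

interface GateResult {
  passed: boolean;
  failures: string[];
}

// Collect every failed threshold so the CI log shows all of them at once.
function evaluateCiGate(evalScore: number, kbPrecisionAt5: number): GateResult {
  const failures: string[] = [];
  if (evalScore < MIN_EVAL_SCORE) {
    failures.push("eval score " + evalScore + " is below " + MIN_EVAL_SCORE);
  }
  if (kbPrecisionAt5 < MIN_KB_PRECISION_AT_5) {
    failures.push("KB precision@5 " + kbPrecisionAt5 + " is below " + MIN_KB_PRECISION_AT_5);
  }
  return { passed: failures.length === 0, failures };
}
```

A CI step would exit non-zero when `passed` is false. A score exactly at the threshold passes, matching the "fails if <" wording.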

Eval Extension: Desktop Build Pipeline (7d)

T4.25: Code signing secrets: configure macOS certificates (Developer ID + Apple notarization) and Windows (Authenticode) in GitHub Secrets. Verify electron-builder recognizes them (1d)

T4.26: DesktopBuildRunner + core checks: compilation (build + artifact exists), code signing (codesign/signtool verify), notarization (spctl, macOS only), app startup (headless <5s), bundle size (<250MB), native modules (require without error) (3d)

T4.27: Secondary checks: auto-updater (feed URL resolves), deep links (shopilot:// in Info.plist/Windows registry), window rendering (console.error), IPC channels (ping/pong). Warnings, not blockers (1d)

T4.28: GitHub Actions desktop-build-eval.yml: 3 jobs: build-macos (macos-14 runner), build-windows (windows-latest runner), report (aggregate + PR comment). Trigger: PRs touching desktop-client (1.5d)

T4.29: GitHub Actions figma-quality-eval.yml: triggers workflow_dispatch + weekly cron (Monday 8:00 UTC). Publishes report as GitHub issue or Slack #engineering message (0.5d)

#18 Design System (UX/UI approves)

T4.BB: Approve Figma quality audit + corrections (end of S8): all frames "Ready for development", zero generic names, variables verified in DevMode, all interactive states, changelog updated, Figma annotations

Sprint 9-10: Beta Onboarding + Feedback + Security + Go/No-Go

T5.11: Beta onboarding, 10-15 sellers: .dmg → connect marketplace → first query → first action. 30-min 1-on-1 calls

T5.12: Feedback calls + iteration: 15 min with each beta user, top 5 issues → Linear

T5.13: OWASP Top 10 security review: document findings + fix P1s

T5.14: Beautonomous System Prompt v2: iteration based on 10 weeks of real usage

T5.15: Go/No-Go: 60-min final sync, full checklist, Pablo signs off Go

T5.15a: E2E eval pipeline: full query→tools→response flow, 10+ scenarios

#18 Design System

No BB task in S9-10: Figma pipeline closed. UX/UI available only for ad-hoc queries.

Sprint 11-12: Buffer, Eval Iteration + Documentation

Goal: Push Eval score from 0.70 → 0.80 using real beta conversation data. Second feedback round. Technical documentation and postmortem.

Eval iteration: new golden cases derived from observed failures in beta conversations

Eval score target 0.80: refine rubrics, add edge cases, calibrate LLM Judge

Second beta feedback round: 15-min calls with active users, document usage patterns and most-requested features

Technical documentation + postmortem: architecture decisions, lessons learned, runbook for v2

Circuit breaker output: Any eval work cut from S7-10 lands here.

Key Role: Three Hats

CEO (decisions, strategy, beta, GTM) + Product Engineer (prompt, QA) + PM (sprint gates, go/no-go calls, team coordination). Only person with full product + technical + business context.

9.7 Sprint Execution: 100% Task Breakdown

CTO + PM perspective. Every task from all 19 active projects. Linear-exportable. Project #17 CORE governance referenced per project.

Phase 0 (Pre-Sprint): #17 CORE Bootstrap

Week 0 • 11 tasks • Pablo (lead) + Mateo (support)

Beautonomous is the operational agent intended to let 4 engineers operate as 10-15. It provides task management (Linear), code review (GitHub), and governance, all via OpenClaw. No product code until CORE is operational. Source: core-internal-team-workflow/.claude/specs/development-plan.md Phases 0–2.

ID · Task · Owner · Proj · Time · Depends
T0.1 Create OpenClaw project

'Beautonomous' project, operational-agent type, 4 members invited

Pablo · #17 · 30m
T0.2 Connect GitHub OAuth

Authorize the organization, select the 11 core-(layer)-project repos

Mateo · #17 · 30m · T0.1
T0.3 Connect Linear OAuth

Workspace 'beautonomous', team AUT, read + write

Pablo · #17 · 30m · T0.1
T0.4 Connect Slack OAuth

Channels #engineering, #deploys, #general. Read + send

Mateo · #17 · 30m · T0.1
T0.5 Beautonomous System Prompt v1

Identity, roles (Capitán/Mago/Artesano), 6 governance rules, repos, Slack channels. ~500 words. This is NOT the Coach prompt

Pablo · #17 · 4h · T0.1
T0.6 Configure role mapping

pablo→Capitán, mateo→Mago, andres/sergio→Artesano. Per-role permissions per spec F2.1–F2.3

Pablo · #17 · 1h · T0.5
T0.7 Create Linear structure

17 projects, 6 cycles (2 weeks each, incl. buffer), labels L/M/S + Track-{engineer} + Risk-level. Workflow: Backlog→Todo→In Progress→In Review→Done

Pablo · #17 · 2h · T0.3
T0.8 Validation

4 members × 3 test queries (1 GitHub/Linear read, 1 task creation, 1 code read). Verify role permissions

All 4 · #17 · 1h · T0.2–T0.6
T0.9 Apple Developer Program enrollment

Enroll in Apple Developer Program ($99/yr). Request Developer ID Application certificate for code signing + notarization. Required for signed .dmg builds at Gate 1 (S4)

Pablo · #1 · 1h
T0.10 Windows code signing certificate procurement

Procure EV or OV code signing certificate for Windows .exe builds. Required for SmartScreen trust at Gate 1 (S4). Vendor options: DigiCert, Sectigo, GlobalSign

Pablo · #1 · 1h
T0.11 Brand Book delivery from external design team

Request brand book from external design team following the guidelines documented in the core-product-design-system repo. Deliverable: complete visual identity (logo, colors, typography, usage rules). Required before T0.BB (Figma foundations delivery) can begin

Pablo · #18 · 1d

Checkpoint: Beautonomous operational. From here everything is tracked in Linear via Beautonomous.

Sprints 1-2: Walking Skeleton

Weeks 1-2 • 37 tasks
Gate: ReAct loop processes a query. Electron loads marketplace. 3 OAuth flows working. KB indexed.
ID · Task · Proj · Time · Depends
Mateo: #2 Orchestrator · #4 Personality · #5 Context · #8 Observability · #9 Cerebro KB (12 tasks)
T1.1 DynamoDB correction (Phase -1)

IDs UUID→ULID, Trace SK changed to Trace#{messageId}, remove queryEmbedding/answer from Trace, GSI1 ALL→INCLUDE, GSI2 sparse, dead-code removal. CDK Stack + KeyBuilders + repos + tests

#2 · 3d · T0.8
T1.2 UserProfile entity

pk: User#{userId}, sk: Profile. Fields: userId, marketplaces, productCategories, declaredGoals, lastUpdatedAt. IUserProfileRepository + DynamoUserProfileRepository

#2 · 1d · T1.1
T1.3 History in the prompt (Phase 0.1)

Last N messages in the prompt. Method findWindowForPrompt(convId, windowSize). Token budget: 200K − system − rag − response

#2 · 2d · T1.1
T1.4 ILLMClient update

chat() accepts toolDefinitions?, thinkingBudget?. Returns { content: ContentBlock[], stopReason }. Update LLMClientFactory + all clients (OpenRouter, Anthropic, Vertex)

#2 · 2d
T1.5 SystemPromptComposer L1+L2

L1 base identity (~500 tok, cache_control: ephemeral). L2 session (UserProfile + critical alerts). compose(context) → ComposedSystemPrompt { blocks[], estimatedTokens }

#4 · 2d · T1.2
T1.6 AgentLoopOrchestrator (Phase 0.3)

ReAct loop Reason→Act→Observe, MAX_ROUNDS=10, cost guard 50K tokens. IContextWindowManager: budget = 200K − system − history − tools − 4000

#2 · 3d · T1.3, T1.4, T1.5
T1.7 RestResponseEventEmitter

REST mode (Phase 0.3, no streaming). Full response after all rounds. Internal events for logging

#2 · 4h · T1.6
T1.8 Verify Observability with ReAct

Confirm the existing ConversationTrace + AgentTracking are compatible with the multi-step loop. Add tool calls, round count, and cost per turn to traces

#8 · 1d · T1.6
T1.21 KB Phase 0: fix duplicates

TRUNCATE before embed. Real embedded_at timestamp. CI Go version 1.21→1.24. Separate created_at/updated_at/embedded_at

#9 · 2d
T1.22 KB Phase 1: Contextual Retrieval

Contextual prefix [namespace/type] title before embedding. Chunking by Markdown sections (##/###). Overlap 150 chars

#9 · 2d · T1.21
T1.23 KB content: 15-20 curated docs

MeLi best practices, Amazon policies, Shopify guides, pricing strategies, photo optimization, metrics, seller FAQ

#9 · 5d
T1.25 10 READ tool specs

For each tool: name, LLM description, input_schema JSON Schema, risk level, credit cost. 10 tools: get_product, get_product_metrics, get_orders, get_buyer_questions, get_product_reviews, get_category_requirements, get_campaigns, get_campaign_metrics, get_store, get_store_metrics

#9 · 2d
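To make the token arithmetic in T1.3 and T1.6 concrete, here is a minimal sketch of the budget formula (200K − system − history − tools − 4000) together with a history window that keeps the most recent messages that fit. Treating the 4000 as a reserve for the model's response is an assumption, as are the `Msg` shape and the function names; the real findWindowForPrompt may behave differently.

```typescript
const CONTEXT_WINDOW = 200_000; // tokens, from T1.3 / T1.6
const RESPONSE_RESERVE = 4_000; // assumed meaning of the "− 4000" term

// Hypothetical message shape with a precomputed token count.
interface Msg {
  id: string;
  tokens: number;
}

// budget = 200K − system − history − tools − 4000, floored at zero.
function remainingBudget(system: number, history: number, tools: number): number {
  return Math.max(0, CONTEXT_WINDOW - system - history - tools - RESPONSE_RESERVE);
}

// Walk backwards from the newest message and keep at most `windowSize`
// messages whose combined size stays within `tokenBudget`.
function selectWindow(messages: Msg[], windowSize: number, tokenBudget: number): Msg[] {
  const picked: Msg[] = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0 && picked.length < windowSize; i--) {
    const m = messages[i];
    if (m === undefined || used + m.tokens > tokenBudget) break;
    picked.push(m);
    used += m.tokens;
  }
  return picked.reverse(); // restore chronological order
}
```

Stopping at the first message that does not fit keeps the window contiguous, so the prompt never skips a turn in the middle of the conversation.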
Andrés: #12 Marketplace Provider · #10 Data Sync · #14 DevOps (13 tasks)
T1.9 Scaffold Marketplace Provider

Clean Architecture + DDD. Value Objects (Marketplace, SKU, MarketplaceCredential). Error types (MarketplaceAPIError, AuthenticationError, RateLimitError). DI container

#12 · 1d · T0.8
T1.10 IMarketplaceAdapter interface

23 methods, 4 domains (Catalog, Engagement, Advertising, Enrollment). ISKUResolver: SKU → native marketplace ID

#12 · 4h · T1.9
T1.11 AES256GCMCipher + ITokenManager

Token encryption at rest. DynamoDB table marketplace-credentials. ITokenManager with automatic pre-expiry refresh (15min buffer)

#12 · 2d · T1.9
T1.12 MeLiOAuth2Flow + MeLiAdapter

OAuth2 code flow. MeLi REST API (/users/me/items, /orders/search). MeLi errors mapped to standardized errors. Code reused from context/marketplace-connection/

#12 · 3d · T1.10, T1.11
T1.13 AmazonLWAFlow + AmazonAdapter scaffold

OAuth2 LWA. SP-API SDK. Rate limiting per API family. Scaffold only; full implementation in S3-4 (depends on E1 approval, 2-4 weeks)

#12 · 2d · T1.10, T1.11
T1.14 Verify existing GCP Terraform

Confirm GCS buckets, Cloud Run, Airflow, and BigQuery are operational. Fix if necessary

#14 · 1d · T0.8
T1.15 Request external dependencies (E1-E5)

Amazon SP-API dev account (day 1), MeLi dev portal, Shopify Partners, Apple Developer Program. Document in Linear

#14 · 4h · T0.8
T1.15a SellerConnection aggregate

5-state state machine (disconnected→pending→active→expired→revoked). Validated transitions. Persisted in DynamoDB

#12 · 1d · T1.9
T1.15b MarketplaceAction entity + IMarketplaceActionRepository

Record of every action. Fields: actionId, sellerId, marketplace, method, status, requestPayload, responsePayload, latencyMs

#12 · 4h · T1.9
T1.15c IOAuth2Flow interface (domain port)

Generic port for OAuth2 flows (authorize, exchangeCode, refreshToken). Implemented by MeLi/Amazon/Shopify

#12 · 4h · T1.9
T1.28 Collect missing WRITE API docs

Map WRITE actions for #3 Tool Registry. Existing docs collected; complete remaining: MeLi: 3, Amazon Ads: 5, Amazon: 2, Shopify: 9. Organize per marketplace in shared repo

#12 · 3d · T0.8
T1.29 Collect user management provider docs

Research external auth provider (authentication, authorization, credential management). Document service methods exposed to consumer layers. Evaluate options (Auth0, Clerk, Memberstack)

#12 · 2d · T0.8
T1.33 GitHub Actions CI: electron-builder on release/* branch

GitHub Actions workflow: trigger on release/* branch push. Run electron-builder --mac --win. Upload .dmg + .exe as artifacts. Notify #deploys Slack channel

#14 · 0.5d · T1.32
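The SellerConnection state machine in T1.15a names five states but not the full transition table. The sketch below shows one way to validate transitions with a lookup table. The forward chain (disconnected→pending→active→expired→revoked) comes from the task description; every other edge, such as expired→active after a token refresh, is an assumption added for illustration.

```typescript
type ConnectionState = "disconnected" | "pending" | "active" | "expired" | "revoked";

// Allowed target states per source state. Only the forward chain is from
// T1.15a; the remaining edges are assumptions for this example.
const ALLOWED_TRANSITIONS: Record<ConnectionState, ConnectionState[]> = {
  disconnected: ["pending"],
  pending: ["active", "disconnected"],
  active: ["expired", "revoked"],
  expired: ["active", "revoked"],
  revoked: ["pending"],
};

function canTransition(from: ConnectionState, to: ConnectionState): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns the new state, or throws so invalid transitions never get persisted.
function transition(from: ConnectionState, to: ConnectionState): ConnectionState {
  if (!canTransition(from, to)) {
    throw new Error("Invalid SellerConnection transition: " + from + " -> " + to);
  }
  return to;
}
```

Keeping the table as data makes the allowed edges easy to review and to cover exhaustively in unit tests before the aggregate is wired to DynamoDB.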
Sergio: #1 Native Shell (7 tasks)
T1.16 Scaffold Electron + electron-builder

Electron 30+ (WebContentsView, used in T1.17, is available from Electron 30). Main process entry point. Preload scripts with contextBridge. Hot reload in dev. Scripts: dev/build/pack

#1 · 1d · T0.8
T1.17 MainWindow + WebContentsView

WebContentsView (NOT BrowserView, which is deprecated since Electron 30). 70% of window width. Navigation controls. Marketplace session persistence

#1 · 2d · T1.16
T1.18 MarketplaceDetector

URL patterns for MeLi/Amazon/Shopify. Detect page type (product/dashboard/orders), extract IDs. Remote JSON config with local fallback

#1 · 1d · T1.17
T1.19 Tab system + Sidebar container

Marketplace tabs + 360px React sidebar on the right. UI components from shopilot-design-system (#18 Design System). IPC main↔renderer. Toggle Cmd+B. +0.5d token setup from T0.BB

#1 · 2.5d · T1.17, T0.BB
T1.20 Memberstack auth

JWT in electron-store, encrypted with an OS-provided key. Login/logout flow. AuthService in main process

#1 · 1d · T1.16
T1.32 First .dmg + .exe canary build (unsigned)

Run electron-builder for macOS (.dmg) and Windows (.exe), unsigned. Verify packaging (icons, metadata, entitlements). Team install test on macOS + Windows VM. Surface packaging issues early

#1 · 1d · T1.16
T1.MK1 Mockup shell container

Assemble sidebar + tabs with T0.BB tokens in React. Validate visual integration of Figma foundations in real Electron context

#1 · 0.5d · T0.BB
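A rough sketch of how the MarketplaceDetector in T1.18 might map a URL to a marketplace, page type, and product ID. The rule list stands in for the local fallback config. All URL patterns, type names, and the rule format are illustrative assumptions, not the project's actual patterns, and would need checking against real marketplace URLs.

```typescript
type Marketplace = "meli" | "amazon" | "shopify";
type PageType = "product" | "orders" | "dashboard";

interface Detection {
  marketplace: Marketplace;
  pageType: PageType;
  productId?: string;
}

interface DetectionRule {
  marketplace: Marketplace;
  pageType: PageType;
  pattern: RegExp; // optional capture group 1 = native product ID
}

// Hypothetical local fallback rules; a remote JSON config could replace them.
const LOCAL_FALLBACK_RULES: DetectionRule[] = [
  { marketplace: "amazon", pageType: "product", pattern: /amazon\.[a-z.]+\/(?:.*\/)?dp\/([A-Z0-9]{10})/ },
  { marketplace: "meli", pageType: "product", pattern: /mercadolibre\.[a-z.]+\/(ML[A-Z]-?\d+)/ },
  { marketplace: "shopify", pageType: "product", pattern: /admin\.shopify\.com\/store\/[^/]+\/products\/(\d+)/ },
  { marketplace: "shopify", pageType: "orders", pattern: /admin\.shopify\.com\/store\/[^/]+\/orders/ },
];

// First matching rule wins; returns null for URLs outside known marketplaces.
function detectMarketplace(url: string): Detection | null {
  for (const rule of LOCAL_FALLBACK_RULES) {
    const match = rule.pattern.exec(url);
    if (match === null) continue;
    const detection: Detection = { marketplace: rule.marketplace, pageType: rule.pageType };
    if (match[1] !== undefined) detection.productId = match[1];
    return detection;
  }
  return null;
}
```

Because rules are plain data, the remote config can ship updated patterns without a new desktop build, and the shell can fall back to this list when the fetch fails.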
Pablo: #16 Eval Suite · #17 Beautonomous · #10 Data Sync · #1 Native Shell (3 tasks)
T1.24 Eval Phase 0: setup + golden dataset

package.json (no server). Domain interfaces (IEvalPipeline, ILLMJudge, IGoldenDatasetManager). Golden dataset of 15-20 YAML cases (fees, scope, metrics)

#16 · 3d
T1.26 Brand registration in marketplaces

Start brand registration process in Amazon Brand Registry, Amazon Ads, MercadoLibre, and Shopify. Track approval status weekly. Coordinate with Andrés for API developer account alignment

#10 · 3d · T0.8
T1.27 Authorize app in Apple & Windows stores

Apple Developer Program enrollment ($99/yr) + code signing certificate request. Microsoft Partner Center registration for Windows Store publishing. Goal: app recognized as verified publisher on both platforms

#1 · 2d · T0.8
UX/UI + Pablo: #18 Design System (2 tasks)
T0.BB Brand book + Foundations Figma (week 1 delivery)

Brand book (D1–D9 resolved, HEX/RGB/HSL palette, typography .woff2, logo SVG). [LIB] Foundations & Tokens (00 Primitives, 01 Semantic Light/Dark, Code Syntax, Text Styles). [LIB] Iconography (Lucide 40+ icons). [LIB] Core Components partial (Button, Icon, StatusDot, Spinner, Divider, TabBar). Owner: UX/UI executes + Pablo approves end of week 1

#18 · 4d · T0.11
T1.BB Atoms + AI-native Atoms + Molecules base + Chat Organisms (week 2 delivery)

Atoms: Input, Badge, Toggle, Tooltip, AvatarInitials, CreditBadge, ProgressBar, KbdShortcut. AI-native: StreamingCursor, ThinkingPulse, ToolBadge, AgentStatusBar, RiskBadge, TTLCountdown. Molecules: InputField, SearchBar, CreditDisplay. Organisms: MessageBubble, ChatInputBar, ContextBar, OnboardingStep. Owner: UX/UI executes + Pablo approves end of week 2

#18 · 6d · T0.BB

Sprints 3-4: Core Engines

Weeks 3-4 • 38 tasks
Gate: ToolRegistry + 10 READ handlers with tool stubs. Shell with basic chat. KB in BigQuery. Eval runner executes 15 golden cases.
ID · Task · Proj · Time · Depends
Mateo: #3 Tool Registry · #5 Context Agg · #2 Orchestrator · #11 Enrichment · #9 Cerebro KB (15 tasks)
T2.1 ToolRegistry + ToolDefinition

register(def, handler) / registerRemote() / getDefinitions() / getHandler(name). Schema: name, description, inputSchema (Zod), category (READ/WRITE/ANALYSIS/SYSTEM), riskLevel, marketplace, estimatedTokens, cacheable

#3 · 2d · T1.6, T1.25
T2.2 IToolExecutor + ToolExecutor

Interface: execute(toolName, args, context) → ToolResult + getToolDefinitions(). The implementation orchestrates the lifecycle

#3 · 1d · T2.1
T2.3 ToolPolicyFilter

Risk gate (irreversible → mandatory confirmation) + marketplace gate (tool unavailable if the marketplace is not configured). Extensible without touching the executor

#3 · 1d · T2.1
T2.4 HookLifecycle

before_tool → execute → after_tool. All cross-cutting infrastructure lives here. after_tool runs even if execute fails

#3 · 1d · T2.2
T2.5 10 READ tool handlers (stubs)

Handlers for the 10 READ tools. HTTP stubs with mock data. Structure: handlers/read/

#3 · 2d · T2.1
T2.5a ToolResult domain model

toolName, args, result, isError, latencyMs, cached, creditCost. Immutable value used by HookLifecycle, caching, and traces

#3 · 4h · T2.1
T2.5b update_user_profile SYSTEM tool handler

Updates UserProfile (marketplaces, categories, goals). The LLM invokes it when it detects new seller information in the conversation

#3 · 4h · T2.1, T1.2
T2.5c contextSummary

Automatic conversation summary when the history exceeds the token threshold. Optional fields on Conversation: contextSummary?, contextSummarizedUpToMessageId?

#5 · 1d · T1.3
T2.5d 17 WRITE tool stubs

Register the 17 WRITE tools in ToolRegistry with the ConfirmationRequired policy. No real handler; they return NotImplemented. This lets the LLM "see" them and plan

#3 · 4h · T2.1
T2.6 IContextAssembler

Formalize RagOrchestrator as IContextAssembler. KB + Brand Health RAG in parallel, single embedding. Graceful degradation: a failure in KB or brand health never blocks the response

#5 · 2d · T1.6
T2.7 Structured health summary

BrandHealthContextService.getHealthSummary(userId, marketplace) → critical/delicate items, always injected into the system prompt

#5 · 1d · T2.6
T2.8 Anthropic prompt caching

SystemPromptBlock[] with cache_control: { type: "ephemeral" } on L1. Anthropic client only. Expected cache hit rate: 90%

#2 · 1d · T1.5
T2.9 In-memory tool result caching

Map<cacheKey, ToolResult> per session. READ/ANALYSIS only. Avoids duplicate calls to the same tool within the same conversation

#3 · 4h · T2.2
T2.22 KB Phase 2: incremental processing

SHA-256 content hash per document. is_current flag in BigQuery. Re-embed only the docs that changed. Activate if the pipeline takes >5 min or the corpus exceeds 5000 docs

#9 · 2d · T1.21
T2.23 KB Phase 3: batch embeddings

Send up to 250 texts per Vertex AI call (vs one at a time). Retry with backoff on 429/5xx. Goroutine pool with semaphore (max 5). ~6000 calls → ~24 calls

#9 · 2d · T1.21
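The HookLifecycle contract in T2.4 (before_tool → execute → after_tool, with after_tool running even when execute fails) can be sketched as below. The example is synchronous to stay short; real handlers would presumably return Promises, in which case the same shape applies with async/await. The `Hook` interface, the trimmed `ToolResult` fields, and `runWithHooks` are hypothetical names for illustration.

```typescript
// Trimmed-down ToolResult; T2.5a lists more fields (args, cached, creditCost).
interface ToolResult {
  toolName: string;
  result: unknown;
  isError: boolean;
  latencyMs: number;
}

// Hypothetical hook shape: both callbacks are optional.
interface Hook {
  beforeTool?: (toolName: string, args: unknown) => void;
  afterTool?: (result: ToolResult) => void;
}

function runWithHooks(
  toolName: string,
  args: unknown,
  execute: (args: unknown) => unknown,
  hooks: Hook[],
): ToolResult {
  for (const hook of hooks) {
    if (hook.beforeTool) hook.beforeTool(toolName, args);
  }
  const start = Date.now();
  let outcome: ToolResult;
  try {
    outcome = { toolName, result: execute(args), isError: false, latencyMs: Date.now() - start };
  } catch (err) {
    // A failed execute becomes an error result instead of skipping after_tool.
    const message = err instanceof Error ? err.message : String(err);
    outcome = { toolName, result: message, isError: true, latencyMs: Date.now() - start };
  }
  for (const hook of hooks) {
    if (hook.afterTool) hook.afterTool(outcome);
  }
  return outcome;
}
```

Converting the exception into a ToolResult means tracing, credit accounting, and proactive-suggestion hooks all see failed calls too, which is the point of running after_tool unconditionally.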
Andrés: #12 Marketplace Provider · #10 Data Sync · #14 DevOps (10 tasks)
T2.10 ShopifyOAuth2Flow + ShopifyAdapter

Shopify OAuth2 (requires the seller's store URL). GraphQL Admin API. Rate limiting (Shopify cost-based throttling). Queries for products, orders, inventory

#12 · 3d · T1.10, T1.11
T2.11 Full AmazonAdapter (if E1 approved)

Full SP-API SDK. Reports, Catalog Items, Orders. Rate limit 5 req/s with exponential backoff. If E1 is not approved → defer to S5

#12 · 3d · T1.13
T2.12 TokenRefreshCron

EventBridge rule every 5min. Pre-refresh 30min before expiry. DynamoDB mutex (avoids race conditions). Threshold: 3 failures → Slack alert

#12 · 1d · T1.11, T1.12
T2.13 Data Sync Phase 0.5: Clean Architecture API

Refactor services/api/ with no behavior change. IDataReader, ITokenProvider, VOs (UserId, Marketplace, DateRange) in the domain layer

#10 · 2d · T0.8
T2.14 Existing DAGs verified

Verify that the MeLi + Shopify @hourly DAGs run without errors. Verify Bronze schemas. Fix if necessary

#10 · 1d · T2.13
T2.15 AWS CDK base

DynamoDB conversation-api (GSI corrected in T1.1), Lambda + API Gateway v2 HTTP, VPC + NAT. Marketplace Provider: DynamoDB marketplace-credentials, Secrets Manager, EventBridge

#14 · 2d · T1.1
T2.16 Multi-repo GitHub Actions CI

lint + type-check + unit tests on every PR for the 4 active repos. Build cache via actions/cache. Mandatory status checks

#14 · 1d
T2.16a marketplace-actions DynamoDB table in CDK

Table for the MarketplaceAction entity. pk sellerId, sk actionId. GSI on marketplace+status

#14 · 4h · T2.15
T2.16b AmazonAdsOAuth2Flow (dual OAuth)

Separate OAuth2 flow for the Amazon Ads API (distinct from LWA for SP-API). Separate credentials in Secrets Manager

#12 · 1d · T1.13
T2.16c ISKUResolver implementations

MeLi (ML prefix + item ID), Amazon (ASIN), Shopify (numeric product ID). Bidirectional mapping: internal SKU ↔ native marketplace ID

#12 · 1d · T1.10
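The per-credential decision inside TokenRefreshCron (T2.12) might look like the sketch below: refresh when a token is within 30 minutes of expiry, and raise an alert once 3 consecutive refresh failures have accumulated. The `Credential` shape and the choice to stop retrying once the alert threshold is reached are assumptions; the DynamoDB mutex and the actual refresh call are out of scope here.

```typescript
const PRE_REFRESH_MS = 30 * 60 * 1000; // refresh 30min before expiry (T2.12)
const FAILURE_ALERT_THRESHOLD = 3;     // 3 failures -> Slack alert (T2.12)

// Hypothetical view of a stored credential.
interface Credential {
  sellerId: string;
  expiresAtMs: number;
  consecutiveFailures: number;
}

type RefreshDecision = "skip" | "refresh" | "alert";

// Pure decision function, evaluated for each credential on every cron tick.
function decideRefresh(credential: Credential, nowMs: number): RefreshDecision {
  if (credential.consecutiveFailures >= FAILURE_ALERT_THRESHOLD) return "alert";
  const remainingMs = credential.expiresAtMs - nowMs;
  return remainingMs <= PRE_REFRESH_MS ? "refresh" : "skip";
}
```

With the rule firing every 5 minutes, a token gets roughly six refresh attempts inside the 30-minute window before it expires.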
Sergio: #1 Native Shell (8 tasks)
T2.17 Chat UI + Markdown rendering

Text + markdown input in the sidebar. User/assistant bubbles. Indicators: "thinking", "running tool X", "done". Syntax highlighting in code blocks. +0.5d T1.BB component integration

#1 · 2.5d · T1.19, T1.BB
T2.18 CoachWebSocketService

WebSocket client in main process. Reconnection with exponential backoff (1s→2s→4s... max 30s). Heartbeat ping/pong every 30s. Fallback: REST polling every 2s

#1 · 1d · T1.7
T2.19 URL→metadata context injection

Detect the current URL in WebContentsView → extract marketplace, page type, product IDs via MarketplaceDetector → send as metadata with each message

#1 · 1d · T1.18
T2.20 View navigation with react-router

/chat (default), /profile, /billing, /enrollment, /onboarding. Bottom tab bar. Chat state persists across view changes

#1 · 1d · T2.17
T2.21 5-step OnboardingWizard

(1) Welcome, (2) Connect marketplace (inline OAuth), (3) Profile setup, (4) Guided first query, (5) Success + next steps. First launch only (localStorage flag). Skippable from step 3. +0.5d T1.BB components (OnboardingStep)

#1 · 2.5d · T2.17, T1.12, T1.BB
T2.40 Gate 1 signed build: .dmg notarized + .exe signed

Apple codesign + notarytool + stapling for .dmg. Windows EV/OV signed .exe. Distribute to team via GitHub release artifacts. Verify Gatekeeper + SmartScreen pass. Gate 1 build milestone

#1 · 1d · T0.9, T0.10
T2.MK1 Mockup ChatView

Assemble complete chat view in React: MessageBubbles + ContextBar (top) + ChatInputBar (bottom) + AgentStatusBar. Validate T1.BB components integration

#1 · 1d · T1.BB
T2.MK2 Mockup OnboardingWizard

Assemble 5 navigable steps with OnboardingStep component from T1.BB. Validate step transitions and progress indicators

#1 · 0.5d · T1.BB
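The reconnection schedule in T2.18 (1s→2s→4s... capped at 30s) reduces to a one-line delay function, sketched here. The function name and the zero-based attempt counter are assumptions; jitter is left out because the task description does not mention it.

```typescript
// Delay before reconnect attempt `attempt` (0-based): 1s, 2s, 4s, ... capped at 30s.
function reconnectDelayMs(attempt: number, baseMs: number = 1000, maxMs: number = 30000): number {
  const exponent = Math.max(0, Math.floor(attempt));
  return Math.min(maxMs, baseMs * Math.pow(2, exponent));
}
```

For attempts 0 through 5 this yields 1000, 2000, 4000, 8000, 16000, and 30000 ms. The counter would reset to 0 after a successful connection.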
Pablo: #16 Eval Suite · #17 Beautonomous (4 tasks)
T2.24 Eval Phase 1: LLM Judge + EvalRunner

AnthropicLLMJudge (Haiku standard, Sonnet critical). YamlDatasetLoader. EvalRunner orchestrates the pipeline: dataset → coach → judge → report. CLI eval.ts + check-threshold.ts. Minimum 20 golden cases

#16 · 3d · T1.24
T2.25 E2E testing via Playground

Test complete flows with real Sellerfy data. Document QA findings → Linear issues via Beautonomous

#16 · 2d · T1.6, T2.1
T2.26 Bootstrap ~150 tasks in Linear

Bulk-create tasks via Beautonomous. 6 cycles, L/M/S labels, critical-path dependencies. Approval from the 4 engineers before S1

#17 · 4h · T0.7
T2.26a 5-step Beautonomous quality gate

Configure pipeline: structure → lint → tests → architecture review → convention check. Runs before PRs are approved via OpenClaw

#17 · 1d · T0.5
UX/UI + Pablo: #18 Design System (1 task)
T2.BB Remaining Molecules + Data & Flow Organisms

Molecules: Select, Dropdown, Toggle labeled, Tooltip rich, ProgressBar labeled, KbdShortcut combo. Publish [LIB] Core Components complete. Organisms: ConfirmDialog REVERSIBLE/IRREVERSIBLE, ToolAccordion, ProactiveCard, MarketplaceKPI, CreditEconomy, EnrollmentCard. Owner: UX/UI executes + Pablo approves end of S4

#18 · 6d · T1.BB

Sprints 5-6: WRITE Tools + Billing + Enrichment

Weeks 5-6 • 42 tasks
Gate: WRITE tool executes in marketplace. Billing charges credits. Enrichment returns competitor data.
ID · Task · Proj · Time · Depends
Mateo: #3 Tool Registry · #6 Proactive · #7 Guardrails · #11 Enrichment · #2 Orchestrator · #9 Cerebro KB · #18 Design System (15 tasks)
T3.1 · 10 real READ handlers

Connect the handlers to the Fast Data Layer (11 FastAPI endpoints), or directly to the Marketplace Provider if the FDL is unavailable. Each handler: Zod validation → HTTP call → response mapping → ToolResult.

Proj #3 · 3d · Depends: T2.5, T2.13
T3.2 · ConfirmationFlow

WRITE detected → pause execution → show the user a before/after diff → wait for Accept/Reject → execute or cancel. 35 min timeout. OrchestrationSession stored in DynamoDB (TTL 35 min).

Proj #2 · 2d · Depends: T1.6, T2.3
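
The T3.2 flow can be read as a small state transition over a pending WRITE. The sketch below is only an illustration under assumptions: the type names, the `changedFields` helper and the in-memory `pending` example are hypothetical, and the DynamoDB persistence is left out.

```typescript
// Illustrative sketch of T3.2; every name below is an assumption, not the project's API.
type Decision = "accept" | "reject";
type Outcome = "executed" | "cancelled" | "expired";

interface PendingWrite {
  toolName: string;
  before: Record<string, unknown>; // current marketplace state
  after: Record<string, unknown>; // proposed state
  createdAtMs: number;
}

const CONFIRMATION_TIMEOUT_MS = 35 * 60 * 1000; // 35 min, same as the session TTL

// Fields whose value changes; this is what a before/after diff would display.
function changedFields(p: PendingWrite): string[] {
  return Object.keys(p.after).filter((k) => p.before[k] !== p.after[k]);
}

// Pure transition: the caller performs the real marketplace write on "executed".
function resolveConfirmation(p: PendingWrite, decision: Decision, nowMs: number): Outcome {
  if (nowMs - p.createdAtMs > CONFIRMATION_TIMEOUT_MS) return "expired";
  return decision === "accept" ? "executed" : "cancelled";
}

const pending: PendingWrite = {
  toolName: "update_price",
  before: { price: 100, title: "Mug" },
  after: { price: 120, title: "Mug" },
  createdAtMs: 0,
};
```

Keeping the transition pure (time is passed in as `nowMs`) makes the timeout easy to test without waiting 35 minutes.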
T3.3 · 4 WRITE tool handlers

update_product_content (reversible), update_price (irreversible), pause_product (reversible), activate_product (reversible). Flow: pre-write snapshot → confirmation → execute via IMarketplaceAdapter → verify → log.

Proj #3 · 3d · Depends: T3.2
T3.4 · ProactiveSuggestionService

afterTool hook. The LLM evaluates the tool result and returns { hasSuggestion, message, suggestionType, priority, productId }. Max 2 per turn. 7-day dedup via UserProfile.recentSuggestions. No hardcoded rules.

Proj #6 · 2d · Depends: T2.4
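
The two limits in T3.4 (max 2 per turn, 7-day dedup) can be applied as a filter over whatever the LLM returns. This is a minimal sketch under assumptions: `priority` is treated as a number where higher means more important, the dedup key is `suggestionType` plus `productId`, and `RecentSuggestion` is a hypothetical shape for the entries in UserProfile.recentSuggestions.

```typescript
// Sketch only: field types and the dedup key are assumptions, not taken from the spec.
interface Suggestion {
  hasSuggestion: boolean;
  message: string;
  suggestionType: string;
  priority: number; // assumption: higher = more important
  productId: string;
}

interface RecentSuggestion {
  suggestionType: string;
  productId: string;
  shownAtMs: number;
}

const MAX_PER_TURN = 2;
const DEDUP_WINDOW_MS = 7 * 24 * 60 * 60 * 1000;

function dedupKey(s: { suggestionType: string; productId: string }): string {
  return s.suggestionType + ":" + s.productId;
}

function selectSuggestions(
  candidates: Suggestion[],
  recent: RecentSuggestion[],
  nowMs: number
): Suggestion[] {
  // Keys shown within the last 7 days are blocked.
  const seen = new Set<string>(
    recent.filter((r) => nowMs - r.shownAtMs < DEDUP_WINDOW_MS).map(dedupKey)
  );
  const selected: Suggestion[] = [];
  const byPriority = candidates.slice().sort((a, b) => b.priority - a.priority);
  for (const c of byPriority) {
    if (!c.hasSuggestion || seen.has(dedupKey(c))) continue;
    seen.add(dedupKey(c)); // also dedups within the same turn
    selected.push(c);
    if (selected.length === MAX_PER_TURN) break;
  }
  return selected;
}
```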
T3.5 · IGuardService + InputGuard

Prompt-injection detection (pattern matching) + out-of-scope filtering. Graceful degradation: if the guard fails, let the input through and log a warning.

Proj #7 · 1d · Depends: T1.6
T3.5a · HttpCreditGate in conversation-api

HTTP client that calls POST /internal/gate on #13 Billing before executing each tool. Credit matrix: READ=1, ANALYSIS=2, WRITE=3. Fail-open if billing does not respond.

Proj #2 · 1d · Depends: T3.24
T3.6 · Enrichment scaffold + interfaces

IEnrichmentService, IMarketIntelligenceAdapter, IContentAnalysisAdapter, IEnrichmentCache in the domain layer. Models: MarketProduct, ImageAnalysisResult, EnrichmentResult. EnrichmentContainer for DI.

Proj #11 · 1d · Depends: T0.8
T3.7 · MeliMarketIntelligenceAdapter

MeLi Search API + Items API (free, no credentials required). Tools: search_market_products, get_competitor_product, get_market_pricing. PriceDistributionCalculator (pure domain logic).

Proj #11 · 2d · Depends: T3.6
T3.8 · VisionLLMContentAdapter

Claude Vision for analyze_product_image + analyze_product_video. enhance_product_image throws NotImplementedError in the MVP.

Proj #11 · 1d · Depends: T3.6
T3.9 · RedisEnrichmentCache + EnrichmentService

Cache with a per-tool TTL (15 min to 24 h). Router: marketplace → matching adapter. Provider failure → EnrichmentResult carrying an error, never an exception.

Proj #11 · 1d · Depends: T3.7, T3.8
T3.10 · Enrichment CDK Stack

Lambda + API Gateway + ElastiCache Redis + VPC

Proj #11 · 1d · Depends: T3.9
T3.11 · 8 ANALYSIS tool handlers

Connect to IEnrichmentService. 5 operational (search_market, competitor, pricing, image, video) + get_keyword_data + get_product_fee_estimate + enhance_image (NotImplemented).

Proj #3 · 2d · Depends: T3.9
T3.12 · HallucinationChecker

Verify numeric claims (fees, metrics) against tool results after generation. Log but do not block (Phase 1).

Proj #2 · 1d · Depends: T3.1
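
One possible shape for the T3.12 check is to pull every number out of the generated text and look for a matching number anywhere in the tool results. The sketch below assumes plain decimal numbers (no thousands separators or locale formats) and a small relative tolerance; `checkNumericClaims`, `collectNumbers` and the tolerance value are hypothetical names and choices.

```typescript
// Sketch only: a naive numeric-claim check; names and tolerance are assumptions.
interface ClaimCheck {
  value: number;
  supported: boolean;
}

// Recursively gather every number found in an arbitrary tool-result structure.
function collectNumbers(data: unknown, out: number[] = []): number[] {
  if (typeof data === "number") {
    out.push(data);
  } else if (Array.isArray(data)) {
    for (const item of data) collectNumbers(item, out);
  } else if (typeof data === "object" && data !== null) {
    const record = data as Record<string, unknown>;
    for (const key of Object.keys(record)) collectNumbers(record[key], out);
  }
  return out;
}

function checkNumericClaims(
  response: string,
  toolResults: unknown[],
  tolerance: number = 0.005
): ClaimCheck[] {
  const known = collectNumbers(toolResults);
  const matches = response.match(/\d+(?:\.\d+)?/g) || [];
  return matches.map((m) => {
    const value = Number(m);
    const supported = known.some(
      (k) => Math.abs(k - value) <= tolerance * Math.max(1, Math.abs(k))
    );
    return { value, supported };
  });
}
```

In Phase 1 the caller would only log the entries with `supported === false` and still return the response to the user.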
T3.25 · KB indexing in BigQuery

Index 15-20 docs in BigQuery via the Go pipeline. Verify top-5 semantic search for 5 test queries.

Proj #9 · 1d · Depends: T1.22, T1.23
T3.32 · Token pipeline + Style Dictionary

Extract design-tokens.json via Figma MCP. Configure Style Dictionary → CSS :root + tailwind.config.ts in core-product-desktop-client. Validate: naming, modes, zero hardcoded values.

Proj #18 · 2d · Depends: T0.BB
Andrés: #10 Data Sync · #12 Marketplace Provider · #14 DevOps (6 tasks)
T3.13 · Fast Data Layer: 11 endpoints

FastAPI endpoints mapped 1:1 to the Tool Registry. GET /data/{user_id}/fast/{tool}. Reads GCS Parquet directly via pyarrow (<500ms). No Redis.

Proj #10 · 3d · Depends: T2.13
T3.14 · GCS snapshots for ConfirmationFlow

Router: /data/{user_id}/snapshot/{tool}/{ts}. Pre-write state stored in GCS for rollback. snapshot_cleanup_dag in Airflow.

Proj #10 · 1d · Depends: T3.13
T3.15 · Amazon DAG

IExtractor and ILoader for Amazon. AmazonAuthManager + AmazonExtractor + AmazonLoader. Verify the MeLi + Shopify Bronze schemas.

Proj #10 · 3d · Depends: T2.14, T2.11
T3.16 · IRateLimiter per marketplace

3 implementations: MeLi token bucket (1500/min), Amazon burst/restore, Shopify leaky bucket (cost points). Redis-backed counter. Returns 429 with retry-after.

Proj #12 · 1d · Depends: T1.12
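
For the MeLi case in T3.16, a token bucket with capacity 1500 that refills at 1500 tokens per minute could look like the sketch below. It is an in-memory illustration only: the spec calls for a Redis-backed counter, and `TokenBucket`, `RateDecision` and the `retryAfterMs` field are hypothetical names.

```typescript
// Sketch only: in-memory token bucket; the planned implementation keeps its counter in Redis.
interface RateDecision {
  allowed: boolean;
  retryAfterMs: number; // 0 when allowed; otherwise a hint for the retry-after header
}

class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerMs: number;

  constructor(capacity: number, refillPerMs: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerMs = refillPerMs;
    this.tokens = capacity; // start full
    this.lastRefillMs = nowMs;
  }

  tryAcquire(nowMs: number): RateDecision {
    // Refill proportionally to elapsed time, never above capacity.
    const elapsed = Math.max(0, nowMs - this.lastRefillMs);
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return { allowed: true, retryAfterMs: 0 };
    }
    // Time until one full token is available again.
    const waitMs = Math.ceil((1 - this.tokens) / this.refillPerMs);
    return { allowed: false, retryAfterMs: waitMs };
  }
}

// MeLi limit from T3.16: 1500 requests per minute.
const MELI_CAPACITY = 1500;
const MELI_REFILL_PER_MS = 1500 / 60000;
```

A denied decision maps naturally to an HTTP 429 whose retry-after value is derived from `retryAfterMs`.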
T3.17 · Onboarding trigger

First sync after onboarding: when a user connects a marketplace, trigger the initial sync DAG.

Proj #12 · 1d · Depends: T1.12, T2.14
T3.18 · Multi-repo CI/CD complete

11 repos with GitHub Actions. Automatic staging deploy on merge to main. Secrets stored in GitHub Org Secrets.

Proj #14 · 2d · Depends: T2.16
Sergio: #1 Native Shell · #13 Billing (12 tasks)
T3.19 · BillingView

Current plan, remaining credits, usage stats. Buttons → Stripe Checkout in system browser (not in-app). Low credit alerts. +0.5d integration of T2.BB components (CreditEconomy, MarketplaceKPI, CreditDisplay).

Proj #1 · 2.5d · Depends: T2.20, T2.BB
T3.20 · WRITE confirmation dialogs

Diff-style display (red/green). “I will change title from X to Y” → Accept/Reject. Timeout 35min with 5min reminder. Integrates with ConfirmationFlow T3.2. Uses ConfirmDialog REVERSIBLE + IRREVERSIBLE from T2.BB.

Proj #1 · 1d · Depends: T2.17, T3.2, T2.BB
T3.21 · Suggestion cards + tool progress

Clickable cards (“Review competitor prices”). Click opens pre-contextualized conversation. Tool progress: spinner with tool name. +0.5d ProactiveCard integration from T2.BB.

Proj #1 · 1.5d · Depends: T2.17, T2.18, T2.BB
T3.22 · ProfileView

Connected marketplaces, usage stats, preferences. Hook useProfile. Settings: language, notifications, default marketplace. Uses EnrollmentCard + Toggle labeled from T2.BB.

Proj #1 · 1d · Depends: T2.20, T2.BB
T3.23 · Stripe Checkout + Customer Portal

Pro Checkout ($49/month). Self-service Customer Portal (cancel, update payment method). Webhook: checkout.session.completed → grant 500 Pro credits.

Proj #13 · 3d · Depends: T3.19
T3.24 · ICreditsGate + credits backend

POST /internal/gate. READ=1cr, ANALYSIS=2cr, WRITE=3cr. DynamoDB conditional write (prevents races). Free: 50cr/month; Pro: 500cr/month. Credit Packs ($5/100, $20/500, $35/1000). Fail-open if billing does not respond.

Proj #13 · 2d · Depends: T3.23
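
The credit matrix and the fail-open rule from T3.24 (consumed by T3.5a) fit in a few lines. The sketch below is an assumption-heavy illustration: `gateTool`, `GateResult` and `makeInMemoryCharge` are hypothetical, and the `charge` callback stands in for the real POST /internal/gate call and its DynamoDB conditional write.

```typescript
// Sketch only: the charge callback is a stand-in for the billing service call.
type ToolClass = "READ" | "ANALYSIS" | "WRITE";

const CREDIT_COST: Record<ToolClass, number> = { READ: 1, ANALYSIS: 2, WRITE: 3 };

interface GateResult {
  allowed: boolean;
  remaining: number | null; // null when unknown or insufficient
  failOpen: boolean; // true when billing was unreachable and the tool ran anyway
}

// `charge` returns the new balance, null if credits are insufficient,
// and throws if billing does not respond.
function gateTool(toolClass: ToolClass, charge: (cost: number) => number | null): GateResult {
  try {
    const remaining = charge(CREDIT_COST[toolClass]);
    return { allowed: remaining !== null, remaining, failOpen: false };
  } catch {
    return { allowed: true, remaining: null, failOpen: true };
  }
}

// In-memory stand-in that mimics a conditional write: the balance never goes below zero.
function makeInMemoryCharge(initial: number): (cost: number) => number | null {
  let balance = initial;
  return (cost: number) => {
    if (balance < cost) return null;
    balance -= cost;
    return balance;
  };
}
```

Fail-open trades a small amount of unbilled usage for availability: a billing outage never blocks the conversation.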
T3.24a · Billing schema migration

ALTER TABLE clients (add Stripe fields). New tables: credit_packs, subscription_events, credit_transactions. Idempotent migration script.

Proj #13 · 1d · Depends: T3.23
T3.24b · SubscriptionLifecycleService

activate (post-checkout), cancel (7-day grace period), upgrade, downgrade. Each event is written to the subscription_events table. Webhook invoice.payment_failed → notify.

Proj #13 · 1d · Depends: T3.23
T3.24c · Monthly credit reset cron

EventBridge + Lambda on the 1st of each month. Resets plan credits (not pack credits). Pack credits expire after 12 months. Logged in credit_transactions.

Proj #13 · 4h · Depends: T3.24
T3.MK1 · Mockup BillingView

Current plan + CreditEconomy + ProgressBar labeled + CreditDisplay + Stripe buttons. Validate T2.BB component integration.

Proj #1 · 0.5d · Depends: T2.BB
T3.MK2 · Mockup ProfileView

EnrollmentCard list + Toggle labeled + InputFields assembled. Validate T2.BB component integration.

Proj #1 · 0.5d · Depends: T2.BB
T3.MK3 · Mockup ConfirmDialog in chat context

ChatView + ConfirmDialog overlay (slide up + backdrop) × REVERSIBLE and IRREVERSIBLE variants. End-to-end confirmation UX validation.

Proj #1 · 0.5d · Depends: T2.BB, T3.20
Pablo: #16 Eval Suite (8 tasks)
T3.26 · Eval Phase 2: CI integration

eval-on-pr.yml in GitHub Actions. Coach staging → LLM Judge → EvalReport. If !passed, the PR is blocked. Automatic comment on the PR. Baseline updated on merge to main. Target: <10 min for 20-30 cases.

Proj #16 · 2d · Depends: T2.24
T3.27 · Golden dataset: 50 cases

Expand to: 15 product, 10 pricing, 8 WRITE, 7 proactive, 10 edge cases (injection, off-scope, empty data, ambiguous intent).

Proj #16 · 3d · Depends: T2.24
T3.28 · QA of conversation flows (3 marketplaces)

Test all flows with Sellerfy data. Document issues in Linear via Beautonomous.

Proj #16 · 2d · Depends: T3.1, T3.3
T3.40 · Extend EvalConfig + CLI

Add desktop_build and figma_quality as pipelineType. Create models DesktopBuildReport and FigmaQualityReport. Extend CLI for --pipeline=desktop_build and --pipeline=figma_quality.

Proj #16 · 1d · Depends: T2.24
T3.41 · FigmaRESTClient

IFigmaAPIClient: getFile, getFileVariables, getFileComponents, getFileStyles against the Figma REST API. Auth via FIGMA_ACCESS_TOKEN.

Proj #16 · 1.5d · Depends: T3.40
T3.42 · FigmaQualityRunner + variable checks

Runner iterates files and executes configured checks. 4 variable checks: variable_architecture (3 collections), code_syntax (Code Syntax Web), semantic_aliasing (Semantic aliases Primitives), light_dark_modes (2 modes in Semantic).

Proj #16 · 2d · Depends: T3.41
T3.43 · Component checks

5 checks: auto_layout (all components use Auto Layout), naming_convention (slash naming, no generic names), states_coverage (min states per type), color_hardcoding (no direct hex), spacing_hardcoding (no direct numeric values).

Proj #16 · 2d · Depends: T3.42
T3.44 · Quality checks + report

3 checks: wcag_contrast (4.5:1 text, 3:1 UI), descriptions (published components have description), mcp_compatibility (semantic layer names). Generate report: compliance per file, violations by severity, correction suggestion per component.

Proj #16 · 1d · Depends: T3.42
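
The wcag_contrast check in T3.44 relies on the WCAG contrast ratio, (L1 + 0.05) / (L2 + 0.05), where L1 and L2 are the relative luminances of the lighter and darker colors. A minimal sketch follows, assuming 6-digit hex input; the function names are hypothetical and how colors are read from Figma is out of scope here.

```typescript
// Sketch only: WCAG contrast ratio for 6-digit hex colors such as "#1a2b3c".
function channelToLinear(c8: number): number {
  const c = c8 / 255;
  // sRGB channel linearization
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance(hex: string): number {
  const n = parseInt(hex.replace("#", ""), 16);
  const r = (n >> 16) & 0xff;
  const g = (n >> 8) & 0xff;
  const b = n & 0xff;
  return 0.2126 * channelToLinear(r) + 0.7152 * channelToLinear(g) + 0.0722 * channelToLinear(b);
}

function contrastRatio(fg: string, bg: string): number {
  const a = relativeLuminance(fg);
  const b = relativeLuminance(bg);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// Thresholds from T3.44: 4.5:1 for text, 3:1 for UI elements.
function passesWcag(fg: string, bg: string, kind: "text" | "ui"): boolean {
  return contrastRatio(fg, bg) >= (kind === "text" ? 4.5 : 3);
}
```

For example, `passesWcag("#777777", "#ffffff", "text")` is false (the ratio is about 4.48), while the same pair passes the 3:1 UI threshold.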
UX/UI + Pablo: #18 Design System (1 task)
T3.BB · Advanced Organisms + close Pattern Components

ReActStream (3 collapsible blocks: Thought/Action/Observation), DataTable (sortable, skeleton loading), AuditLog (dot-line + JSON accordion), RollbackPanel (TTLCountdown + revert button), FraudAlert, ErrorRecovery A (amber)/B (red)/C (blue). Publish [LIB] Pattern Components complete. Owner: UX/UI executes + Pablo approves end S6.

Proj #18 · 5d · Depends: T2.BB

Sprints 7-8: Hardening + Staging

Weeks 7-8 • 37 tasks
Gate: Staging full stack. Load test 50 users. WebSocket streaming. Proactive suggestions. Eval ≥0.70.
ID · Task · Proj · Time · Depends
Mateo: #2 Orchestrator · #4 Personality · #7 Guardrails · #3 Tool Registry · #9 Cerebro KB (8 tasks)
T4.1 · WebSocket streaming

Replaces REST. 8 server→client events: thinking, tool_start, tool_result, text_delta, suggestion, confirmation_required, error, done. 4 client→server events. Session restored on reconnection.

Proj #2 · 2d · Depends: T1.7, T3.3
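
The 8 server→client event names from T4.1 can be captured as a closed union so that unknown messages are rejected at the edge. In the sketch below only the event names come from the spec; the `{ type, payload }` JSON envelope, `parseServerEvent` and `applyTextDelta` are assumptions for illustration.

```typescript
// Sketch only: event names are from T4.1; the message envelope is an assumption.
const SERVER_EVENTS = [
  "thinking",
  "tool_start",
  "tool_result",
  "text_delta",
  "suggestion",
  "confirmation_required",
  "error",
  "done",
] as const;

type ServerEventType = typeof SERVER_EVENTS[number];

interface ServerEvent {
  type: ServerEventType;
  payload?: unknown;
}

// Returns null for malformed JSON or unknown event types instead of throwing.
function parseServerEvent(raw: string): ServerEvent | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const candidate = data as { type?: unknown; payload?: unknown };
  const type = candidate.type;
  if (typeof type !== "string") return null;
  if ((SERVER_EVENTS as readonly string[]).indexOf(type) === -1) return null;
  return { type: type as ServerEventType, payload: candidate.payload };
}

// Progressive rendering: append text_delta payloads, ignore everything else.
function applyTextDelta(rendered: string, ev: ServerEvent): string {
  return ev.type === "text_delta" && typeof ev.payload === "string"
    ? rendered + ev.payload
    : rendered;
}
```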
T4.2 · SystemPromptComposer L3

Execution block when writeCapable=true. Write guardrails injected conditionally. Hard cap of 1200 tokens total.

Proj #4 · 1d · Depends: T1.5, T3.3
T4.3 · OutputGuard

Post-LLM validation: data-leak prevention (verify the response contains no other user's data) and dangerous-content filtering. Critical alert if a leak is detected.

Proj #7 · 1d · Depends: T3.5
T4.4 · Remaining WRITE tools (if they fit)

Up to 13 additional WRITE tools, including update_product_images, update_product_video, update_stock, close_product, publish_product, answer_question, hide_question, send_buyer_message, request_review. Circuit breaker: whatever does not fit is cut to S11-12.

Proj #3 · 3d · Depends: T3.3
T4.5 · Performance optimization

Target: p95 <3s. Context-window compaction, prompt cache hits, tool parallelization where safe. Profile and fix bottlenecks.

Proj #2 · 2d · Depends: T4.1
T4.5a · FeedbackCapture in HookLifecycle

The after_tool hook in conversation-api writes a FeedbackEntry to the #15 DynamoDB table via HTTP POST /feedback/capture. Only for successful WRITE tools. Fire-and-forget.

Proj #2 · 1d · Depends: T2.4, T4.13
T4.5b · ActionLog entity + DynamoActionLogRepository

pk User#{userId}, sk Action#{ULID}. GSI1 Conv#{convId}. Records every executed WRITE. Integrated into the HookLifecycle after_tool hook.

Proj #2 · 1d · Depends: T2.4, T3.3
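
The key layout in T4.5b can be expressed as a small builder. The key formats come from the task description; the attribute name `gsi1pk`, the extra fields and the `buildActionLogItem` helper are assumptions, and the ULID is passed in rather than generated so the sketch stays dependency-free.

```typescript
// Sketch only: key formats are from T4.5b; attribute and field names are assumptions.
interface ActionLogItem {
  pk: string; // User#{userId}
  sk: string; // Action#{ULID}
  gsi1pk: string; // Conv#{convId}
  toolName: string;
  executedAt: string; // ISO timestamp
}

function buildActionLogItem(
  userId: string,
  convId: string,
  actionUlid: string,
  toolName: string,
  executedAt: string
): ActionLogItem {
  return {
    pk: "User#" + userId,
    sk: "Action#" + actionUlid,
    gsi1pk: "Conv#" + convId,
    toolName,
    executedAt,
  };
}
```

Because ULIDs sort lexicographically by creation time, a query on one `pk` returns that user's actions in chronological order, and the GSI1 key supports listing the actions of a single conversation.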
T4.16 · KB batch + v2

Batch embeddings with Vertex AI (250 per call). If the pipeline takes >5 min, enable incremental processing. Target: >80% retrieval hit rate on 20 eval queries.

Proj #9 · 2d · Depends: T2.22, T2.23
Andrés: #14 DevOps · #10 Data Sync (5 tasks)
T4.6 · Staging deploy: full stack on AWS

CDK deploy: Lambda, API Gateway v2, DynamoDB, ElastiCache Redis, RDS PostgreSQL, Secrets Manager. Terraform on GCP: Cloud Run, BigQuery, GCS, Airflow. Health check green. URL: api-staging.shopilot.ai

Proj #14 · 3d · Depends: T2.15, T3.18
T4.7 · Load testing with 50 users

Artillery/k6. Target: p95 <2s on API endpoints (excluding LLM latency). Identify bottlenecks: Redis, DynamoDB, API Gateway.

Proj #14 · 2d · Depends: T4.6
T4.8 · CloudWatch dashboard + alerts

API latency, error rate, LLM cost per conversation, tool executions, credits. PagerDuty alerts: p95 >2s, error rate >1%. Slack alerts: LLM cost >$50/day.

Proj #14 · 2d · Depends: T4.6
T4.9 · Data Sync Silver + Gold (if it fits)

INormalizer, SilverNormalizer per marketplace. transform_to_silver_dag. IAggregator, DailySummaryAggregator. compute_gold_dag. Brand Health spike + IBrandHealthCalculator. Circuit breaker applies if it does not fit.

Proj #10 · 3d · Depends: T3.13
T4.9a · API Gateway v2 WebSocket in CDK

Routes $connect/$disconnect/$default, DynamoDB connection-ids table, Lambda authorizer, IAM policies. Prerequisite for streaming in production.

Proj #14 · 1d · Depends: T4.6
Sergio: #1 Native Shell · #15 Feedback Loop · #13 Billing (13 tasks)
T4.10 · Progressive WebSocket client

Sidebar connects to the conversation-api WebSocket. Handles the 8 server→client events: thinking, text_delta (progressive render), tool_start (spinner), tool_result, suggestion (card), confirmation_required, error, done. Reconnection backoff. +0.5d integration of T3.BB components (ReActStream + RollbackPanel).

Proj #1 · 2.5d · Depends: T2.18, T4.1, T3.BB
T4.11 · EnrollmentView standalone

Dedicated component for each marketplace's OAuth redirect flow. Standalone BrowserWindow (not a popup). Reuses OAuth tokens from the T2.21 OnboardingWizard.

Proj #1 · 1d · Depends: T2.21
T4.12 · Sentry crash reporting

main + renderer processes. Source maps uploaded at build. Error grouping. Feedback dialog on crash.

Proj #1 · 4h · Depends: T1.16
T4.13 · Feedback Loop scaffold

package.json, tsconfig, domain interfaces (IFeedbackRepository, IFeedbackGate, IDataSyncClient). Models: FeedbackEntry, ExplicitFeedbackEntry, ImplicitFeedbackEntry.

Proj #15 · 1d · Depends: T0.8
T4.14 · calculateImpactScore + DynamoFeedbackRepository

Pure logic: sales×0.4 + conversion×0.3 + visits×0.2 + position×(−0.1). DynamoDB repo: save, findPendingEntries (GSI1 status=pending), update, findByUser.

Proj #15 · 2d · Depends: T4.13
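
The weighted sum in T4.14 translates directly into a pure function. The weights are the ones stated in the task; the `MetricDeltas` shape and the idea that each input is a normalized change measured after the WRITE are assumptions, since the spec does not say how the inputs are scaled.

```typescript
// Sketch only: weights are from T4.14; input scaling is an assumption.
interface MetricDeltas {
  sales: number;
  conversion: number;
  visits: number;
  position: number; // assumption: positive = listing moved down in search results
}

const IMPACT_WEIGHTS = { sales: 0.4, conversion: 0.3, visits: 0.2, position: -0.1 };

function calculateImpactScore(d: MetricDeltas): number {
  return (
    d.sales * IMPACT_WEIGHTS.sales +
    d.conversion * IMPACT_WEIGHTS.conversion +
    d.visits * IMPACT_WEIGHTS.visits +
    d.position * IMPACT_WEIGHTS.position
  );
}
```

The negative weight on position means a listing that drops in ranking lowers the score, while one that climbs (negative delta under the assumption above) raises it.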
T4.15 · FeedbackMeasurerService + Lambdas

processPendingEntries (entries older than 7 days). DataSyncClient over HTTP. Retry 3x; mark as unmeasurable if it still fails. EventBridge rate(6h). FeedbackAPIHandler: GET /feedback/:userId/summary + /history. CDK stack.

Proj #15 · 2d · Depends: T4.14
T4.15a · FeedbackGate anti-fatigue

should-prompt rules: max 1 explicit feedback prompt per day, skip if <3 interactions in the session, 24h cooldown after feedback. Endpoint: GET /feedback/:userId/should-prompt.

Proj #15 · 4h · Depends: T4.13
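
The three should-prompt rules in T4.15a compose into a single predicate. The sketch below assumes that "per day" means a rolling 24-hour window and that the state is available as timestamps in milliseconds; `FeedbackState` and `shouldPrompt` are hypothetical names.

```typescript
// Sketch only: "1 prompt per day" is interpreted as a rolling 24h window (assumption).
interface FeedbackState {
  lastPromptAtMs: number | null; // last time an explicit feedback prompt was shown
  lastFeedbackAtMs: number | null; // last time the user actually gave feedback
  sessionInteractions: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;
const MIN_SESSION_INTERACTIONS = 3;

function shouldPrompt(state: FeedbackState, nowMs: number): boolean {
  // Rule 1: skip short sessions.
  if (state.sessionInteractions < MIN_SESSION_INTERACTIONS) return false;
  // Rule 2: at most one prompt per day.
  if (state.lastPromptAtMs !== null && nowMs - state.lastPromptAtMs < DAY_MS) return false;
  // Rule 3: 24h cooldown after the user gave feedback.
  if (state.lastFeedbackAtMs !== null && nowMs - state.lastFeedbackAtMs < DAY_MS) return false;
  return true;
}
```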
T4.15b · Explicit feedback endpoint

POST /feedback/:userId/explicit. Payload: rating (1-5), comment?, conversationId, toolName?. Persists an ExplicitFeedbackEntry.

Proj #15 · 4h · Depends: T4.14
T4.15c · Implicit feedback endpoint

POST /feedback/:userId/implicit. Payload: action (accepted/rejected/edited), conversationId, toolName, originalValue?, editedValue?. Persists an ImplicitFeedbackEntry.

Proj #15 · 4h · Depends: T4.14
T4.15d · 7-day billing grace period

Keep Pro access for 7 days after cancellation. Webhook customer.subscription.deleted → set grace_period_end. A cron job checks for expired grace periods.

Proj #13 · 4h · Depends: T3.24b
T4.MK1 · Mockup EnrollmentView standalone

Complete marketplace list in dedicated BrowserWindow × all states (Connected/Syncing/Error/Disconnected). Uses EnrollmentCard + ErrorRecovery for OAuth error states.

Proj #1 · 0.5d · Depends: T3.BB
T4.MK2 · Mockup complete WRITE flow

ChatView with end-to-end mock flow: AgentStatusBar in ToolUse → ToolAccordion expanded → ReActStream (3 phases) → ConfirmDialog → RollbackPanel post-execution. Full WRITE UX validation.

Proj #1 · 1d · Depends: T3.BB
T4.24 · Gate 2 signed build: full .dmg + .exe, all S8 features

Full .dmg notarized + .exe signed with ALL S7-8 features integrated. Team smoke test on macOS + Windows. Gate 2 build milestone: candidate for beta distribution.

Proj #1 · 0.5d · Depends: T2.40
Pablo: #16 Eval Suite · #17 Beautonomous (10 tasks)
T4.17 · Automated eval in CI

A GitHub Action runs the 50 golden cases on every push to main. CI fails if the score is <0.70 or a critical case fails. Results are posted to #engineering on Slack via Beautonomous.

Proj #16 · 2d · Depends: T3.26, T3.27
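
The pass/fail rule in T4.17 (score below 0.70, or any critical case failing) can be sketched as a pure check over per-case results. The aggregate score is assumed here to be the mean of the case scores, and `CaseResult`, `ThresholdReport` and `checkThreshold` are hypothetical names; the real check-threshold.ts may differ.

```typescript
// Sketch only: assumes the suite score is the mean of per-case scores.
interface CaseResult {
  id: string;
  score: number; // 0..1 as returned by the judge
  critical: boolean;
  passed: boolean;
}

interface ThresholdReport {
  passed: boolean;
  meanScore: number;
  failedCritical: string[];
}

function checkThreshold(results: CaseResult[], minScore: number = 0.7): ThresholdReport {
  const total = results.reduce((sum, r) => sum + r.score, 0);
  const meanScore = results.length === 0 ? 0 : total / results.length;
  const failedCritical = results.filter((r) => r.critical && !r.passed).map((r) => r.id);
  return {
    // An empty run never passes: there is nothing to vouch for the build.
    passed: results.length > 0 && meanScore >= minScore && failedCritical.length === 0,
    meanScore,
    failedCritical,
  };
}
```

A CI step would exit with a non-zero code whenever `passed` is false and include `failedCritical` in the Slack message.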
T4.18 · Proactive-suggestion testing with real data

Test ProactiveSuggestionService with Sellerfy data. Verify: correct triggers, message quality, dedup, max 2 per turn. Iterate on the prompt.

Proj #16 · 2d · Depends: T3.4
T4.19 · Beta user selection + onboarding prep

10-15 Sellerfy sellers (mix of small/medium/large). 2-min video walkthrough, setup doc, feedback form. 30-min 1-on-1 calls.

Proj #17 · 2d · Depends: T2.21
T4.19a · Eval contract testing pipeline

Consumer-driven contracts between repos: verify that Tool Registry → Data Sync, Tool Registry → Marketplace Provider, and Tool Registry → Enrichment honor their contracts.

Proj #16 · 2d · Depends: T3.26
T4.19b · KB quality eval pipeline

Automated retrieval metrics: precision@5, recall, hit rate. 20 eval queries with expected chunks. CI fails if hit rate <80%.

Proj #16 · 1d · Depends: T4.16
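
Two of the T4.19b metrics are simple to state precisely: precision@k is the share of the top-k retrieved chunks that are expected, and hit rate is the share of queries with at least one expected chunk in the top k. The sketch below uses those standard definitions; `RetrievalCase` and the chunk-id representation are assumptions.

```typescript
// Sketch only: standard definitions of precision@k and hit rate over chunk ids.
interface RetrievalCase {
  query: string;
  retrieved: string[]; // chunk ids, best match first
  expected: string[]; // chunk ids considered relevant for the query
}

function precisionAtK(c: RetrievalCase, k: number = 5): number {
  const top = c.retrieved.slice(0, k);
  const hits = top.filter((id) => c.expected.indexOf(id) !== -1).length;
  return hits / k;
}

function hitRate(cases: RetrievalCase[], k: number = 5): number {
  if (cases.length === 0) return 0;
  const hits = cases.filter((c) =>
    c.retrieved.slice(0, k).some((id) => c.expected.indexOf(id) !== -1)
  ).length;
  return hits / cases.length;
}

// CI rule from T4.19b: fail when hit rate is below 80%.
function kbQualityPasses(cases: RetrievalCase[]): boolean {
  return hitRate(cases) >= 0.8;
}
```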
T4.25 · Code signing secrets

Configure in GitHub: macOS certificates (Developer ID + Apple notarization) and Windows (Authenticode). Verify electron-builder recognizes them.

Proj #16 · 1d
T4.26 · DesktopBuildRunner + core checks

6 checks: compilation (build completes + artifact exists), code signing (codesign/signtool verify), notarization (spctl, macOS only), app startup (headless <5s), bundle size (<250MB delta vs baseline), native modules (require without error).

Proj #16 · 3d · Depends: T3.40
T4.27 · Secondary checks

Auto-updater (feed URL resolves), deep links (shopilot:// in Info.plist/Windows registry), window rendering (console.error), IPC channels (ping/pong). Warnings, not blockers.

Proj #16 · 1d · Depends: T4.26
T4.28 · GitHub Actions: desktop-build-eval.yml

3 jobs: build-macos (macos-14 runner), build-windows (windows-latest), report (aggregate + PR comment). Trigger: PRs touching core-product-desktop-client/

Proj #16 · 1.5d · Depends: T4.26
T4.29 · GitHub Actions: figma-quality-eval.yml

Triggers: workflow_dispatch + weekly cron (Monday 8:00 UTC). Publishes report as GitHub issue or Slack #engineering message.

Proj #16 · 0.5d · Depends: T3.42-T3.44
UX/UI + Pablo: #18 Design System (1 task)
T4.BB · Figma quality audit + corrections

Review all frames from S0–S6 against figma-best-practices.md checklist. All [LIB] Core Components and [LIB] Pattern Components marked “Ready for development”. Zero generic layer names. All colors/spacings/radii using variables (DevMode verified). All interactive states present. Changelog updated with version + date. Annotations for hover states, transitions, responsive notes. Owner: UX/UI executes corrections + Pablo validates complete checklist.

Proj #18 · 3d · Depends: T3.BB

Sprints 9-10: Launch

Weeks 9-10 • 18 tasks
Gate: 10+ beta users active. Signed .dmg. Billing live. Eval ≥0.70. p95 <3s. 0 P0 bugs.
ID · Task · Proj · Time · Depends
Mateo: #7 Guardrails · #2 Orchestrator · #3 Tool Registry · #4 Personality (3 tasks)
T5.1 · LLMGuardChecker

Lightweight LLM classifier (Haiku) for inputs that pass pattern matching but could still be injection or off-scope. Fallback: if the checker fails, let the input through.

Proj #7 · 1d · Depends: T3.5, T4.3
T5.2 · Backend bug fixes

All P1/P2 bugs from beta. Edge cases: empty tool results, LLM refusing to use a tool, concurrent WRITEs, tokens expiring mid-conversation.

Proj #2, #3 · 4d · Depends: T4.1
T5.3 · System Prompt v3 final

Tuning based on beta feedback. Fix tone problems, incorrect tool-selection patterns, edge cases.

Proj #4 · 1d · Depends: T5.10
Andrés: #14 DevOps · #10 Data Sync (4 tasks)
T5.4 · Production deploy

CDK deploy (Lambda + API Gateway v2 prod). Terraform apply (Cloud Run Data API prod). SSL + domain api.shopilot.ai. Health checks.

Proj #14 · 3d · Depends: T4.6
T5.5 · Complete production IaC

CDK: DynamoDB point-in-time recovery (35d), Secrets Manager, IAM roles, Lambda concurrency. Terraform: GCS lifecycle policies. Redis backup. Daily PostgreSQL backups.

Proj #14 · 2d · Depends: T5.4
T5.6 · Rollback testing

Revert Lambda version (<1 min), Cloud Run revision rollback (<1 min). Document the runbook.

Proj #14 · 1d · Depends: T5.4
T5.6a · Data Sync Phase 4: OpenMetadata + Embeddings

Amazon + Fast Data FQNs in OpenMetadata. embed_fast_dag (Bronze → Cerebro KB). embed_health_dag (Gold → KB). Lineage visible.

Proj #10 · 2d · Depends: T4.9
Sergio: #1 Native Shell · #13 Billing (5 tasks)
T5.7 · Code signing + .dmg + auto-updater

Apple Developer certificate. electron-builder DMG for macOS. Notarization via notarytool. Ticket stapling. Auto-updater → releases.shopilot.ai (S3). Test on a clean Mac without dev tools.

Proj #1 · 2d · Depends: T4.12
T5.8 · Electron security hardening

CSP headers, sandbox enabled, nodeIntegration=false, webSecurity=true. Basic anonymous telemetry (opt-out).

Proj #1 · 1d · Depends: T5.7
T5.9 · Beta UI/UX bug fixes

All UI/UX bugs from beta feedback. RAM profiling (<500MB target). Polish: animations, transitions, loading states. +0.5d post-audit T4.BB alignment to fix visual inconsistencies accumulated S1-8.

Proj #1 · 3.5d · Depends: T4.19, T4.BB
T5.10 · Billing Stripe live

Switch from test to live mode. Verify: checkout, webhooks, credits, packs. SSL on billing endpoints.

Proj #13 · 1d · Depends: T3.23, T5.4
T5.MK1 · Mockup Dashboard view

Only view not previously built: grid of MarketplaceKPIs + FraudAlert + AuditLog of recent actions + quick access to chat. Uses all audited T4.BB components (DataTable, MarketplaceKPI, AuditLog).

Proj #1 · 1d · Depends: T4.BB
Pablo: #17 Beautonomous · #16 Eval Suite (6 tasks)
T5.11 · Beta onboarding: 10-15 sellers

Download .dmg → connect marketplace → first query → first action. 30-min 1-on-1 call. Monitor activation (1+ tool used in the first session).

Proj #17 · 3d · Depends: T5.7, T5.4
T5.12 · Feedback calls + iteration

15 min with each beta user. Document what worked and what did not. Top 5 issues → Linear via Beautonomous.

Proj #17 · 2d · Depends: T5.11
T5.13 · OWASP Top 10 security review

Injection, broken auth, data exposure, XSS, SSRF, etc. Document findings + fix P1s.

Proj #16 · 1d · Depends: T5.4, T5.7
T5.14 · System Prompt v2 Beautonomous

Iteration based on 10 weeks of real use. Update governance. Index technical docs in the OpenClaw KB.

Proj #17 · 1d
T5.15 · Go/No-Go

Final 60-min sync with the 4 engineers. Checklist: tools responding, Stripe live, 10+ beta users, signed .dmg, OWASP P1s, p95 <3s, cost guard, eval ≥0.70. Pablo signs off on Go.

Proj #17 · 4h · Depends: T5.1–T5.14
T5.15a · E2E eval pipeline

Full end-to-end flow (query → tool selection → execution → response). 10+ scenarios. Unlike the LLM Judge, which evaluates the response, this evaluates the whole flow.

Proj #16 · 2d · Depends: T4.17
#18 Design System: No BB task in S9-10; the Figma pipeline is closed. UX/UI is available only for point queries from Sergio on visual edge cases.

S11-12: Buffer (Weeks 11-12)

Circuit breaker absorbs deferred scope + beta bugs + hardening

Shape Up circuit breaker: tasks not completed at the S10 deadline are cut here rather than delaying launch. S11 = hardening + P0/P1 bugs. S12 = deferred scope that cleared the circuit breaker.

Engineer · S11 (Hardening) · S12 (Deferred scope)
Mateo
S11: P1/P2 intelligence bug fixes. p95 optimization. WRITE tools cut in S7-8.
S12: Advertising tools Phase 5: 4 WRITE (create/update/pause/activate_campaign). Enrichment Rainforest API adapter (Amazon market intelligence). ProactiveSuggestions v2. LLMGuardChecker Phase 2. KB v3: docs for real beta questions that KB v2 did not cover.
Andrés
S11: Production hardening: alerts, runbooks, rollback drills. Adapter bug fixes.
S12: Silver→Gold DAG (if cut). Rate limiters with real data. Expanded monitoring.
Sergio
S11: Beta UI/UX bug fixes. RAM profiling. .dmg hotfix if needed.
S12: Auto-updater S3. Windows build (if time allows). FeedbackThrottle anti-fatigue refinement. Improved feedback UI.
Pablo
S11: Eval iteration on real conversations. Golden dataset expansion with edge cases.
S12: Eval score target 0.80. Technical documentation + postmortem.

Visual Gantt: 12 Weeks (10+2 buffer) × 4 Engineers + UX/UI Team × 19 Projects

W0: Pre-Sprint · W1–W2: Walking Skeleton · W3–W4 ★G1: Core Engines · W5–W6: WRITE + Billing · W7–W8 ★G2: Hardening · W9–W10 ★G3: Launch
Mateo (CTO)
W0: Spec review · env setup
W1–W2: T1.1–T1.25 (12)
DynamoDB fix · UserProfile · History 200K
ILLMClient · AgentLoop MAX=10 · REST emitter
KB fix dupes · KB contextual · KB 15-20 docs · 10 READ specs
#2 Orchestrator · #4 Personality · #8 Obs · #9 KB
W3–W4: T2.1–T2.23 (15)
ToolRegistry · HookLifecycle · 10 READ stubs
17 WRITE stubs · IContextAssembler · Prompt cache
KB incremental · KB batch embeddings
#3 Tool Registry · #5 Context · #2 Orch · #9 KB
W5–W6: T3.1–T3.32 (15)
10 READ real · ConfirmationFlow · 4 WRITE handlers
ProactiveSuggest · InputGuard · Enrichment scaffold+CDK
KB indexing BigQuery · DS token pipeline+Style Dict
#3 Tools · #6 Proactive · #7 Guard · #11 Enrichment · #9 KB · #18 DS
W7–W8: T4.1–T4.16 (8)
WebSocket 8 events · SysPrompt L3 · OutputGuard
WRITE tools (circuit breaker) · perf p95 <3s
KB batch v2 >80% hit rate
#2 Orch · #4 Personality · #7 Guard · #3 Tools · #9 KB
W9–W10: T5.1–T5.3 (3)
LLMGuardChecker · Bug fixes P1/P2 backend
System Prompt v3 final
#7 Guard · #2, #3 · #4 Personality
Andrés (Data+BE)
W0: Spec review · env setup
W1–W2: T1.9–T1.33 (13)
Scaffold MP · IMarketplaceAdapter · AES256GCM
MeLiOAuth+Adapter · AmazonLWA stub · Ext deps E1-E5
WRITE API docs · User mgmt provider research · CI electron-builder
#12 Marketplace Provider · #14 DevOps
W3–W4: T2.10–T2.16c (10)
Shopify adapter · Amazon full · TokenRefreshCron
DataSync clean arch · CDK base AWS · GitHub Actions CI
#12 MP · #10 Data Sync · #14 DevOps
W5–W6: T3.13–T3.18 (6)
Fast Data Layer 11 endpoints GCS Parquet <500ms
GCS snapshots · DAG Amazon · IRateLimiter
#10 Data Sync · #12 MP · #14 DevOps
W7–W8: T4.6–T4.9a (5)
Staging full stack AWS+GCP · Load test 50 users
CloudWatch alerts · Data Sync Silver+Gold · API GW WS
#14 DevOps · #10 Data Sync
W9–W10: T5.4–T5.6a (4)
Production deploy Lambda+Cloud Run · IaC PITR
Rollback testing <1min · OpenMetadata+Embeddings
#14 DevOps · #10 Data Sync
Sergio (Full-stack)
W0: Spec review · env setup
W1–W2: T1.16–T1.MK1 (7)
Week 1 (no Figma): Electron scaffold · MainWindow+WCV
MarketplaceDetector · Auth · canary build
Week 2 (with T0.BB): Tabs+Sidebar 2.5d · Mockup shell 0.5d
#1 Native Shell (consumes #18 T0.BB)
W3–W4: T2.17–T2.MK2 (8)
Chat UI 2.5d · CoachWebSocket · URL injection
react-router · OnboardingWizard 2.5d · Gate 1 build
Mockup ChatView 1d · Mockup Onboarding 0.5d
#1 Native Shell (consumes #18 T1.BB)
W5–W6: T3.19–T3.MK3 (12)
BillingView 2.5d · WRITE dialogs · Suggestion cards 1.5d
Stripe Checkout · ICreditsGate · SubscriptionLifecycle
Mockups: BillingView · ProfileView · ConfirmDialog
#1 Shell · #13 Billing (consumes #18 T2.BB)
W7–W8: T4.10–T4.MK2 (12)
WebSocket client 2.5d · EnrollmentView · Sentry
Feedback Loop scaffold+service+endpoints · Gate 2 build
Mockup EnrollmentView 0.5d · Mockup WRITE flow 1d
#1 Shell · #15 Feedback · #13 Billing (consumes #18 T3.BB)
W9–W10: T5.7–T5.MK1 (5)
Code signing + .dmg + auto-updater S3 · Electron hardening
Bug fixes UI/UX beta 3.5d (+0.5d post T4.BB) · Billing Stripe live
Mockup Dashboard view 1d
#1 Shell · #13 Billing (consumes #18 T4.BB)
UX/UI
#18 DS
T0.11 Brand
Book request
T0.BB + T1.BB
W1: Foundations + Tokens + Iconography + Core partial (4d)
W2: Atoms + Molecules + Chat Organisms (6d)
#18 Design System • Pablo approves
T2.BB (6d)
Molecules remaining + Data/Flow Organisms
ConfirmDialog · ProactiveCard · MarketplaceKPI
#18 Design System • Pablo approves
T3.BB (5d)
Advanced Organisms + [LIB] Pattern Components
ReActStream · DataTable · ErrorRecovery
#18 Design System • Pablo approves
T4.BB (3d)
Figma Quality Audit + Corrections
All frames “Ready for development”
#18 Design System • Pablo approves
Pipeline closed
Point queries only
Pablo
CEO
T0.1–T0.11 (11)
OpenClaw Project · GitHub+Linear+Slack OAuth
System Prompt v1 · Role Mapping · Linear 17 projs
Apple Dev cert + Windows cert
#17 Beautonomous · #14 DevOps
T1.24–T1.27 (3)
Eval setup+golden 15 YAML
Brand registration · Apple/Windows Store auth
#16 Eval · #10 Data Sync · #1 Shell
#18 approves T0.BB + T1.BB
T2.24–T2.26a (4)
Eval LLM Judge+Runner · E2E Playground
Bootstrap 150 tasks · Quality gate 5-step
#16 Eval · #17 Beautonomous
#18 approves T2.BB
T3.26–T3.44 (8)
Eval CI PR-blocking · Golden dataset 50 · QA
EvalConfig+CLI · FigmaRESTClient · FigmaQualityRunner
ComponentChecks · Quality+Report
#16 Eval (incl. Figma Quality ext. 7.5d)
#18 approves T3.BB
T4.17–T4.29 (10)
Eval CI 50 cases ≥0.70 · Proactive suggestions testing
Beta selection · Contract + KB quality eval
Secrets · DesktopBuildRunner · desktop-build-eval.yml
#16 Eval · #17 Beautonomous (incl. Desktop Build ext. 7d)
#18 approves T4.BB
T5.11–T5.15a (6)
Beta onboarding 10-15 sellers · Feedback calls
OWASP top 10 · SysPrompt v2 Beautonomous · Go/No-Go
#17 Beautonomous · #16 Eval
Gates
★ T0.8
W0
CORE live
S2 milestone
W2
ReAct loop · Electron loads MP · 3 OAuth · KB indexed
★ Gate 1
W4 — Pablo decides
ToolRegistry + 10 READ stubs · Shell chat · KB BigQuery · Eval 15 cases
S6 milestone
W6
WRITE tool executes · Billing charges · Enrichment returns data
★ Gate 2
W8 — Pablo decides
Staging full · Load 50 users · WebSocket · Proactive · Eval ≥0.70
★ Gate 3 / Go
W10 — Pablo signs
10+ beta · signed .dmg · Billing live · Eval ≥0.70 · p95 <3s · 0 P0
Mateo (CTO) — #2,#3,#4,#5,#6,#7,#8,#9,#11
Andrés (Data+BE) — #10,#12,#14
Sergio (Full-stack) — #1,#13,#15 + Mockups
UX/UI (External) — #18 DS (Pablo approves)
Pablo (CEO) — #16,#17,#18 (approves)
Critical Path
★ Gate
Go/No-Go — Pablo decides
Critical Path: T0.8 (Pablo W0) → T1.1 (Mateo W1) → T1.6 (Mateo W2) → T2.1 (Mateo W3) → T3.2 (Mateo W5) → T3.3 (Mateo W5) → T4.1 (Mateo W7) → T4.5 (Mateo W7) → T5.2 (Mateo W9) → T5.15 (Pablo W10 — Go)
Parallel infra: T1.1 → T2.15 (Andrés W3) → T4.6 (Andrés W7) → T5.4 (Andrés W9)
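
As a rough illustration of how a chain like the one above can be derived from the DependsOn column, the sketch below computes the longest path by estimated hours through a small excerpt of the backlog. Hours and dependencies are copied from the CSV export; dependencies that fall outside the excerpt are dropped. The `critical_path` helper is hypothetical, and it ignores assignee capacity and calendar weeks, so treat it as a simplification rather than the method used to produce the path above.

```python
from functools import lru_cache

# Excerpt of the backlog: task ID -> (estimated hours, dependencies).
# Dependencies outside this excerpt are intentionally omitted.
TASKS = {
    "T0.8": (1, ()),
    "T1.1": (24, ("T0.8",)),
    "T1.2": (8, ("T1.1",)),
    "T1.3": (16, ("T1.1",)),
    "T1.4": (16, ()),
    "T1.5": (16, ("T1.2",)),
    "T1.6": (24, ("T1.3", "T1.4", "T1.5")),
    "T1.25": (16, ()),
    "T2.1": (16, ("T1.6", "T1.25")),
}


def critical_path(tasks, target):
    """Return (total_hours, path) for the longest dependency chain ending at target."""

    @lru_cache(maxsize=None)
    def best(task_id):
        hours, deps = tasks[task_id]
        if not deps:
            return hours, (task_id,)
        # Pick the predecessor whose own longest chain is the most expensive.
        prev_hours, prev_path = max(best(dep) for dep in deps)
        return hours + prev_hours, prev_path + (task_id,)

    return best(target)


total, path = critical_path(TASKS, "T2.1")
print(total, " -> ".join(path))
```

On this excerpt the longest chain to T2.1 runs through T1.2 and T1.5 (89 estimated hours) rather than through T1.3 (81 hours).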
Linear CSV Export — All 183 Tasks (copy & import)

Import in Linear: Team Settings → Import → CSV. Columns: ID, Title, Assignee, Project, Cycle, Priority, EstimateHours, DependsOn. Source: 70-EXEC-BACKLOG-CORREGIDO.md v2.0

ID,Title,Assignee,Project,Cycle,Priority,EstimateHours,DependsOn
T0.1,Create OpenClaw Project,Pablo,#17 Beautonomous,Pre-Sprint,Urgent,0.5,
T0.2,Connect GitHub OAuth,Mateo,#17 Beautonomous,Pre-Sprint,Urgent,0.5,T0.1
T0.3,Connect Linear OAuth,Pablo,#17 Beautonomous,Pre-Sprint,Urgent,0.5,T0.1
T0.4,Connect Slack OAuth,Mateo,#17 Beautonomous,Pre-Sprint,Urgent,0.5,T0.1
T0.5,Write System Prompt v1 Beautonomous,Pablo,#17 Beautonomous,Pre-Sprint,Urgent,4,T0.1
T0.6,Configure Role Mapping (Capitan/Mago/Artesano),Pablo,#17 Beautonomous,Pre-Sprint,High,1,T0.5
T0.7,Create Linear Structure 17 Projects 6 Cycles,Pablo,#17 Beautonomous,Pre-Sprint,High,2,T0.3
T0.8,Validation 4 Members x3 Queries Verify Permissions,All 4,#17 Beautonomous,Pre-Sprint,Urgent,1,"T0.2,T0.3,T0.4,T0.5,T0.6"
T0.9,Apple Developer Program enrollment + cert,Pablo,#14 DevOps,Pre-Sprint,High,1,-
T0.10,Windows code signing cert procurement,Pablo,#14 DevOps,Pre-Sprint,High,1,-
T0.11,Brand Book delivery from external design team — request following core-product-design-system repo guidelines,Pablo,#18 Design System,Cycle 0,High,0,
T0.BB,Figma Foundations Delivery: Brand book + [LIB] Foundations & Tokens + [LIB] Iconography + [LIB] Core Components partial,UX/UI,#18 Design System,Cycle 1,Urgent,32,T0.11
T1.1,DynamoDB Fix: IDs UUID->ULID Trace SK GSI fix CDK Stack,Mateo,#2 Orchestrator,Cycle 1,Urgent,24,T0.8
T1.2,UserProfile Entity IUserProfileRepository DynamoUserProfileRepository,Mateo,#2 Orchestrator,Cycle 1,High,8,T1.1
T1.3,Conversation History in Prompt findWindowForPrompt 200K budget,Mateo,#2 Orchestrator,Cycle 1,High,16,T1.1
T1.4,ILLMClient update: toolDefinitions thinkingBudget ContentBlock[] all clients,Mateo,#2 Orchestrator,Cycle 1,High,16,
T1.5,SystemPromptComposer L1+L2: identity base + session UserProfile cache_control,Mateo,#4 Personality,Cycle 1,High,16,T1.2
T1.6,AgentLoopOrchestrator ReAct Reason-Act-Observe MAX_ROUNDS=10 200K budget,Mateo,#2 Orchestrator,Cycle 1,Urgent,24,"T1.3,T1.4,T1.5"
T1.7,RestResponseEventEmitter REST mode no streaming,Mateo,#2 Orchestrator,Cycle 1,Medium,4,T1.6
T1.8,Verify Observability with ReAct ConversationTrace multi-step tool calls round count,Mateo,#8 Observability,Cycle 1,Medium,8,T1.6
T1.9,Scaffold Marketplace Provider Clean Architecture DDD Value Objects DI container,Andres,#12 Marketplace Provider,Cycle 1,Urgent,8,T0.8
T1.10,IMarketplaceAdapter Interface 23 methods 4 domains ISKUResolver,Andres,#12 Marketplace Provider,Cycle 1,Urgent,4,T1.9
T1.11,AES256GCMCipher + ITokenManager DynamoDB marketplace-credentials auto-refresh 15min,Andres,#12 Marketplace Provider,Cycle 1,High,16,T1.9
T1.12,MeLiOAuth2Flow + MeLiAdapter OAuth2 code flow REST API error mapping,Andres,#12 Marketplace Provider,Cycle 1,Urgent,24,"T1.10,T1.11"
T1.13,AmazonLWAFlow + AmazonAdapter scaffold SP-API SDK rate limiting stub only,Andres,#12 Marketplace Provider,Cycle 1,High,16,"T1.10,T1.11"
T1.14,Verify Terraform GCP: GCS Cloud Run Airflow BigQuery operational,Andres,#14 DevOps,Cycle 1,High,8,T0.8
T1.15,Request External Dependencies E1-E5: Amazon SP-API MeLi Shopify Apple,Andres,#14 DevOps,Cycle 1,Urgent,4,T0.8
T1.15a,SellerConnection Aggregate state machine 5 states DynamoDB persist,Andres,#12 Marketplace Provider,Cycle 1,High,8,T1.9
T1.15b,MarketplaceAction Entity + IMarketplaceActionRepository actionId sellerId latencyMs,Andres,#12 Marketplace Provider,Cycle 1,Medium,4,T1.9
T1.15c,IOAuth2Flow Interface domain port authorize exchangeCode refreshToken,Andres,#12 Marketplace Provider,Cycle 1,Medium,4,T1.9
T1.28,Collect missing WRITE API docs MeLi 3 AmazonAds 5 Amazon 2 Shopify 9 for Tool Registry mapping,Andres,#12 Marketplace Provider,Cycle 1,High,24,T0.8
T1.29,Collect user management provider docs auth external Auth0 Clerk Memberstack service methods,Andres,#12 Marketplace Provider,Cycle 1,High,16,T0.8
T1.16,Scaffold Electron + electron-builder Electron 28 main process preload contextBridge hot reload,Sergio,#1 Native Shell,Cycle 1,Urgent,8,T0.8
T1.17,MainWindow + WebContentsView 70% width navigation persistence (not BrowserView deprecated E26),Sergio,#1 Native Shell,Cycle 1,High,16,T1.16
T1.18,MarketplaceDetector URL patterns MeLi Amazon Shopify page type extraction remote config,Sergio,#1 Native Shell,Cycle 1,High,8,T1.17
T1.19,Tab System + Sidebar Container React 360px design-system tokens IPC main-renderer Toggle Cmd+B (+0.5d setup tokens T0.BB),Sergio,#1 Native Shell,Cycle 1,High,20,T1.17
T1.20,Auth Memberstack JWT electron-store OS key AuthService main process,Sergio,#1 Native Shell,Cycle 1,High,8,T1.16
T1.MK1,Mockup shell container: validate Figma tokens/components in real Electron+React context,Sergio,#1 Native Shell,Cycle 1,Medium,4,"T0.BB,T1.19"
T1.BB,Atoms + Molecules Base + Chat Organisms delivery,UX/UI,#18 Design System,Cycle 1,Urgent,48,T0.BB
T1.32,First .dmg + .exe canary build unsigned,Sergio,#1 Native Shell,Cycle 1,High,8,T1.16
T1.33,GitHub Actions CI electron-builder,Andres,#14 DevOps,Cycle 1,High,4,T1.32
T1.21,KB Phase 0 Fix Duplicates TRUNCATE before embed embedded_at CI Go 1.21->1.24,Mateo,#9 Cerebro KB,Cycle 1,High,16,
T1.22,KB Phase 1 Contextual Retrieval prefix chunking Markdown overlap 150 chars,Mateo,#9 Cerebro KB,Cycle 1,High,16,T1.21
T1.23,KB Content 15-20 Curated Docs MeLi Amazon Shopify pricing photos FAQ,Mateo,#9 Cerebro KB,Cycle 1,High,40,
T1.24,Eval Phase 0 Setup + Golden Dataset IEvalPipeline ILLMJudge 15-20 YAML cases,Pablo,#16 Eval Suite,Cycle 1,High,24,
T1.25,10 READ Tool Specs: name description inputSchema riskLevel creditCost for all 10 tools,Mateo,#9 Cerebro KB,Cycle 1,Urgent,16,
T1.26,Brand registration marketplaces Amazon Brand Registry AmazonAds MeLi Shopify weekly tracking,Pablo,#10 Data Sync,Cycle 1,High,24,T0.8
T1.27,Authorize app Apple Windows Store: Apple Developer Program $99yr code signing cert + Microsoft Partner Center,Pablo,#1 Native Shell,Cycle 1,High,16,T0.8
T2.1,ToolRegistry + ToolDefinition register registerRemote getDefinitions Zod schema categories,Mateo,#3 Tool Registry,Cycle 2,Urgent,16,"T1.6,T1.25"
T2.2,IToolExecutor + ToolExecutor execute toolName args context -> ToolResult,Mateo,#3 Tool Registry,Cycle 2,High,8,T2.1
T2.3,ToolPolicyFilter risk gate irreversible confirmation marketplace gate extensible,Mateo,#3 Tool Registry,Cycle 2,High,8,T2.1
T2.4,HookLifecycle before_tool -> execute -> after_tool after_tool runs even on failure,Mateo,#3 Tool Registry,Cycle 2,High,8,T2.2
T2.5,10 READ Tool Handlers stubs HTTP mock data handlers/read/ directory,Mateo,#3 Tool Registry,Cycle 2,Urgent,16,T2.1
T2.5a,ToolResult Domain Model toolName args isError latencyMs cached creditCost immutable,Mateo,#3 Tool Registry,Cycle 2,High,4,T2.1
T2.5b,update_user_profile SYSTEM Tool Handler updates UserProfile from conversation,Mateo,#3 Tool Registry,Cycle 2,Medium,4,"T2.1,T1.2"
T2.5c,contextSummary Auto-summarize conversation when token threshold exceeded,Mateo,#5 Context Aggregator,Cycle 2,Medium,8,T1.3
T2.5d,17 WRITE Tool Stubs ConfirmationRequired policy NotImplemented handlers visible to LLM,Mateo,#3 Tool Registry,Cycle 2,High,4,T2.1
T2.6,IContextAssembler KB + Brand Health RAG parallel single embedding graceful degradation,Mateo,#5 Context Aggregator,Cycle 2,High,16,T1.6
T2.7,Health Summary BrandHealthContextService.getHealthSummary always in system prompt,Mateo,#5 Context Aggregator,Cycle 2,High,8,T2.6
T2.8,Prompt Caching Anthropic SystemPromptBlock cache_control ephemeral 90% hit rate,Mateo,#2 Orchestrator,Cycle 2,Medium,8,T1.5
T2.9,Tool Result Caching In-Memory Map per session READ ANALYSIS only,Mateo,#3 Tool Registry,Cycle 2,Medium,4,T2.2
T2.10,ShopifyOAuth2Flow + ShopifyAdapter GraphQL Admin API rate limiting cost-based,Andres,#12 Marketplace Provider,Cycle 2,High,24,"T1.10,T1.11"
T2.11,AmazonAdapter Complete SP-API Reports Catalog Orders rate limit 5req/s (if E1 approved),Andres,#12 Marketplace Provider,Cycle 2,High,24,T1.13
T2.12,TokenRefreshCron EventBridge 5min pre-refresh 30min DynamoDB mutex 3 fails -> Slack alert,Andres,#12 Marketplace Provider,Cycle 2,High,8,"T1.11,T1.12"
T2.13,Data Sync Phase 0.5 Clean Architecture API IDataReader ITokenProvider VOs domain,Andres,#10 Data Sync,Cycle 2,High,16,T0.8
T2.14,Verify Existing DAGs MeLi + Shopify @hourly Bronze schemas fix if needed,Andres,#10 Data Sync,Cycle 2,High,8,T2.13
T2.15,CDK Base AWS: DynamoDB Lambda API Gateway v2 HTTP VPC NAT Secrets EventBridge,Andres,#14 DevOps,Cycle 2,Urgent,16,T1.1
T2.16,GitHub Actions CI Multi-Repo lint type-check unit tests build cache status checks,Andres,#14 DevOps,Cycle 2,High,8,
T2.16a,marketplace-actions DynamoDB Table CDK pk sellerId sk actionId GSI marketplace+status,Andres,#14 DevOps,Cycle 2,Medium,4,T2.15
T2.16b,AmazonAdsOAuth2Flow dual OAuth Amazon Ads API separate from LWA SP-API,Andres,#12 Marketplace Provider,Cycle 2,Medium,8,T1.13
T2.16c,ISKUResolver Implementations MeLi ASIN Shopify numeric bidirectional mapping,Andres,#12 Marketplace Provider,Cycle 2,Medium,8,T1.10
T2.17,Chat UI + Markdown Rendering bubbles indicators thinking/executing-tool/done syntax highlight (+0.5d integration T1.BB),Sergio,#1 Native Shell,Cycle 2,Urgent,20,T1.19
T2.18,CoachWebSocketService main process backoff reconnect 1s-30s heartbeat 30s REST fallback 2s,Sergio,#1 Native Shell,Cycle 2,Urgent,8,T1.7
T2.19,URL->Metadata Injection WebContentsView URL -> marketplace page-type product IDs as metadata,Sergio,#1 Native Shell,Cycle 2,High,8,T1.18
T2.20,react-router View Navigation /chat /profile /billing /enrollment /onboarding persistent chat,Sergio,#1 Native Shell,Cycle 2,Medium,8,T2.17
T2.21,OnboardingWizard 5 Steps welcome->OAuth->profile->guided-query->success localStorage skip from 3 (+0.5d T1.BB),Sergio,#1 Native Shell,Cycle 2,High,20,"T2.17,T1.12"
T2.MK1,Mockup ChatView: integrate chat organisms from T1.BB in real ChatView component,Sergio,#1 Native Shell,Cycle 2,Medium,8,"T1.BB,T2.17"
T2.MK2,Mockup OnboardingWizard: integrate onboarding components from T1.BB,Sergio,#1 Native Shell,Cycle 2,Medium,4,"T1.BB,T2.21"
T2.BB,Molecules remaining + Data/Flow Organisms delivery,UX/UI,#18 Design System,Cycle 2,Urgent,48,T1.BB
T2.40,Gate 1 signed build .dmg + .exe,Sergio,#1 Native Shell,Cycle 2,High,8,"T0.9,T0.10"
T2.22,KB Phase 2 Incremental Processing SHA-256 hash is_current flag re-embed changed only,Mateo,#9 Cerebro KB,Cycle 2,High,16,T1.21
T2.23,KB Phase 3 Batch Embeddings 250 texts/call Vertex AI goroutine pool semaphore max 5,Mateo,#9 Cerebro KB,Cycle 2,High,16,T1.21
T2.24,Eval Phase 1 LLM Judge + EvalRunner AnthropicLLMJudge YamlDatasetLoader CLI eval.ts 20 cases,Pablo,#16 Eval Suite,Cycle 2,High,24,T1.24
T2.25,E2E Testing via Playground real Sellerfy data document QA findings -> Linear Beautonomous,Pablo,#16 Eval Suite,Cycle 2,High,16,"T1.6,T2.1"
T2.26,Bootstrap ~150 Tasks in Linear via Beautonomous 6 cycles L/M/S labels critical path deps,Pablo,#17 Beautonomous,Cycle 2,Medium,4,T0.7
T2.26a,Quality Gate 5-Step Beautonomous structure->lint->tests->arch-review->convention before PRs,Pablo,#17 Beautonomous,Cycle 2,High,8,T0.5
T3.1,10 READ Handlers Real connect to Fast Data Layer 11 FastAPI endpoints Zod -> HTTP -> ToolResult,Mateo,#3 Tool Registry,Cycle 3,Urgent,24,"T2.5,T2.13"
T3.2,ConfirmationFlow WRITE pauses -> shows diff -> Aceptar/Rechazar -> 35min timeout DynamoDB TTL,Mateo,#2 Orchestrator,Cycle 3,Urgent,16,"T1.6,T2.3"
T3.3,4 WRITE Tool Handlers update_product_content update_price pause_product activate_product,Mateo,#3 Tool Registry,Cycle 3,Urgent,24,T3.2
T3.4,ProactiveSuggestionService afterTool LLM evaluates result max 2/turn dedup 7d no hardcoded rules,Mateo,#6 Proactive Suggestions,Cycle 3,High,16,T2.4
T3.5,IGuardService + InputGuard prompt injection pattern matching out-of-scope graceful degradation,Mateo,#7 Guardrails,Cycle 3,High,8,T1.6
T3.5a,HttpCreditGate in conversation-api HTTP POST /internal/gate READ=1 ANALYSIS=2 WRITE=3 fail-open,Mateo,#2 Orchestrator,Cycle 3,Urgent,8,T3.24
T3.6,Enrichment Scaffold + Interfaces IEnrichmentService IMarketIntelligenceAdapter IEnrichmentCache DI,Mateo,#11 Enrichment,Cycle 3,High,8,T0.8
T3.7,MeliMarketIntelligenceAdapter Search+Items API free search_market_products competitor pricing,Mateo,#11 Enrichment,Cycle 3,High,16,T3.6
T3.8,VisionLLMContentAdapter Claude Vision analyze_product_image analyze_product_video,Mateo,#11 Enrichment,Cycle 3,Medium,8,T3.6
T3.9,RedisEnrichmentCache + EnrichmentService TTL per tool router marketplace->adapter fail gracefully,Mateo,#11 Enrichment,Cycle 3,High,8,"T3.7,T3.8"
T3.10,Enrichment CDK Stack Lambda + API Gateway + ElastiCache Redis + VPC,Mateo,#11 Enrichment,Cycle 3,High,8,T3.9
T3.11,8 ANALYSIS Tool Handlers connect to IEnrichmentService 5 ops + keyword + fee + enhance(NotImpl),Mateo,#3 Tool Registry,Cycle 3,High,16,T3.9
T3.12,HallucinationChecker numeric claims fees metrics vs tool results post-gen log no block Phase 1,Mateo,#2 Orchestrator,Cycle 3,Medium,8,T3.1
T3.13,Fast Data Layer 11 FastAPI Endpoints GET /data/{user_id}/fast/{tool} GCS Parquet <500ms no Redis,Andres,#10 Data Sync,Cycle 3,Urgent,24,T2.13
T3.14,GCS Snapshots for ConfirmationFlow snapshot/{tool}/{ts} pre-write state rollback cleanup DAG,Andres,#10 Data Sync,Cycle 3,High,8,T3.13
T3.15,DAG Amazon IExtractor ILoader AmazonAuthManager AmazonExtractor AmazonLoader Bronze schemas,Andres,#10 Data Sync,Cycle 3,High,24,"T2.14,T2.11"
T3.16,IRateLimiter per Marketplace: MeLi token bucket 1500/min Amazon burst Shopify leaky bucket Redis,Andres,#12 Marketplace Provider,Cycle 3,High,8,T1.12
T3.17,Onboarding Trigger first sync post-onboarding marketplace connect -> trigger DAG initial sync,Andres,#12 Marketplace Provider,Cycle 3,Medium,8,"T1.12,T2.14"
T3.18,CI/CD Multi-Repo Complete all 11 repos GitHub Actions staging auto-deploy on main merge,Andres,#14 DevOps,Cycle 3,High,16,T2.16
T3.19,BillingView current plan credits remaining usage stats Stripe Checkout in system browser alerts (+0.5d T2.BB),Sergio,#1 Native Shell,Cycle 3,High,20,T2.20
T3.20,WRITE Confirmation Dialogs diff red/green Aceptar/Rechazar 35min timeout ConfirmationFlow T3.2,Sergio,#1 Native Shell,Cycle 3,Urgent,8,"T2.17,T3.2"
T3.21,Suggestion Cards + Tool Progress clickable cards pre-contextualized chat spinner tool name (+0.5d T2.BB),Sergio,#1 Native Shell,Cycle 3,High,12,"T2.17,T2.18"
T3.22,ProfileView connected marketplaces usage stats preferences language notifications,Sergio,#1 Native Shell,Cycle 3,Medium,8,T2.20
T3.23,Stripe Checkout + Customer Portal Pro $49/mo cancel update-payment checkout.session.completed 500cr,Sergio,#13 Billing,Cycle 3,Urgent,24,T3.19
T3.24,ICreditsGate + Credits Backend POST /internal/gate READ=1 ANALYSIS=2 WRITE=3 conditional write,Sergio,#13 Billing,Cycle 3,Urgent,16,T3.23
T3.24a,Billing Schema Migration ALTER TABLE + credit_packs subscription_events credit_transactions,Sergio,#13 Billing,Cycle 3,High,8,T3.23
T3.24b,SubscriptionLifecycleService activate cancel-7d-grace upgrade downgrade invoice.payment_failed,Sergio,#13 Billing,Cycle 3,High,8,T3.23
T3.24c,Monthly Credit Reset Cron EventBridge Lambda 1st of month plan credits pack credits 12mo TTL,Sergio,#13 Billing,Cycle 3,Medium,4,T3.24
T3.25,KB BigQuery Indexing 15-20 docs pipeline Go verify top-5 semantic search 5 test queries,Mateo,#9 Cerebro KB,Cycle 3,High,8,"T1.22,T1.23"
T3.MK1,Mockup BillingView: integrate CreditEconomy + billing components from T2.BB,Sergio,#1 Native Shell,Cycle 3,Medium,4,"T2.BB,T3.19"
T3.MK2,Mockup ProfileView: integrate profile components from T2.BB,Sergio,#1 Native Shell,Cycle 3,Medium,4,"T2.BB,T3.22"
T3.MK3,Mockup ConfirmDialog: integrate ConfirmDialog organism from T2.BB in WRITE flow,Sergio,#1 Native Shell,Cycle 3,Medium,4,"T2.BB,T3.20"
T3.BB,Advanced Organisms + [LIB] Pattern Components complete delivery,UX/UI,#18 Design System,Cycle 3,Urgent,40,T2.BB
T3.32,Token Pipeline + Style Dictionary extract design-tokens.json via MCP -> CSS :root + tailwind.config.ts validate naming modes,Mateo,#18 Design System,Cycle 3,High,16,T0.BB
T3.26,Eval Phase 2 CI Integration eval-on-pr.yml PR blocked if !passed auto-comment <10 min 20 cases,Pablo,#16 Eval Suite,Cycle 3,High,16,T2.24
T3.27,Golden Dataset 50 Cases 15 product 10 pricing 8 WRITE 7 proactive 10 edge cases injection,Pablo,#16 Eval Suite,Cycle 3,High,24,T2.24
T3.28,QA Conversation Flows 3 Marketplaces real Sellerfy data all flows issues -> Linear Beautonomous,Pablo,#16 Eval Suite,Cycle 3,High,16,"T3.1,T3.3"
T4.1,WebSocket Streaming replace REST 8 server->client events 4 client->server session restore,Mateo,#2 Orchestrator,Cycle 4,Urgent,16,"T1.7,T3.3"
T4.2,SystemPromptComposer L3 writeCapable=true block guardrails injected conditionally hard cap 1200tok,Mateo,#4 Personality,Cycle 4,High,8,"T1.5,T3.3"
T4.3,OutputGuard post-LLM data leak prevention dangerous content filter critical alert on leak,Mateo,#7 Guardrails,Cycle 4,High,8,T3.5
T4.4,WRITE Tools Remaining up to 13 additional: images video stock close publish answer hide send,Mateo,#3 Tool Registry,Cycle 4,High,24,T3.3
T4.5,Performance Optimization p95 <3s context compaction cache hit parallelization profile bottlenecks,Mateo,#2 Orchestrator,Cycle 4,High,16,T4.1
T4.5a,FeedbackCapture Hook after_tool -> POST /feedback/capture DynamoDB #15 WRITE success fire-forget,Mateo,#2 Orchestrator,Cycle 4,Medium,8,"T2.4,T4.13"
T4.5b,ActionLog Entity + DynamoActionLogRepository pk User#{userId} sk Action#{ULID} GSI Conv#{convId},Mateo,#2 Orchestrator,Cycle 4,Medium,8,"T2.4,T3.3"
T4.6,Staging Deploy Full Stack AWS: CDK Lambda API-GW DynamoDB ElastiCache RDS + Terraform GCP CR BQ,Andres,#14 DevOps,Cycle 4,Urgent,24,"T2.15,T3.18"
T4.7,Load Testing 50 Concurrent Users Artillery/k6 p95 <2s API excl LLM latency identify bottlenecks,Andres,#14 DevOps,Cycle 4,Urgent,16,T4.6
T4.8,CloudWatch Dashboard + Alerts latency error-rate LLM-cost/conv PagerDuty Slack cost/day $50,Andres,#14 DevOps,Cycle 4,High,16,T4.6
T4.9,Data Sync Silver + Gold INormalizer SilverNormalizer DailySummaryAggregator Brand Health spike,Andres,#10 Data Sync,Cycle 4,High,24,T3.13
T4.9a,API Gateway v2 WebSocket CDK $connect $disconnect $default DynamoDB connection-ids Lambda auth,Andres,#14 DevOps,Cycle 4,Urgent,8,T4.6
T4.10,WebSocket Client Progressive Rendering 8 server->client events text_delta tool_start suggestion (+0.5d T3.BB),Sergio,#1 Native Shell,Cycle 4,Urgent,20,"T2.18,T4.1"
T4.11,EnrollmentView Standalone BrowserWindow OAuth redirect per marketplace reuses T2.21 tokens,Sergio,#1 Native Shell,Cycle 4,High,8,T2.21
T4.12,Sentry Crash Reporting main + renderer source maps error grouping crash feedback dialog,Sergio,#1 Native Shell,Cycle 4,Medium,4,T1.16
T4.13,Feedback Loop Scaffold package.json IFeedbackRepository IFeedbackGate IDataSyncClient models,Sergio,#15 Feedback Loop,Cycle 4,High,8,T0.8
T4.14,calculateImpactScore + DynamoFeedbackRepository sales*0.4 conversion*0.3 findPendingEntries GSI1,Sergio,#15 Feedback Loop,Cycle 4,High,16,T4.13
T4.15,FeedbackMeasurerService + Lambdas processEntries>7d EventBridge 6h /feedback/:userId/summary,Sergio,#15 Feedback Loop,Cycle 4,High,16,T4.14
T4.15a,FeedbackGate Anti-Fatigue max 1 explicit/day <3 interactions skip 24h cooldown should-prompt,Sergio,#15 Feedback Loop,Cycle 4,Medium,4,T4.13
T4.15b,Explicit Feedback Endpoint POST /feedback/:userId/explicit rating 1-5 comment conversationId,Sergio,#15 Feedback Loop,Cycle 4,Medium,4,T4.14
T4.15c,Implicit Feedback Endpoint POST /feedback/:userId/implicit accepted/rejected/edited originalValue,Sergio,#15 Feedback Loop,Cycle 4,Medium,4,T4.14
T4.15d,Grace Period 7d Billing customer.subscription.deleted -> grace_period_end cron check expired,Sergio,#13 Billing,Cycle 4,High,4,T3.24b
T4.MK1,Mockup EnrollmentView: integrate EnrollmentCard organism from T3.BB,Sergio,#1 Native Shell,Cycle 4,Medium,4,"T3.BB,T4.11"
T4.MK2,Mockup complete WRITE flow: integrate ReActStream + ConfirmDialog + ToolAccordion in real WS flow,Sergio,#1 Native Shell,Cycle 4,Medium,8,"T3.BB,T4.10"
T4.BB,Figma Quality Audit + Corrections: All frames Ready for development,UX/UI,#18 Design System,Cycle 4,High,24,T3.BB
T4.24,Gate 2 signed build .dmg notarized + .exe,Sergio,#1 Native Shell,Cycle 4,High,4,T2.40
T4.16,KB Batch v2 Vertex AI 250/call incremental if >5min >80% hit rate 20 eval queries,Mateo,#9 Cerebro KB,Cycle 4,High,16,"T2.22,T2.23"
T4.17,Eval CI Automated GitHub Action 50 golden cases every push CI fails if score <0.70 Slack notify,Pablo,#16 Eval Suite,Cycle 4,Urgent,16,"T3.26,T3.27"
T4.18,Test ProactiveSuggestions Real Data verify triggers message quality dedup max 2/turn iterate prompt,Pablo,#16 Eval Suite,Cycle 4,High,16,T3.4
T4.19,Beta User Selection + Onboarding Prep 10-15 Sellerfy sellers video walkthrough 1-on-1 30min,Pablo,#17 Beautonomous,Cycle 4,High,16,T2.21
T4.19a,Eval Contract Testing Pipeline consumer-driven ToolRegistry->DataSync->MP->Enrichment contracts,Pablo,#16 Eval Suite,Cycle 4,High,16,T3.26
T4.19b,KB Quality Eval Pipeline precision@5 recall hit rate 20 queries CI fails if hit-rate <80%,Pablo,#16 Eval Suite,Cycle 4,High,8,T4.16
T5.1,LLMGuardChecker Haiku classifier for ambiguous injection edge cases fallback pass-through,Mateo,#7 Guardrails,Cycle 5,High,8,"T3.5,T4.3"
T5.2,Backend Bug Fixes P1/P2 empty results LLM refuses tool concurrent WRITE expired tokens mid-conv,Mateo,#2 Orchestrator,Cycle 5,Urgent,32,T4.1
T5.3,System Prompt v3 Final beta feedback tone issues tool selection patterns edge cases,Mateo,#4 Personality,Cycle 5,High,8,T5.10
T5.4,Production Deploy CDK Lambda+API-GW prod Terraform Cloud Run Data API prod SSL api.shopilot.ai,Andres,#14 DevOps,Cycle 5,Urgent,24,T4.6
T5.5,IaC Production Complete DynamoDB PITR 35d IAM Lambda concurrency GCS lifecycle Redis PG backups,Andres,#14 DevOps,Cycle 5,High,16,T5.4
T5.6,Rollback Testing Lambda version <1min Cloud Run revision <1min document runbook,Andres,#14 DevOps,Cycle 5,High,8,T5.4
T5.6a,Data Sync Phase 4 OpenMetadata FQNs embed_fast_dag embed_health_dag Bronze->Gold->KB lineage,Andres,#10 Data Sync,Cycle 5,Medium,16,T4.9
T5.7,Code Signing + .dmg + Auto-Updater Apple cert notarytool stapling S3 releases test clean Mac,Sergio,#1 Native Shell,Cycle 5,Urgent,16,T4.12
T5.8,Electron Security Hardening CSP sandbox nodeIntegration=false webSecurity=true telemetry opt-out,Sergio,#1 Native Shell,Cycle 5,High,8,T5.7
T5.9,UI/UX Beta Bug Fixes all feedback bugs RAM <500MB target polish animations loading states (+0.5d post-audit T4.BB),Sergio,#1 Native Shell,Cycle 5,Urgent,28,T4.19
T5.10,Billing Stripe Live switch test->live checkout webhooks credits packs SSL billing endpoints,Sergio,#13 Billing,Cycle 5,Urgent,8,"T3.23,T5.4"
T5.MK1,Mockup Dashboard view: integrate final post-audit components in a dashboard-style view,Sergio,#1 Native Shell,Cycle 5,Medium,8,"T4.BB,T5.9"
T5.11,Beta Onboarding 10-15 Sellers .dmg -> marketplace -> query -> action 1-on-1 30min activation,Pablo,#17 Beautonomous,Cycle 5,Urgent,24,"T5.7,T5.4"
T5.12,Feedback Calls + Iteration 15min each user top 5 issues -> Linear via Beautonomous,Pablo,#17 Beautonomous,Cycle 5,High,16,T5.11
T5.13,OWASP Top 10 Security Review injection auth XSS SSRF data exposure fix P1s,Pablo,#16 Eval Suite,Cycle 5,High,8,"T5.4,T5.7"
T5.14,System Prompt v2 Beautonomous 10-week iteration update governance index docs OpenClaw KB,Pablo,#17 Beautonomous,Cycle 5,Medium,8,
T5.15,Go/No-Go 60min sync 4 engineers checklist: tools Stripe 10+ beta .dmg OWASP p95 eval. Pablo signs,Pablo,#17 Beautonomous,Cycle 5,Urgent,4,"T5.1,T5.2,T5.3,T5.4,T5.5,T5.6,T5.7,T5.8,T5.9,T5.10,T5.11,T5.12,T5.13,T5.14"
T5.15a,E2E Eval Pipeline query->tool-selection->execution->response 10+ scenarios different from LLM Judge,Pablo,#16 Eval Suite,Cycle 5,High,16,T4.17
T3.40,Extend EvalConfig+CLI add desktop_build figma_quality pipelineType DesktopBuildReport FigmaQualityReport CLI flags,Pablo,#16 Eval Suite,Cycle 3,High,8,T2.24
T3.41,FigmaRESTClient IFigmaAPIClient getFile getFileVariables getFileComponents getFileStyles FIGMA_ACCESS_TOKEN,Pablo,#16 Eval Suite,Cycle 3,High,12,T3.40
T3.42,FigmaQualityRunner + variable checks variable_architecture code_syntax semantic_aliasing light_dark_modes,Pablo,#16 Eval Suite,Cycle 3,High,16,T3.41
T3.43,Component checks auto_layout naming_convention states_coverage color_hardcoding spacing_hardcoding,Pablo,#16 Eval Suite,Cycle 3,High,16,T3.42
T3.44,Quality checks + report wcag_contrast descriptions mcp_compatibility compliance per file violations by severity,Pablo,#16 Eval Suite,Cycle 3,High,8,T3.42
T4.25,Code Signing Secrets macOS Developer-ID+notarization Windows Authenticode GitHub Secrets electron-builder verify,Pablo,#16 Eval Suite,Cycle 4,High,8,
T4.26,DesktopBuildRunner + core checks compilation code-signing notarization startup<5s bundle<250MB native-modules,Pablo,#16 Eval Suite,Cycle 4,High,24,T3.40
T4.27,Secondary checks auto-updater deep-links window-rendering IPC-channels ping/pong warnings not blockers,Pablo,#16 Eval Suite,Cycle 4,Medium,8,T4.26
T4.28,GitHub Actions desktop-build-eval.yml 3 jobs build-macos build-windows report PR comment trigger desktop-client,Pablo,#16 Eval Suite,Cycle 4,High,12,T4.26
T4.29,GitHub Actions figma-quality-eval.yml workflow_dispatch + cron weekly Monday 8UTC Slack #engineering report,Pablo,#16 Eval Suite,Cycle 4,Medium,4,"T3.42,T3.43,T3.44"
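
Before importing, an export like the one above can be sanity-checked with a short script. The sketch below parses the first rows of the CSV (copied verbatim), flags any DependsOn entry that does not match a known ID, and totals EstimateHours per assignee. The helper names are hypothetical, and treating both an empty cell and the `-` placeholder as "no dependencies" is an assumption about how the export is meant to be read.

```python
import csv
import io
from collections import defaultdict

# First rows of the export, copied verbatim.
CSV_EXCERPT = '''ID,Title,Assignee,Project,Cycle,Priority,EstimateHours,DependsOn
T0.1,Create OpenClaw Project,Pablo,#17 Beautonomous,Pre-Sprint,Urgent,0.5,
T0.2,Connect GitHub OAuth,Mateo,#17 Beautonomous,Pre-Sprint,Urgent,0.5,T0.1
T0.3,Connect Linear OAuth,Pablo,#17 Beautonomous,Pre-Sprint,Urgent,0.5,T0.1
T0.4,Connect Slack OAuth,Mateo,#17 Beautonomous,Pre-Sprint,Urgent,0.5,T0.1
T0.5,Write System Prompt v1 Beautonomous,Pablo,#17 Beautonomous,Pre-Sprint,Urgent,4,T0.1
T0.6,Configure Role Mapping (Capitan/Mago/Artesano),Pablo,#17 Beautonomous,Pre-Sprint,High,1,T0.5
T0.7,Create Linear Structure 17 Projects 6 Cycles,Pablo,#17 Beautonomous,Pre-Sprint,High,2,T0.3
T0.8,Validation 4 Members x3 Queries Verify Permissions,All 4,#17 Beautonomous,Pre-Sprint,Urgent,1,"T0.2,T0.3,T0.4,T0.5,T0.6"
T0.9,Apple Developer Program enrollment + cert,Pablo,#14 DevOps,Pre-Sprint,High,1,-
'''


def load_tasks(text):
    """Parse the CSV text into a list of row dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_deps(cell):
    """Split a DependsOn cell into task IDs; empty or '-' means no dependencies."""
    cell = cell.strip()
    if cell in ("", "-"):
        return []
    return [dep.strip() for dep in cell.split(",")]


def dangling_dependencies(rows):
    """Return (task_id, missing_dep) pairs whose dependency is not a known ID."""
    known = {row["ID"] for row in rows}
    return [
        (row["ID"], dep)
        for row in rows
        for dep in parse_deps(row["DependsOn"])
        if dep not in known
    ]


def hours_by_assignee(rows):
    """Sum EstimateHours per assignee."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["Assignee"]] += float(row["EstimateHours"])
    return dict(totals)


rows = load_tasks(CSV_EXCERPT)
print(dangling_dependencies(rows))
print(hours_by_assignee(rows))
```

Run against the full export, a non-empty result from `dangling_dependencies` would point at rows to fix before importing.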

9.8 Full Task Registry — Acceptance Criteria

Cross-reference with sprint tables in 9.11. 183 tasks across 6 phases (Phase 0 + S1-2 + S3-4 + S5-6 + S7-8 + S9-10), plus an S11-12 buffer. Each task has 2–4 measurable acceptance criteria. Source: 70-EXEC-BACKLOG-CORREGIDO.md v2.0 + 75-EVAL-EXTENSION-TAREAS.md. All tasks are tracked in Linear via Beautonomous (#17).

Phase 0 — Pre-Sprint — T0.1–T0.11 (11 tasks) — #17 Beautonomous
T0.1 · Create OpenClaw Project · Pablo · #17 • 30m
  • Project ‘Beautonomous’ exists in OpenClaw with type = operational agent
  • All 4 members (pablo@, mateo@, andres@, sergio@) can log in and see the project
  • Project is ready to receive OAuth connectors
T0.2 · Connect GitHub OAuth · Mateo · #17 • 30m · deps: T0.1
  • GitHub org authorized; Beautonomous can read all 11 core-(capa)-proyecto repos
  • El Mago (Mateo) has full access; Artesano roles have limited read access per F2.1–F2.3
  • 4+ core repos visible and indexed in Beautonomous project context
T0.3 · Connect Linear OAuth · Pablo · #17 • 30m · deps: T0.1
  • Workspace ‘beautonomous’, team AUT, linked with read+write for issues/cycles/members
  • Beautonomous can create and update issues in Linear programmatically
  • Team AUT visible in Beautonomous project context
T0.4 · Connect Slack OAuth · Mateo · #17 • 30m · deps: T0.1
  • #engineering, #deploys, #general channels connected with read+write
  • Beautonomous posts test message to #engineering successfully
  • All 3 channels return message history when queried
T0.5 · System Prompt v1 Beautonomous · Pablo · #17 • 4h · deps: T0.1
  • Prompt identifies as Beautonomous (NOT as Shopilot Coach — distinct identity)
  • Roles defined: Capitán (Pablo), Mago (Mateo), Artesano (Andrés/Sergio) with correct permissions
  • 6 governance rules present and testable
  • Test: ‘quién eres?’ ("who are you?") → responds as Beautonomous, not as Coach
T0.6 · Configure Role Mapping · Pablo · #17 • 1h · deps: T0.5
  • pablo → Capitán, mateo → Mago, andres/sergio → Artesano verified in config
  • Each role has correct permission set per spec F2.1–F2.3
  • Role-based access test: Artesano cannot perform Capitán-only actions
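
The role-based access test in T0.6 could look roughly like the sketch below. The user-to-role mapping comes from the criteria above; the action names and permission sets are hypothetical placeholders, since the real ones are defined in spec F2.1–F2.3 and are not reproduced here.

```python
# User -> role mapping as listed in the T0.6 acceptance criteria.
ROLE_BY_USER = {
    "pablo": "Capitán",
    "mateo": "Mago",
    "andres": "Artesano",
    "sergio": "Artesano",
}

# Hypothetical permission sets (assumption); the real ones live in spec F2.1–F2.3.
PERMISSIONS = {
    "Capitán": {"read_code", "create_issue", "approve_gate"},
    "Mago": {"read_code", "create_issue"},
    "Artesano": {"read_code", "create_issue"},
}


def is_allowed(user, action):
    """Return True only if the user's role explicitly grants the action."""
    role = ROLE_BY_USER.get(user)
    return action in PERMISSIONS.get(role, set())


# "approve_gate" stands in for a Capitán-only action.
print(is_allowed("pablo", "approve_gate"))
print(is_allowed("andres", "approve_gate"))
```

Unknown users and unknown roles fall through to an empty permission set, so the check denies by default.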
T0.7 · Create Linear Structure · Pablo · #17 • 2h · deps: T0.3
  • 17 projects visible in Linear team AUT
  • 6 cycles created (2 weeks each, incl. S11-12 buffer)
  • Labels exist: L/M/S, Track-{Mateo,Andrés,Sergio,Pablo}, Risk-{low,medium,high}
  • Workflow states present: Backlog → Todo → In Progress → In Review → Done
T0.8 · Validation · All 4 · #17 • 1h · deps: T0.2–T0.6
  • All 4 engineers complete 3 queries each (12 total: GitHub read, Linear task create, code read)
  • Role permissions verified: each engineer sees only their allowed actions
  • Beautonomous checkpoint: operational — Linear task created to mark CORE live
T0.9Apple Developer Program enrollment + Developer ID certInscripción Apple Developer Program + certificado Developer IDPablo#14 • 1h
  • Apple Developer account created and approved ($99/yr paid)Cuenta Apple Developer creada y aprobada ($99/año pagado)
  • Developer ID Application certificate generated and downloadedCertificado Developer ID Application generado y descargado
  • Certificate installed in team keychain, codesign --verify passesCertificado instalado en keychain del equipo, codesign --verify pasa
T0.10 · Windows code signing certificate procurement · Pablo · #14 • 1h
  • Code signing certificate (OV or EV) purchased from trusted CA
  • Certificate installed and signtool verify passes on Windows
  • If EV: hardware token received and configured
T0.11 · Brand Book delivery from external design team · Pablo · #18 • 1d
  • Brand book requested from external design team following core-product-design-system repo guidelines
  • Visual identity deliverable received: logo, colors, typography, usage rules
  • Brand book assets accessible to UX/UI team for T0.BB (Figma foundations delivery)
T0.BB · Figma Foundations Delivery: Brand book + [LIB] Foundations & Tokens + [LIB] Iconography + [LIB] Core Components partial (Button, Icon, StatusDot, Spinner, Divider, TabBar) · UX/UI · #18 • 4d · deps: T0.11
  • Pablo approves delivery, tokens exported to design-tokens.json
  • Components match brand book visual identity
  • [LIB] Foundations & Tokens + [LIB] Iconography published in Figma
Sprints 1-2 — Walking Skeleton — T1.1–T1.33 (32 tasks)

Mateo — #2 Orchestrator · #4 Personality · #5 Context · #8 Observability · #9 Cerebro KB — 12 tasks

T1.1 · DynamoDB Fix (Phase -1) · Mateo · #2 • 3d · deps: T0.8
  • IDs are ULID format (verified in DynamoDB items — no UUID format present)
  • Trace SK = Trace#{messageId}; queryEmbedding/answer fields absent from Trace
  • GSI1 projection = INCLUDE (not ALL); GSI2 is sparse — verified in CDK stack code
  • Unit tests for KeyBuilders pass; CDK deploy completes without errors
T1.2 · UserProfile Entity · Mateo · #2 • 1d · deps: T1.1
  • pk: User#{userId}, sk: Profile — correct keys in DynamoDB
  • Fields present: userId, marketplaces[], productCategories[], declaredGoals[], lastUpdatedAt
  • DynamoUserProfileRepository.save() + findById() unit tests pass
T1.3 · Conversation History in Prompt · Mateo · #2 • 2d · deps: T1.1
  • findWindowForPrompt(convId, windowSize) returns correct last-N messages
  • Token budget respected: history block + system + rag + response ≤ 200K total
  • Test with 100-message conversation: correct window loaded, no overflow
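
The window selection described in T1.3 could look roughly like the sketch below. It assumes messages are already loaded in memory and carry a precomputed token count; `Message`, `tokens`, and `historyBudget` are hypothetical names, and the real `findWindowForPrompt(convId, windowSize)` would read from DynamoDB instead.

```typescript
// Hypothetical message shape; the real entity lives in DynamoDB.
interface Message {
  id: string;
  role: "user" | "assistant";
  tokens: number; // assumed to be precomputed per message
}

// Takes the last `windowSize` messages in chronological order, then drops the
// oldest ones until the window fits the token budget reserved for history.
function findWindowForPrompt(
  messages: Message[],
  windowSize: number,
  historyBudget: number,
): Message[] {
  // slice(-0) would return the whole array, so guard the zero case.
  const recent = windowSize > 0 ? messages.slice(-windowSize) : [];
  let total = recent.reduce((sum, m) => sum + m.tokens, 0);
  while (recent.length > 0 && total > historyBudget) {
    const dropped = recent.shift() as Message;
    total -= dropped.tokens;
  }
  return recent;
}
```

Dropping from the oldest end keeps the most recent turns intact, which is one plausible reading of "correct window loaded, no overflow".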
T1.4 · ILLMClient Update · Mateo · #2 • 2d
  • chat() signature accepts toolDefinitions? and thinkingBudget?
  • Return type is { content: ContentBlock[], stopReason } (not { content: string })
  • All 3 clients (OpenRouter, Anthropic, Vertex) compile and pass updated interface
  • Tool response block parsed correctly from ContentBlock[] in unit test
T1.5 · SystemPromptComposer L1+L2 · Mateo · #4 • 2d · deps: T1.2
  • L1 block has cache_control: { type: "ephemeral" } and ~500 tokens
  • L2 block includes UserProfile data and critical alerts when present
  • compose(context) returns { blocks[], estimatedTokens }
  • estimatedTokens ≤ 1000 for typical context
T1.6 · AgentLoopOrchestrator (ReAct) · Mateo · #2 • 3d · deps: T1.3, T1.4, T1.5
  • Loop runs Reason → Act → Observe cycle for multi-turn conversations
  • Stops at MAX_ROUNDS=10 even without done signal; cost guard triggers at 50K tokens
  • Context budget = 200K − system − history − tools − 4000 enforced
  • Tool error → tool_result with is_error: true, loop continues
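
The T1.6 criteria can be read as the loop sketched below. This is an illustration under stated assumptions, not the project's orchestrator: the `ContentBlock` shapes, the `tokensUsed` field, the `LLM` and `ToolRunner` function types, and the point at which the cost guard is checked are all hypothetical.

```typescript
// Hypothetical shapes for this sketch; the real ILLMClient and executor differ.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; content: string; is_error: boolean };

interface LLMTurn {
  content: ContentBlock[];
  stopReason: "end_turn" | "tool_use";
  tokensUsed: number; // assumed per-call token count
}
type LLM = (transcript: ContentBlock[]) => Promise<LLMTurn>;
type ToolRunner = (name: string, input: unknown) => Promise<string>;

const MAX_ROUNDS = 10;
const COST_GUARD_TOKENS = 50000;
const RESPONSE_RESERVE = 4000;

// Context budget from T1.6: 200K - system - history - tools - 4000.
function contextBudget(system: number, history: number, tools: number): number {
  return 200000 - system - history - tools - RESPONSE_RESERVE;
}

async function runAgentLoop(llm: LLM, runTool: ToolRunner, transcript: ContentBlock[]) {
  let tokens = 0;
  for (let round = 1; round <= MAX_ROUNDS; round++) {
    const turn = await llm(transcript); // Reason
    tokens += turn.tokensUsed;
    transcript.push(...turn.content);
    if (turn.stopReason !== "tool_use") return { rounds: round, tokens, stopped: "done" as const };
    if (tokens >= COST_GUARD_TOKENS) return { rounds: round, tokens, stopped: "cost_guard" as const };
    for (const block of turn.content) {
      if (block.type !== "tool_use") continue;
      try {
        const content = await runTool(block.name, block.input); // Act
        transcript.push({ type: "tool_result", tool_use_id: block.id, content, is_error: false }); // Observe
      } catch (err) {
        // A tool failure is reported back to the model; the loop continues.
        const content = err instanceof Error ? err.message : String(err);
        transcript.push({ type: "tool_result", tool_use_id: block.id, content, is_error: true });
      }
    }
  }
  return { rounds: MAX_ROUNDS, tokens, stopped: "max_rounds" as const };
}
```

Returning the stop reason alongside the round count makes the "stops at MAX_ROUNDS even without done signal" criterion directly assertable in a unit test.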
T1.7 · RestResponseEventEmitter · Mateo · #2 • 4h · deps: T1.6
  • POST /conversation returns complete response after all ReAct rounds finish (no streaming yet)
  • Full content assembled from all text ContentBlocks
  • Internal events logged: round count, tools used, cost per turn
T1.8 · Verify Observability with ReAct · Mateo · #8 • 1d · deps: T1.6
  • ConversationTrace includes tool_calls[] with tool name, args, and result per round
  • round_count field populated correctly in trace
  • Existing trace tests still pass (no regressions from ReAct changes)

Andrés — #12 Marketplace Provider · #10 Data Sync · #14 DevOps — 11 tasks

T1.9 · Scaffold Marketplace Provider · Andrés · #12 • 1d · deps: T0.8
  • npm run build passes with 0 TypeScript errors
  • Folders: domain/, application/, infrastructure/, presentation/ all present
  • Value Objects (Marketplace, SKU, MarketplaceCredential) instantiate with validation
  • DI container wires all dependencies without circular errors
T1.10 · IMarketplaceAdapter Interface · Andrés · #12 • 4h · deps: T1.9
  • Interface declares all 23 methods across 4 domains (Catalog, Engagement, Advertising, Enrollment)
  • ISKUResolver.resolve(sku, marketplace) returns native marketplace ID
  • TypeScript compiles with 0 errors
T1.11 · AES256GCMCipher + ITokenManager · Andrés · #12 • 2d · deps: T1.9
  • AES-256-GCM encrypt/decrypt round-trip unit test passes
  • DynamoDB table marketplace-credentials created in CDK with correct schema
  • ITokenManager.get(sellerId, marketplace) auto-refreshes when remaining time < 15min
  • Token never stored in plaintext in DynamoDB (verified by direct table scan)
T1.12 · MeLiOAuth2Flow + MeLiAdapter · Andrés · #12 • 3d · deps: T1.10, T1.11
  • OAuth2 code flow completes end-to-end for test MeLi developer account
  • GET /users/me/items returns real seller items with correct mapping
  • MeLi 403 → AuthenticationError; 429 → RateLimitError (mapped correctly)
  • Code reused from context/marketplace-connection/ (no duplication of logic)
T1.13 · AmazonLWAFlow + AmazonAdapter scaffold · Andrés · #12 • 2d · deps: T1.10, T1.11
  • LWA OAuth flow scaffolded (or stubbed pending E1 approval — status documented in Linear)
  • SP-API SDK initialized with correct region config
  • Rate limiting config per API family defined (even if not yet active)
  • All methods return NotImplementedError with descriptive TODO comment
T1.14 · Verify Terraform GCP · Andrés · #14 • 1d · deps: T0.8
  • terraform plan returns 0 changes on existing resources
  • All GCS buckets, Cloud Run services, Airflow, BigQuery in terraform state list are green
  • Airflow health check endpoint returns OK
T1.15 · Request External Dependencies (E1–E5) · Andrés · #14 • 4h · deps: T0.8
  • Amazon SP-API dev account requested (email/ticket link documented in Linear on day 1)
  • MeLi dev portal app created; Shopify Partners account created
  • Apple Developer Program enrollment submitted
  • Linear tasks created per dependency with expected approval date and owner
T1.15a · SellerConnection Aggregate · Andrés · #12 • 1d · deps: T1.9
  • Valid state transitions work: disconnected→pending, pending→active, active→expired/revoked
  • Invalid transition throws DomainError (e.g., disconnected→active not allowed)
  • Persisted to DynamoDB and rehydrated correctly
  • sellerConnection.isActive() returns true only in active state
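
The transition rules listed for T1.15a can be expressed as a small lookup table. The sketch below is a minimal illustration: the `ALLOWED` table, the `transitionTo` method name, and the choice to allow no transitions out of `expired` or `revoked` are assumptions, since the source does not specify them.

```typescript
type ConnectionState = "disconnected" | "pending" | "active" | "expired" | "revoked";

class DomainError extends Error {}

// Transitions named in T1.15a. Exits from expired/revoked are not specified
// in the source, so this sketch allows none.
const ALLOWED: Record<ConnectionState, ConnectionState[]> = {
  disconnected: ["pending"],
  pending: ["active"],
  active: ["expired", "revoked"],
  expired: [],
  revoked: [],
};

class SellerConnection {
  private state: ConnectionState = "disconnected";

  // Moves to `next` if the table allows it, otherwise throws DomainError.
  transitionTo(next: ConnectionState): void {
    if (ALLOWED[this.state].indexOf(next) === -1) {
      throw new DomainError(`Invalid transition: ${this.state} -> ${next}`);
    }
    this.state = next;
  }

  isActive(): boolean {
    return this.state === "active";
  }
}
```

Keeping the rules in one table means a new state or transition is a data change rather than a new branch in the aggregate.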
T1.15b · MarketplaceAction Entity + Repo · Andrés · #12 • 4h · deps: T1.9
  • Entity has all fields: actionId, sellerId, marketplace, method, status, requestPayload, responsePayload, latencyMs
  • IMarketplaceActionRepository.save() and findByIdAndSeller() pass unit tests
  • latencyMs measured from call start to response completion
T1.15c · IOAuth2Flow Interface (domain port) · Andrés · #12 • 4h · deps: T1.9
  • Interface declares authorize(), exchangeCode(), refreshToken()
  • MeLiOAuth2Flow implements interface (TypeScript compiles)
  • AmazonLWAFlow and ShopifyOAuth2Flow stubs implement interface without TypeScript errors
T1.28 · Collect Missing WRITE API Docs · Andrés · #12 • 3d · deps: T0.8
  • MeLi: 3 WRITE API docs collected (endpoints, auth, payloads, error codes)
  • Amazon Ads: 5 WRITE API docs collected with campaign management endpoints
  • Amazon SP-API: 2 WRITE API docs collected
  • Shopify: 9 WRITE API docs collected (GraphQL mutations documented)
  • All docs organized per marketplace in shared repo, ready for #3 Tool Registry mapping
T1.29 · Collect User Management Provider Docs · Andrés · #12 • 2d · deps: T0.8
  • At least 2 external auth providers evaluated (Auth0, Clerk, Memberstack) with pros/cons matrix
  • Service methods documented: authentication, authorization, credential management, user CRUD
  • Integration points with consumer layers identified (Shell, API, Billing)
T1.33 · GitHub Actions CI: electron-builder pipeline · Andrés · #14 • 0.5d · deps: T1.32
  • GitHub Actions workflow triggers on push to release/* branches
  • Builds .dmg + .exe artifacts, uploads to GitHub Releases as draft
  • Build completes in <10min, artifacts downloadable by team

Sergio — #1 Native Shell — 6 tasks

T1.16 · Scaffold Electron + electron-builder · Sergio · #1 • 1d · deps: T0.8
  • npm run dev starts Electron app without errors
  • Main process, preload script, and renderer all load correctly
  • Hot reload: file change → app refreshes in <3s
  • npm run build produces distributable artifact
T1.17 · MainWindow + WebContentsView · Sergio · #1 • 2d · deps: T1.16
  • WebContentsView (NOT BrowserView, which is deprecated as of Electron 30) loads MeLi URL correctly
  • Navigation controls (back/forward/refresh) function as expected
  • Marketplace session persists after app restart (electron-store verified)
  • Main window is ~70% of screen width
T1.18 · MarketplaceDetector · Sergio · #1 • 1d · deps: T1.17
  • articulo.mercadolibre.com/MLM-xxx → detected as MeLi product page with item ID
  • amazon.com/dp/BXXXXX → detected as Amazon product page with ASIN
  • Remote config JSON fetched on startup; fallback to local patterns if fetch fails
  • Unknown URL → returns null (no crash)
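
A minimal sketch of the detection step in T1.18, assuming regex-based matching on the raw URL string. The two patterns are hypothetical local fallbacks written for this example (the blueprint loads the real ones from remote config), and the `Detection` shape is an assumption.

```typescript
interface Detection {
  marketplace: "meli" | "amazon";
  pageType: "product";
  productId: string;
}

// Hypothetical fallback patterns; the real list would come from remote config.
const PATTERNS: { marketplace: Detection["marketplace"]; regex: RegExp }[] = [
  // MeLi product pages: articulo.mercadolibre.com[.cc]/MLM-123...
  { marketplace: "meli", regex: /articulo\.mercadolibre\.com(?:\.[a-z]{2})?\/(ML[A-Z]-\d+)/ },
  // Amazon product pages: amazon.<tld>/[slug/]dp/<10-character ASIN>
  { marketplace: "amazon", regex: /amazon\.[a-z.]+\/(?:[^/?#]+\/)?dp\/([A-Z0-9]{10})/ },
];

// Returns the first matching detection, or null for unknown URLs (never throws).
function detectMarketplace(url: string): Detection | null {
  for (const { marketplace, regex } of PATTERNS) {
    const match = regex.exec(url);
    if (match) return { marketplace, pageType: "product", productId: match[1] };
  }
  return null;
}
```

Because the patterns are plain data, a fetched remote config could replace `PATTERNS` at startup while this list remains the fallback.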
T1.19 · Tab System + Sidebar Container · Sergio · #1 • 2.5d (+0.5d setup tokens T0.BB) · deps: T1.17
  • Marketplace tabs render above WebContentsView and are clickable
  • Sidebar React panel renders at 360px on right side
  • IPC: renderer sends request, main responds with data (round-trip verified)
  • Cmd+B toggles sidebar visibility
T1.20 · Auth Memberstack · Sergio · #1 • 1d · deps: T1.16
  • JWT stored in electron-store with OS-level encryption key (not plaintext)
  • Login flow: Memberstack login → JWT stored → app navigates to /chat
  • Logout clears JWT from electron-store completely
  • AuthService.isAuthenticated() returns correct value on app start
T1.32 · First .dmg + .exe canary build (unsigned) · Sergio · #1 • 1d · deps: T1.16
  • electron-builder produces .dmg (macOS) and .exe (Windows) without errors
  • .dmg installs on macOS, .exe installs on Windows — app launches and shows shell
  • All 4 team members install and verify basic functionality
T1.MK1 · Mockup shell container: validate Figma tokens/components in real Electron+React context · Sergio · #1 • 0.5d · deps: T0.BB, T1.19
  • Tokens render correctly in React, component integration verified
  • Design system CSS variables match Figma token values

UX/UI — #18 Design System — 1 task

T1.BB · Atoms + Molecules Base + Chat Organisms: remaining Atoms + AI-native Atoms + base Molecules + chat Organisms (MessageBubble, ChatInputBar, ContextBar, OnboardingStep) · UX/UI · #18 • 6d · deps: T0.BB
  • All atoms published in [LIB] Core Components
  • Chat organisms ready for dev: MessageBubble, ChatInputBar, ContextBar, OnboardingStep
  • Pablo approves delivery

Pablo — #16 Eval Suite — 1 task

T1.21 · KB Phase 0 — Fix Duplicates · Mateo · #9 • 2d
  • SELECT COUNT(*) on embeddings matches unique doc count (no duplicates)
  • embedded_at column has real timestamps (not null or epoch 0)
  • CI pipeline uses Go 1.24 (not 1.21)
  • created_at, updated_at, embedded_at are 3 separate fields in schema
T1.22 · KB Phase 1 — Contextual Retrieval · Mateo · #9 • 2d · deps: T1.21
  • Each stored chunk has [namespace/type] title prefix in text
  • Chunking splits on ## and ### Markdown headers
  • Adjacent chunks share 150-char overlap (verified in chunk boundary test)
  • Retrieve chunk → prefix is present in stored text (not stripped)
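
The chunking rules in T1.22 (split on `##`/`###`, 150-character overlap, context prefix kept in the stored text) could be sketched as follows. It is written in TypeScript for illustration only; the function name, its parameters, and the exact prefix layout are assumptions.

```typescript
const OVERLAP_CHARS = 150;

// Splits Markdown on "## " and "### " headers, carries the last 150 characters
// of the previous section into the next chunk, and prepends a context prefix.
function chunkMarkdown(
  markdown: string,
  namespace: string,
  type: string,
  title: string,
): string[] {
  const prefix = `[${namespace}/${type}] ${title}\n`;
  // The lookahead keeps each header line at the start of its own section.
  const sections = markdown
    .split(/^(?=#{2,3} )/m)
    .filter((s) => s.trim().length > 0);
  const chunks: string[] = [];
  let previous = "";
  for (const section of sections) {
    const overlap = previous.slice(-OVERLAP_CHARS);
    chunks.push(prefix + overlap + section);
    previous = section;
  }
  return chunks;
}
```

The prefix is part of the returned string itself, so whatever is stored and later retrieved still carries it, matching the "not stripped" criterion.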
T1.23 · KB Content: 15–20 Curated Docs · Mateo · #9 • 5d
  • 15+ documents indexed in BigQuery (verified by row count)
  • Documents cover: MeLi best practices, Amazon policies, Shopify guides, pricing strategy, photo optimization, KPIs, FAQ
  • Each document has: title, source URL, language metadata
  • SELECT COUNT(*) FROM embeddings WHERE is_current=true ≥ 15
T1.24 · Eval Phase 0 — Setup + Golden Dataset · Pablo · #16 • 3d
  • npm run eval -- --dry-run executes without crashing
  • IEvalPipeline, ILLMJudge, IGoldenDatasetManager interfaces exist in domain/
  • 15+ YAML golden cases in dataset/ directory, each with: input, expected_scope, expected_tool, min_judge_score
  • EvalConfig and EvalReport models compile correctly
T1.25 · 10 READ Tool Specs · Mateo · #9 • 2d
  • Spec document exists for all 10 READ tools (get_product, get_product_metrics, get_orders, get_buyer_questions, get_product_reviews, get_category_requirements, get_campaigns, get_campaign_metrics, get_store, get_store_metrics)
  • Each spec has: name, LLM-friendly description, valid JSON Schema inputSchema, riskLevel=READ, creditCost=1
  • Specs importable by TypeScript ToolRegistry without type errors
T1.26 · Brand Registration in Marketplaces · Pablo · #10 • 3d · deps: T0.8
  • Amazon Brand Registry application submitted with required documentation
  • Amazon Ads developer account application submitted
  • MercadoLibre developer portal registration completed
  • Shopify Partner Program application submitted
  • Weekly tracking issue created in Linear with status per marketplace
T1.27 · Authorize App in Apple & Windows Stores · Pablo · #1 • 2d · deps: T0.8
  • Apple Developer Program enrollment completed ($99/yr paid)
  • Code signing certificate requested and ready for use
  • Microsoft Partner Center account created and verified
  • App recognized as verified publisher on both platforms (or application in progress with tracking)
Sprints 3-4 — Core Engines — T2.1–T2.40 (38 tasks)

Mateo — #3 Tool Registry · #5 Context · #2 Orchestrator · #9 Cerebro KB — 15 tasks

T2.1 · ToolRegistry + ToolDefinition · Mateo · #3 • 2d · deps: T1.6, T1.25
  • registry.register(def, handler) stores and retrieves handler by name
  • registry.getDefinitions() returns array with all required fields per ToolDefinition schema
  • Zod schema validation: valid input passes, invalid input throws ZodError
  • TypeScript compiles with 0 errors; unit tests cover register, getDefinitions, getHandler
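
One possible shape for the registry in T2.1 is sketched below. To stay dependency-free, a plain `validate` function stands in for the Zod schema named in the criteria; that substitution, the field list of `ToolDefinition`, and the decision to validate inside `getHandler` are assumptions of this example.

```typescript
interface ToolDefinition {
  name: string;
  description: string;
  riskLevel: "READ" | "WRITE" | "SYSTEM";
  creditCost: number;
  // Stand-in for a Zod schema: throws when the input is invalid.
  validate: (input: unknown) => void;
}
type ToolHandler = (input: unknown) => unknown;

class ToolRegistry {
  private readonly entries = new Map<string, { def: ToolDefinition; handler: ToolHandler }>();

  register(def: ToolDefinition, handler: ToolHandler): void {
    this.entries.set(def.name, { def, handler });
  }

  getDefinitions(): ToolDefinition[] {
    return Array.from(this.entries.values()).map((e) => e.def);
  }

  // Wraps the stored handler so input is validated before it runs.
  getHandler(name: string): ToolHandler | undefined {
    const entry = this.entries.get(name);
    if (!entry) return undefined;
    return (input) => {
      entry.def.validate(input);
      return entry.handler(input);
    };
  }
}

// Example validator for a tool that expects { sku: string }.
function requireSku(input: unknown): void {
  const sku = (input as { sku?: unknown } | null)?.sku;
  if (typeof sku !== "string") throw new Error("sku must be a string");
}
```

Looking handlers up by name keeps callers free of hardcoded tool names, which is what the executor criteria in T2.2 also ask for.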
T2.2 · IToolExecutor + ToolExecutor · Mateo · #3 • 1d · deps: T2.1
  • executor.execute('get_product', { sku: 'MLB-123' }, ctx) returns ToolResult
  • Unknown tool name → ToolResult with isError: true
  • ToolExecutor routes to the registered handler without hardcoded names
T2.3 · ToolPolicyFilter · Mateo · #3 • 1d · deps: T2.1
  • Irreversible tool (update_price) → ConfirmationRequiredError thrown before execution
  • Tool for unconnected marketplace → MarketplaceNotConfiguredError thrown
  • READ tool on connected marketplace → passes through without confirmation
  • Filter added via DI; ToolExecutor code not modified
T2.4 · HookLifecycle · Mateo · #3 • 1d · deps: T2.2
  • before_tool hook executes before handler call (verified via execution log order)
  • after_tool hook executes even when handler throws exception
  • Hook order: before → execute → after (always, no skip on error)
  • Multiple hooks can be registered for same lifecycle event
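
The ordering guarantee in T2.4 maps naturally onto `try`/`finally`. The sketch below is synchronous for brevity (an async variant would await the handler inside the same block); the `on` and `run` method names and the `Hook` signature are hypothetical.

```typescript
type Hook = (toolName: string) => void;

class HookLifecycle {
  private readonly before: Hook[] = [];
  private readonly after: Hook[] = [];

  // Several hooks may be registered for the same lifecycle event.
  on(event: "before_tool" | "after_tool", hook: Hook): void {
    (event === "before_tool" ? this.before : this.after).push(hook);
  }

  // before hooks -> handler -> after hooks. `finally` guarantees the after
  // hooks run even when the handler throws; the error still propagates.
  run<T>(toolName: string, handler: () => T): T {
    for (const hook of this.before) hook(toolName);
    try {
      return handler();
    } finally {
      for (const hook of this.after) hook(toolName);
    }
  }
}
```

Recording the call order in a log array, as the acceptance criteria suggest, is enough to verify the sequence in a unit test.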
T2.5 · 10 READ Tool Handlers (stubs) · Mateo · #3 • 2d · deps: T2.1
  • All 10 READ handlers registered in ToolRegistry
  • Each handler returns hardcoded mock ToolResult with correct field shape
  • execute('get_product', { sku: 'MLB-123' }) returns product mock data without error
  • Files organized in handlers/read/ directory, one file per tool
T2.5a · ToolResult Domain Model · Mateo · #3 • 4h · deps: T2.1
  • ToolResult is immutable (Object.freeze or all fields readonly)
  • Fields present: toolName, args, result, isError, latencyMs, cached, creditCost
  • Instantiation unit test: all fields set correctly, mutation attempt throws
T2.5b · update_user_profile SYSTEM Tool · Mateo · #3 • 4h · deps: T2.1, T1.2
  • Tool registered as SYSTEM category in ToolRegistry
  • { marketplaces: ['meli'] } input → updates UserProfile.marketplaces in DynamoDB
  • LLM can invoke tool via tool_call block (correct name and schema in definitions)
  • UserProfile retrieved after update reflects new values
T2.5c · contextSummary · Mateo · #5 • 1d · deps: T1.3
  • When conversation.contextSummary is set, it replaces full history in prompt
  • contextSummarizedUpToMessageId tracks last summarized message ID
  • Test: 50-message conversation at threshold → summary generated, token count drops below limit
  • contextSummary stored correctly in DynamoDB Conversation entity
T2.5d · 17 WRITE Tool Stubs · Mateo · #3 • 4h · deps: T2.1
  • All 17 WRITE tools registered with ConfirmationRequired policy in ToolRegistry
  • Each stub returns NotImplementedError (does not call any marketplace)
  • getDefinitions() includes all 17 WRITE tools — LLM can “see” them
  • S12 scope tracking: all 13 deferred WRITE tools have Linear tasks
T2.6 · IContextAssembler · Mateo · #5 • 2d · deps: T1.6
  • KB RAG and Brand Health RAG execute in parallel (verified via Promise.all or timing logs)
  • Single embedding call used for both lookups
  • KB failure → assembler returns Brand Health context alone (no exception propagated)
  • Brand Health failure → assembler returns KB context alone (no exception propagated)
T2.7 · Health Summary Structured · Mateo · #5 • 1d · deps: T2.6
  • getHealthSummary(userId, 'meli') returns { critique: [], delicate: [] }
  • Result always injected into system prompt (not optional)
  • Empty result → empty arrays (not null/undefined, no crash)
T2.8 · Prompt Caching Anthropic · Mateo · #2 • 1d · deps: T1.5
  • SystemPromptBlock JSON payload has cache_control: { type: "ephemeral" }
  • Only AnthropicLLMClient sends cache_control; OpenRouter/Vertex skip it
  • Second identical call: Anthropic API response shows cache_read_input_tokens > 0
T2.9 · Tool Result Caching In-Memory · Mateo · #3 • 4h · deps: T2.2
  • Same tool+args in same session → second call returns cached ToolResult with cached: true
  • Different session → fresh execution (cache is per-session, not global)
  • WRITE and ANALYSIS tools with side effects are not cached (only READ)
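
A per-session cache along the lines of T2.9 might be structured as below. The class name, the `execute` signature, and the use of `JSON.stringify` for the cache key are assumptions made for this sketch; it also treats every non-READ category as uncacheable.

```typescript
type ToolCategory = "READ" | "ANALYSIS" | "WRITE";

interface ToolResult {
  toolName: string;
  result: unknown;
  cached: boolean;
}

class SessionToolCache {
  // One inner map per session, so entries never leak across sessions.
  private readonly bySession = new Map<string, Map<string, ToolResult>>();

  execute(
    sessionId: string,
    category: ToolCategory,
    toolName: string,
    args: unknown,
    run: () => unknown,
  ): ToolResult {
    // Only READ tools are cached; anything else always executes.
    if (category !== "READ") return { toolName, result: run(), cached: false };

    // Assumption: args serialize deterministically. A production key would
    // sort object keys before stringifying.
    const key = `${toolName}:${JSON.stringify(args)}`;
    let session = this.bySession.get(sessionId);
    if (!session) {
      session = new Map<string, ToolResult>();
      this.bySession.set(sessionId, session);
    }
    const hit = session.get(key);
    if (hit) return { ...hit, cached: true };

    const fresh: ToolResult = { toolName, result: run(), cached: false };
    session.set(key, fresh);
    return fresh;
  }
}
```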

Andrés — #12 Marketplace Provider · #10 Data Sync · #14 DevOps — 10 tasks

T2.10 · ShopifyOAuth2Flow + ShopifyAdapter · Andrés · #12 • 3d · deps: T1.10, T1.11
  • OAuth2 flow completes with real Shopify Partner test store
  • GraphQL query for products returns correct data shape
  • Rate limit exceeded → throttle respected (backoff logged)
  • Products, orders, inventory queries all return expected shape
T2.11 · AmazonAdapter Complete (if E1 approved) · Andrés · #12 • 3d · deps: T1.13
  • If E1 approved: Reports API, Catalog Items, Orders API all return real data
  • Rate limit 5 req/s enforced with exponential backoff on 429
  • If E1 not approved: defer confirmed, Linear task updated with ‘blocked:E1’ label
T2.12 · TokenRefreshCron · Andrés · #12 • 1d · deps: T1.11, T1.12
  • EventBridge rule fires every 5 minutes (confirmed in CloudWatch logs)
  • Token refreshed when remaining time < 30min
  • DynamoDB mutex prevents concurrent refresh for same seller+marketplace (race test passes)
  • 3 consecutive failures → alert posted to #deploys Slack
T2.13 · Data Sync Clean Architecture API · Andrés · #10 • 2d · deps: T0.8
  • All API endpoints behave identically before and after refactor (integration tests pass)
  • IDataReader, ITokenProvider interfaces exist in domain/ with correct signatures
  • VOs: UserId, Marketplace, DateRange validate and reject invalid inputs
  • npm run test passes with 0 regressions
T2.14 · Verify Existing DAGs · Andrés · #10 • 1d · deps: T2.13
  • MeLi and Shopify DAGs both show status=Success for last @hourly run in Airflow UI
  • Bronze schema for both marketplaces matches expected columns (no null PKs)
  • Zero failed DAG runs in last 24h
T2.15 · CDK Base AWS · Andrés · #14 • 2d · deps: T1.1
  • cdk deploy completes with 0 errors
  • DynamoDB conversation-api table exists with corrected GSIs (ULID keys, INCLUDE projection)
  • Lambda + API Gateway v2 HTTP endpoint returns 200 on GET /health
  • marketplace-credentials table, Secrets Manager secret, EventBridge rule all provisioned
T2.16 · GitHub Actions CI Multi-Repo · Andrés · #14 • 1d
  • CI workflow file exists in all 4 active repos
  • PR → CI runs lint + type-check + unit tests; failing test blocks merge
  • Build cache via actions/cache reduces CI runtime by >30% on cache hit
T2.16a · marketplace-actions DynamoDB Table · Andrés · #14 • 4h · deps: T2.15
  • Table marketplace-actions created via CDK
  • pk = sellerId (string), sk = actionId (ULID) — format verified
  • GSI: marketplace-status-index with pk=marketplace, sk=status
T2.16b · AmazonAdsOAuth2Flow (dual OAuth) · Andrés · #12 • 1d · deps: T1.13
  • Separate OAuth2 flow for Amazon Ads API (distinct from LWA SP-API flow)
  • Credentials stored separately in Secrets Manager (not mixed with SP-API credentials)
  • authorize() → exchangeCode() flow completes with Amazon Ads test account
T2.16c · ISKUResolver Implementations · Andrés · #12 • 1d · deps: T1.10
  • MeLiSKUResolver: internal SKU “ML-123” ↔ MeLi item ID “123”
  • AmazonSKUResolver: internal SKU “ASIN-B0xxx” ↔ ASIN “B0xxx”
  • ShopifySKUResolver: internal SKU “shop-456” ↔ numeric product ID “456”
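
All three examples in T2.16c amount to adding or stripping a fixed prefix, which the sketch below captures. The `SkuMapping` interface and `prefixResolver` helper are hypothetical and are not the `ISKUResolver.resolve(sku, marketplace)` signature from T1.10; real resolvers may need lookups rather than string manipulation.

```typescript
// Hypothetical two-way mapping used only for this illustration.
interface SkuMapping {
  toNative(internalSku: string): string;
  toInternal(nativeId: string): string;
}

// Builds a mapping that strips a fixed prefix on the way out and
// restores it on the way back.
function prefixResolver(prefix: string): SkuMapping {
  return {
    toNative(internalSku: string): string {
      if (internalSku.indexOf(prefix) !== 0) {
        throw new Error(`Expected SKU to start with "${prefix}"`);
      }
      return internalSku.slice(prefix.length);
    },
    toInternal(nativeId: string): string {
      return prefix + nativeId;
    },
  };
}

const meliSkus = prefixResolver("ML-");       // "ML-123"     <-> "123"
const amazonSkus = prefixResolver("ASIN-");   // "ASIN-B0xxx" <-> "B0xxx"
const shopifySkus = prefixResolver("shop-");  // "shop-456"   <-> "456"
```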

Sergio — #1 Native Shell — 6 tasks

T2.17 · Chat UI + Markdown Rendering · Sergio · #1 • 2.5d (+0.5d integration T1.BB) · deps: T1.19
  • User messages appear in right bubble; assistant messages in left bubble
  • Markdown bold, italic, code, lists render correctly in assistant messages
  • Indicators visible: ‘pensando...’ while waiting; ‘ejecutando tool X...’ during tool call
  • Syntax highlighting active in code blocks
T2.18 · CoachWebSocketService · Sergio · #1 • 1d · deps: T1.7
  • WebSocket connects to conversation-api server successfully
  • On disconnect: reconnect with backoff 1s, 2s, 4s... max 30s
  • Heartbeat ping/pong every 30s (verified in network inspector)
  • If WebSocket unavailable: REST polling every 2s activates automatically
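
The reconnect schedule in T2.18 (1s, 2s, 4s, ... capped at 30s) is a capped exponential backoff. A minimal sketch of the delay calculation, with hypothetical constant and function names:

```typescript
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 30000;

// attempt is 0-based: 0 -> 1s, 1 -> 2s, 2 -> 4s, ... capped at 30s.
function reconnectDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * Math.pow(2, attempt), MAX_DELAY_MS);
}
```

With these constants the sequence is 1s, 2s, 4s, 8s, 16s, and then 30s for every later attempt, since the uncapped sixth value would be 32s. Resetting `attempt` to 0 after a successful connection is a common companion rule, though the source does not state it.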
T2.19 · URL→Metadata Injection · Sergio · #1 • 1d · deps: T1.18
  • MeLi product URL → metadata includes { marketplace: 'meli', pageType: 'product', productId: 'MLM-xxx' }
  • Metadata appended to message payload on every send
  • Works for Amazon ASIN URLs and Shopify product URLs
T2.20 · react-router View Navigation · Sergio · #1 • 1d · deps: T2.17
  • /chat route loads as default; /profile, /billing, /enrollment, /onboarding all load without crash
  • Tab bar visible at bottom, highlights active route
  • Chat state (messages, input text) preserved when navigating to /profile and back
T2.21 · OnboardingWizard 5 Steps · Sergio · #1 • 2.5d (+0.5d T1.BB) · deps: T2.17, T1.12
  • Step 1 (Welcome) shown only on first launch (localStorage flag set after)
  • Step 2: OAuth inline popup opens correctly and connects marketplace
  • Step 3: Profile form saves to backend successfully
  • Step 4: Guided query executes at least one tool successfully
  • Step 5: Success screen with next steps shown; skip available from step 3
T2.40 · Gate 1 signed build · Sergio · #1 • 1d · deps: T0.9, T0.10, T1.32
  • macOS: .dmg signed with Developer ID cert, notarized with notarytool, stapled
  • Windows: .exe signed with code signing cert, SmartScreen doesn’t block
  • Both builds include all S3-4 features (chat UI, WebSocket, context injection)
  • All 4 team members install signed builds, no security warnings
T2.MK1 · Mockup ChatView: integrate chat organisms from T1.BB in real ChatView component · Sergio · #1 • 1d · deps: T1.BB, T2.17
  • MessageBubble, ChatInputBar render correctly with real data
  • Chat organisms match Figma specifications pixel-perfect
T2.MK2 · Mockup OnboardingWizard: integrate onboarding components from T1.BB · Sergio · #1 • 0.5d · deps: T1.BB, T2.21
  • OnboardingStep component works in 5-step wizard flow
  • Step transitions and animations match Figma prototype

UX/UI — #18 Design System — 1 task

T2.BB · Molecules + Data/Flow Organisms: remaining Molecules + data & flow Organisms (ConfirmDialog, ToolAccordion, ProactiveCard, MarketplaceKPI, CreditEconomy, EnrollmentCard) · UX/UI · #18 • 6d · deps: T1.BB
  • All molecules published in [LIB] Core Components
  • Data organisms ready for dev: ConfirmDialog, ToolAccordion, ProactiveCard, MarketplaceKPI, CreditEconomy, EnrollmentCard
  • Pablo approves delivery

Pablo — #16 Eval Suite · #17 Beautonomous — 4 tasks

T2.22 · KB Phase 2 — Incremental Processing · Mateo · #9 • 2d · deps: T1.21
  • SHA-256 hash stored per document in BigQuery
  • Re-run pipeline with unchanged doc → skipped (no re-embed); changed doc → re-embedded
  • is_current = false set for outdated embeddings
  • Test: modify 1 doc → only 1 doc re-embedded (verified via embed count in logs)
T2.23 · KB Phase 3 — Batch Embeddings · Mateo · #9 • 2d · deps: T1.21
  • 250 texts sent per Vertex AI API call (verified in API logs)
  • 429 response → retry with backoff (3 attempts max logged)
  • Goroutine pool: max 5 concurrent API calls (semaphore verified)
  • Performance: ~6000 individual texts → ~24 batch calls (10× speed improvement)
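
The call-count figure in T2.23 follows from simple batching: 6000 texts at 250 per call is 6000 / 250 = 24 calls. The sketch below shows the batching step in TypeScript for illustration only (the goroutine pool suggests the pipeline itself is written in Go); `toBatches` is a hypothetical helper.

```typescript
const BATCH_SIZE = 250;

// Splits items into consecutive batches of at most `size` elements.
function toBatches<T>(items: T[], size: number = BATCH_SIZE): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch would then be sent as one embedding request, with the concurrency cap of 5 and the retry policy applied around that call.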
T2.24 · Eval Phase 1 — LLM Judge + EvalRunner · Pablo · #16 • 3d · deps: T1.24
  • npx ts-node eval.ts runs EvalRunner end-to-end without crash
  • AnthropicLLMJudge scores each response 0.0–1.0 with justification text
  • YamlDatasetLoader loads all YAML files from dataset/ directory
  • ReportGenerator produces JSON report with per-case pass/fail; 20+ cases all evaluated
T2.25 · E2E Testing via Playground · Pablo · #16 • 2d · deps: T1.6, T2.1
  • 3 conversation flows tested with real Sellerfy data
  • At least 1 READ tool call and 1 proactive suggestion verified in test
  • Issues found documented as Linear tasks via Beautonomous
  • 0 unhandled exceptions during test session
T2.26 · Bootstrap ~150 Tasks in Linear · Pablo · #17 • 4h · deps: T0.7
  • All ~150 tasks from backlog created in Linear with correct cycle assignment
  • Each task has L/M/S label, Track-{engineer} label, and dependencies linked
  • 4 engineers confirm task list in Linear before S1 starts
T2.26a · Quality Gate 5-Step Beautonomous · Pablo · #17 • 1d · deps: T0.5
  • Pipeline configured: structure → lint → tests → architecture review → convention check
  • Gate runs automatically when PR submitted via OpenClaw
  • Failed step blocks PR approval; result posted as comment on PR
Sprints 5-6 — WRITE Tools + Billing + Enrichment — T3.1–T3.44 (41 tasks)

Mateo — #3 Tools · #6 Proactive · #7 Guard · #11 Enrichment · #2 Orch · #9 Cerebro KB — 14 tasks

T3.1 · 10 READ Handlers (real) · Mateo · #3 • 3d · deps: T2.5, T2.13
  • Each handler calls correct Fast Data Layer endpoint: GET /data/{user_id}/fast/{tool}
  • Zod schema validates FDL API response before returning ToolResult
  • p95 latency < 500ms for all 10 handlers (verified in load test)
  • FDL unavailable → fallback to direct Marketplace Provider call (fallback tested)
T3.2 · ConfirmationFlow · Mateo · #2 • 2d · deps: T1.6, T2.3
  • WRITE tool triggered → execution paused; tool_result not yet sent to LLM
  • Diff shown to user: { before: {...}, after: {...} }
  • Aceptar (Accept) → execution resumes and completes; Rechazar (Reject) → cancelled, no marketplace change
  • Timeout 35min: OrchestrationSession TTL expires in DynamoDB
  • Concurrent confirmations tracked independently by sessionId
T3.3 · 4 WRITE Tool Handlers · Mateo · #3 • 3d · deps: T3.2
  • update_product_content: GCS snapshot stored before write; content updated in MeLi
  • update_price: irreversible flag → confirmation always required even if user said ‘just do it’
  • pause_product / activate_product: listing status changes verified via get_product after write
  • All 4: verify() confirms change applied; ActionLog entry written
T3.4 · ProactiveSuggestionService · Mateo · #6 • 2d · deps: T2.4
  • after_tool hook invokes service after every tool execution
  • LLM evaluates result and returns { hasSuggestion: bool, message, priority }
  • Max 2 suggestions per turn enforced (3rd suggestion dropped)
  • Dedup: same suggestion not shown if offered in last 7 days (UserProfile.recentSuggestions checked)
  • No hardcoded if/else rules — pure LLM evaluation with no hand-crafted triggers
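
The two mechanical limits in T3.4 (at most 2 suggestions per turn, no repeat within 7 days) sit outside the LLM evaluation and can be applied as a plain filter. In this sketch the `Suggestion.key` field and the `lastOfferedAt` map are hypothetical stand-ins for whatever `UserProfile.recentSuggestions` actually stores.

```typescript
interface Suggestion {
  key: string; // hypothetical stable identifier used for deduplication
  message: string;
}

const MAX_SUGGESTIONS_PER_TURN = 2;
const DEDUP_WINDOW_MS = 7 * 24 * 60 * 60 * 1000;

// Drops suggestions offered within the last 7 days, then keeps the first two
// that remain, in arrival order (so a 3rd suggestion is dropped).
function selectSuggestions(
  candidates: Suggestion[],
  lastOfferedAt: Map<string, number>,
  nowMs: number,
): Suggestion[] {
  const fresh = candidates.filter((s) => {
    const last = lastOfferedAt.get(s.key);
    return last === undefined || nowMs - last >= DEDUP_WINDOW_MS;
  });
  return fresh.slice(0, MAX_SUGGESTIONS_PER_TURN);
}
```

Passing `nowMs` in rather than reading the clock inside the function keeps the 7-day rule deterministic under test.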
T3.5 · IGuardService + InputGuard · Mateo · #7 • 1d · deps: T1.6
  • “Ignore previous instructions” → detected as injection, guard error returned
  • Off-scope query (weather, sports) → polite refusal, no tool call
  • Guard exception → request passes through with warning logged (graceful degradation)
  • InputGuard injected via DI; not hardcoded in AgentLoopOrchestrator
T3.5a · HttpCreditGate in conversation-api · Mateo · #2 • 1d · deps: T3.24
  • Before each tool: POST /internal/gate called with toolCategory and sellerId
  • READ=1 credit, ANALYSIS=2 credits, WRITE=3 credits deducted per execution
  • /internal/gate returns 503 → fail-open: tool executes anyway, event logged
  • Seller with 0 credits → gate returns 402, tool blocked with ‘insufficient credits’ message
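The credit costs and the gate's status handling reduce to a lookup table plus one decision function. The sketch below is illustrative only: `interpretGateStatus` and `GateDecision` are hypothetical names, and blocking on any status other than 2xx, 402, or 503 is an assumption, since the criteria do not cover those cases.

```typescript
// Hypothetical sketch: costs and the 402/503 behavior come from the criteria above.
type ToolCategory = "READ" | "ANALYSIS" | "WRITE";

const CREDIT_COST: Record<ToolCategory, number> = { READ: 1, ANALYSIS: 2, WRITE: 3 };

interface GateDecision {
  allowed: boolean;
  reason: "ok" | "insufficient_credits" | "fail_open" | "unexpected_status";
}

// Maps the HTTP status returned by POST /internal/gate to an execution decision.
function interpretGateStatus(httpStatus: number): GateDecision {
  if (httpStatus >= 200 && httpStatus < 300) return { allowed: true, reason: "ok" };
  if (httpStatus === 402) return { allowed: false, reason: "insufficient_credits" };
  if (httpStatus === 503) return { allowed: true, reason: "fail_open" }; // caller should log this event
  // Assumption: any other status blocks the tool; the criteria do not specify this case.
  return { allowed: false, reason: "unexpected_status" };
}
```

Keeping the decision pure makes the fail-open path easy to cover in a unit test without a running billing service.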
T3.6 · Enrichment Scaffold + Interfaces · Mateo · #11 • 1d · deps: T0.8
  • npm run build passes with 0 errors
  • IEnrichmentService, IMarketIntelligenceAdapter, IContentAnalysisAdapter, IEnrichmentCache in domain/
  • EnrichmentContainer.wire() resolves all dependencies without circular errors
  • All interfaces compile with correct method signatures
T3.7 · MeliMarketIntelligenceAdapter · Mateo · #11 • 2d · deps: T3.6
  • search_market_products(‘auriculares bluetooth’) returns list of competitor products
  • get_competitor_product(‘MLB-xxx’) returns product with price, title, photos
  • get_market_pricing(‘MLB-xxx’) returns PriceDistribution: min, max, median, avg
  • No seller credentials required (MeLi public API only)
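The `PriceDistribution` shape returned by get_market_pricing (min, max, median, avg) can be computed from a plain list of competitor prices. A minimal sketch, assuming the adapter has already collected the prices as numbers; `summarizePrices` is a hypothetical helper name.

```typescript
// Field names follow the criteria above; the helper itself is illustrative.
interface PriceDistribution {
  min: number;
  max: number;
  median: number;
  avg: number;
}

function summarizePrices(prices: number[]): PriceDistribution {
  if (prices.length === 0) throw new Error("at least one price is required");
  const sorted = [...prices].sort((a, b) => a - b); // numeric sort, input left untouched
  const mid = Math.floor(sorted.length / 2);
  // Even-length lists use the mean of the two middle values.
  const median = sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const avg = sorted.reduce((sum, p) => sum + p, 0) / sorted.length;
  return { min: sorted[0], max: sorted[sorted.length - 1], median, avg };
}
```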
T3.8 · VisionLLMContentAdapter · Mateo · #11 • 1d · deps: T3.6
  • analyze_product_image(url) returns ImageAnalysisResult with quality score and improvement suggestions
  • analyze_product_video(url) returns VideoAnalysisResult
  • enhance_product_image(url) → throws NotImplementedError with descriptive message (MVP scope)
T3.9 · RedisEnrichmentCache + EnrichmentService · Mateo · #11 • 1d · deps: T3.7, T3.8
  • Cache TTL: pricing=30min, competitor=1h, search=15min, image=24h
  • Second call with same args → Redis cache hit; no adapter called
  • Adapter throws exception → EnrichmentResult returned with error field (not re-thrown to caller)
  • Redis unavailable → service falls back to direct adapter call
T3.10 · Enrichment CDK Stack · Mateo · #11 • 1d · deps: T3.9
  • cdk deploy enrichment-stack completes without errors
  • Lambda function enrichment-api created and reachable
  • GET /health on API Gateway returns 200
  • ElastiCache Redis cluster reachable from Lambda (VPC subnet verified)
T3.11 · 8 ANALYSIS Tool Handlers · Mateo · #3 • 2d · deps: T3.9
  • All 8 ANALYSIS handlers registered in ToolRegistry with creditCost=2
  • search_market_products handler calls IEnrichmentService and returns correct ToolResult shape
  • enhance_product_image handler returns graceful NotImplemented ToolResult (isError=false, result has message)
T3.12 · HallucinationChecker · Mateo · #2 • 1d · deps: T3.1
  • LLM says “comisión del 10%” but tool_result has 15% → mismatch logged with claim, actual, toolName, conversationId
  • Response is NOT blocked (Phase 1 = log only)
  • Checker exception → response passes through unchanged
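The log-only check in T3.12 can be illustrated with a deliberately narrow case: percentage claims compared against a single known value from the tool result. Everything here is a hypothetical sketch (`checkPercentClaims`, `Mismatch`, the regex, and the `get_fees` tool name in the usage note); a real checker would need to handle many more claim types.

```typescript
// Hypothetical sketch: compares every percentage in the response to one actual value.
interface Mismatch {
  claim: number;
  actual: number;
  toolName: string;
  conversationId: string;
}

// Extracts numbers followed by "%" (accepts "10%", "10 %", "10,5%", "10.5%").
function extractPercentages(text: string): number[] {
  const matches = text.match(/\d+(?:[.,]\d+)?\s*%/g) ?? [];
  return matches.map((m) => parseFloat(m.replace("%", "").replace(",", ".").trim()));
}

// Returns the mismatches to log; never throws, so the response always passes through.
function checkPercentClaims(
  responseText: string,
  actualPercent: number,
  toolName: string,
  conversationId: string,
): Mismatch[] {
  try {
    return extractPercentages(responseText)
      .filter((claim) => claim !== actualPercent)
      .map((claim) => ({ claim, actual: actualPercent, toolName, conversationId }));
  } catch {
    return []; // checker failure must not affect the response
  }
}
```

For example, `checkPercentClaims("La comisión del 10% aplica", 15, "get_fees", "c1")` yields one mismatch with `claim: 10` and `actual: 15`, which Phase 1 would log without blocking the response.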

Andrés — #10 Data Sync · #12 Marketplace Provider · #14 DevOps — 6 tasks

T3.13 · Fast Data Layer — 11 Endpoints · Andrés · #10 • 3d · deps: T2.13
  • All 11 endpoints respond to GET /data/{user_id}/fast/{tool}
  • Response time p95 < 500ms (verified in k6/Locust test)
  • Reads GCS Parquet directly via pyarrow (no Redis intermediary)
  • Invalid user_id → 404; unknown tool → 422
T3.14 · GCS Snapshots for ConfirmationFlow · Andrés · #10 • 1d · deps: T3.13
  • Before WRITE: snapshot stored at gs://shopilot-snapshots/{user_id}/{tool}/{ts}.parquet
  • GET /data/{user_id}/snapshot/{tool}/{ts} returns the pre-write state correctly
  • snapshot_cleanup_dag runs daily and deletes snapshots older than 7 days
T3.15 · DAG Amazon · Andrés · #10 • 3d · deps: T2.14, T2.11
  • Amazon DAG runs with status=Success in Airflow (or uses test fixture if E1 pending)
  • AmazonExtractor fetches data from SP-API; AmazonLoader writes to GCS Bronze
  • Bronze schema for Amazon matches expected columns
  • MeLi and Shopify Bronze schemas still validate correctly (no regressions)
T3.16 · IRateLimiter per Marketplace · Andrés · #12 • 1d · deps: T1.12
  • MeLi: 1501st request in 1-min window → 429 with Retry-After header
  • Amazon: burst limit enforced; restore rate configured per API family
  • Shopify: leaky bucket cost-points respected
  • Redis counter incremented per request; verified via unit test with mock Redis
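The MeLi criterion (the 1501st request in a one-minute window is rejected) matches a fixed-window counter. The sketch below uses an in-memory map where the real limiter would increment a Redis key; `FixedWindowLimiter` and its method names are hypothetical, and the fixed-window strategy itself is an assumption, since the criteria only state the observable behavior.

```typescript
// Hypothetical fixed-window limiter; the Map stands in for Redis INCR + EXPIRE.
interface RateDecision {
  status: 200 | 429;
  retryAfterSeconds?: number;
}

class FixedWindowLimiter {
  private readonly counters = new Map<string, number>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  check(key: string, nowMs: number): RateDecision {
    const windowStart = Math.floor(nowMs / this.windowMs) * this.windowMs;
    const bucket = `${key}:${windowStart}`;
    const count = (this.counters.get(bucket) ?? 0) + 1; // one increment per request
    this.counters.set(bucket, count); // old buckets are never evicted in this sketch
    if (count <= this.limit) return { status: 200 };
    // Seconds until the current window closes, for the Retry-After header.
    const retryAfterSeconds = Math.ceil((windowStart + this.windowMs - nowMs) / 1000);
    return { status: 429, retryAfterSeconds };
  }
}
```

Amazon's burst/restore model and Shopify's leaky bucket need different algorithms, so this sketch covers only the MeLi bullet.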
T3.17 · Onboarding Trigger · Andrés · #12 • 1d · deps: T1.12, T2.14
  • Seller connects first marketplace → onboarding_initial_sync DAG triggered in Airflow
  • DAG runs successfully with seller’s credentials
  • Seller’s data visible in Fast Data Layer within 5 minutes of connection
T3.18 · CI/CD Multi-Repo Complete · Andrés · #14 • 2d · deps: T2.16
  • All 11 repos have GitHub Actions workflow file
  • PR to any repo → CI runs lint + test + build
  • Merge to main → staging auto-deploy triggered (confirmed in deploy logs)
  • Secrets stored in GitHub Org Secrets (no .env files committed)

Sergio — #1 Native Shell · #13 Billing — 9 tasks

T3.19 · BillingView · Sergio · #1 • 2.5d (+0.5d T2.BB) · deps: T2.20
  • Plan name (Free/Pro) and credits remaining displayed correctly from API
  • Usage stats (tools called today, this week) visible
  • ‘Upgrade’ button opens Stripe Checkout in system browser (not in Electron WebContentsView)
  • Low credits alert (<10 credits) shows banner in sidebar
T3.20 · WRITE Confirmation Dialogs · Sergio · #1 • 1d · deps: T2.17, T3.2
  • WRITE tool triggers confirmation modal automatically (no manual trigger needed)
  • Modal shows diff: current value in red, new value in green
  • Aceptar (Accept) → sends confirmation to backend; Rechazar (Reject) → cancels with no change
  • 35min timer visible; reminder at 5min remaining
T3.21 · Suggestion Cards + Tool Progress · Sergio · #1 • 1.5d (+0.5d T2.BB) · deps: T2.17, T2.18
  • Suggestion card appears in sidebar after tool execution when service returns hasSuggestion=true
  • Click on card → new conversation opens with pre-loaded context
  • Tool execution shows spinner with text ‘ejecutando get_product...’ (‘running get_product...’)
  • Spinner dismisses when tool_result event received
T3.22 · ProfileView · Sergio · #1 • 1d · deps: T2.20
  • Connected marketplaces listed with status (e.g., ‘MeLi ✓’, ‘Amazon ×’)
  • Total credits used and tools called visible
  • Language setting (EN/ES) toggle persists after app restart
  • Default marketplace selection persists across sessions
T3.23 · Stripe Checkout + Customer Portal · Sergio · #13 • 3d · deps: T3.19
  • ‘Upgrade’ click → Stripe Checkout opens in system browser for Pro $49/mo
  • After payment: webhook checkout.session.completed received → 500 credits granted
  • Customer Portal: seller can cancel subscription without contacting support
  • Webhook customer.subscription.deleted triggers cancellation in backend
T3.24 · ICreditsGate + Credits Backend · Sergio · #13 • 2d · deps: T3.23
  • POST /internal/gate with READ tool → deducts exactly 1 credit (DynamoDB conditional write)
  • Race condition test: 10 concurrent gate calls → exactly 10 credits deducted (no double-spend)
  • Seller with 0 credits → gate returns 402; tool blocked with ‘insufficient credits’ message
  • Billing service down → fail-open: tool executes anyway (logged)
T3.24a · Billing Schema Migration · Sergio · #13 • 1d · deps: T3.23
  • Migration script runs idempotently (run twice → same result, no errors)
  • clients table has Stripe fields: stripe_customer_id, stripe_subscription_id, plan
  • credit_packs, subscription_events, credit_transactions tables exist with correct schema
  • Migration verified in staging database before production
T3.24b · SubscriptionLifecycleService · Sergio · #13 • 1d · deps: T3.23
  • activate(): plan updated, credits granted, event logged in subscription_events
  • cancel(): grace_period_end = now+7d set; plan stays active until grace expires
  • invoice.payment_failed webhook → seller notification sent
  • All events logged in subscription_events table with timestamp and type
T3.24c · Monthly Credit Reset Cron · Sergio · #13 • 4h · deps: T3.24
  • EventBridge rule fires on 1st of each month (CloudWatch confirms execution)
  • Plan credits reset: Free→50, Pro→500 each 1st of month
  • Pack credits NOT reset (pack expiry = 12 months from purchase)
  • credit_transactions log entry created for each reset (type=plan_reset)
T3.32 · Token pipeline + Style Dictionary · Mateo · #18 • 2d · deps: T0.BB
  • design-tokens.json extracted from Figma variables via MCP
  • Style Dictionary configured, generates CSS :root variables + tailwind.config.ts
  • Tokens integrated in core-product-desktop-client — zero hardcoded values in config
T3.MK1 · Mockup BillingView: integrate CreditEconomy + billing components from T2.BB · Sergio · #1 • 0.5d · deps: T2.BB, T3.19
  • CreditEconomy component renders correctly with real billing data
  • Billing view matches Figma design specifications
T3.MK2 · Mockup ProfileView: integrate profile components from T2.BB · Sergio · #1 • 0.5d · deps: T2.BB, T3.22
  • Profile view components render correctly with real user data
  • Profile layout matches Figma specifications
T3.MK3 · Mockup ConfirmDialog: integrate ConfirmDialog organism from T2.BB in WRITE flow · Sergio · #1 • 0.5d · deps: T2.BB, T3.20
  • ConfirmDialog organism renders correctly in WRITE confirmation flow
  • Diff visualization (current/new values) matches Figma design

UX/UI — #18 Design System — 1 task

T3.BB · Advanced Organisms + Pattern Library: ReActStream, DataTable, AuditLog, RollbackPanel, FraudAlert, ErrorRecovery A/B/C. Publish [LIB] Pattern Components complete · UX/UI · #18 • 5d · deps: T2.BB
  • Pattern library published complete in [LIB] Pattern Components
  • All frames ready for development
  • Pablo approves delivery

Pablo — #16 Eval Suite — 8 tasks

T3.25 · KB BigQuery Indexing · Mateo · #9 • 1d · deps: T1.22, T1.23
  • SELECT COUNT(*) FROM embeddings WHERE is_current=true returns a count ≥ 15 in BigQuery
  • Top-5 semantic search for ‘comisión Mercado Libre’ returns relevant chunks
  • Go pipeline completes without errors; 5 test queries all return meaningful results (human-reviewed)
T3.26 · Eval CI Integration · Pablo · #16 • 2d · deps: T2.24
  • eval-on-pr.yml workflow exists in core-intelligence-conversation-api repo
  • PR → CI evaluates Coach on staging; PR blocked with auto-comment if score < 0.70
  • Eval completes in < 10 minutes for 20–30 cases
  • Baseline updated on merge to main
T3.27 · Golden Dataset 50 Cases · Pablo · #16 • 3d · deps: T2.24
  • 50+ YAML cases in dataset/ directory
  • Distribution: 15 product, 10 pricing, 8 WRITE confirm, 7 proactive, 10 edge cases
  • Edge cases include: injection attempt, off-scope query, empty tool result, ambiguous intent
  • Each case has min_judge_score defined
T3.28 · QA Conversation Flows (3 Marketplaces) · Pablo · #16 • 2d · deps: T3.1, T3.3
  • End-to-end test with real MeLi seller data: query → tool call → response verified
  • End-to-end test with real Shopify data: same flow verified
  • All issues found documented as Linear tasks via Beautonomous
T3.40 · Extend EvalConfig + CLI: desktop_build and figma_quality pipeline types · Pablo · #16 • 1d · deps: T2.24
  • EvalConfig.pipelineType accepts desktop_build and figma_quality values without error
  • CLI eval run --pipeline figma_quality executes without crash and prints results
  • Existing LLM Judge pipeline unaffected (regression test passes)
T3.41 · FigmaRESTClient: getFile, getFileVariables, getFileComponents, getFileStyles · Pablo · #16 • 1.5d · deps: T3.40
  • All 4 Figma REST endpoints return parsed responses for a real Figma file
  • Figma API token injected via env var FIGMA_TOKEN; missing token raises clear error
  • Rate limit exceeded → client retries with exponential backoff (max 3 attempts)
T3.42 · FigmaQualityRunner + variable checks · Pablo · #16 • 2d · deps: T3.41
  • Runner executes against a Figma file and returns a structured QualityReport
  • Variable checks: token usage, missing bindings, and orphan variables detected correctly
  • Report includes per-check pass/fail status and violation count
T3.43 · Component checks: auto_layout, naming_convention, states_coverage, color_hardcoding, spacing_hardcoding · Pablo · #16 • 2d · deps: T3.42
  • All 5 component checks fire and report violations for a test Figma file with known issues
  • Clean Figma file returns 0 violations for all 5 checks
  • Each violation includes component name, frame path, and remediation hint
T3.44 · Quality checks + report: wcag_contrast, descriptions, mcp_compatibility · Pablo · #16 • 1d · deps: T3.42
  • WCAG contrast check flags text layers with contrast ratio < 4.5:1 (AA)
  • Descriptions check flags components missing description fields
  • Final report is valid JSON with summary.score field (0.0–1.0); CI fails if score < 0.80
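The 4.5:1 AA threshold in the wcag_contrast check refers to the WCAG contrast ratio, which is computed from the relative luminance of the two colors. A minimal sketch of that formula follows; the function names and the `Rgb` tuple type are hypothetical, and how the runner extracts text and background colors from Figma layers is not covered here.

```typescript
// WCAG 2.x contrast ratio between two opaque sRGB colors (channels 0-255).
type Rgb = [number, number, number];

// Converts one 8-bit sRGB channel to linear light, per the WCAG definition.
function channelToLinear(c8: number): number {
  const c = c8 / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance([r, g, b]: Rgb): number {
  return 0.2126 * channelToLinear(r) + 0.7152 * channelToLinear(g) + 0.0722 * channelToLinear(b);
}

// Ratio is (lighter + 0.05) / (darker + 0.05); ranges from 1 to 21.
function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  const lighter = Math.max(la, lb);
  const darker = Math.min(la, lb);
  return (lighter + 0.05) / (darker + 0.05);
}

// AA for normal-size text requires at least 4.5:1.
function passesAA(text: Rgb, background: Rgb): boolean {
  return contrastRatio(text, background) >= 4.5;
}
```

Black on white gives the maximum ratio of 21:1, while mid-gray text on white sits right around the AA boundary, which makes it a useful fixture for the check.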
Sprints 7-8 — Hardening + Staging — T4.1–T4.29 (38 tasks)

Mateo — #2 Orchestrator · #4 Personality · #7 Guard · #3 Tools · #9 Cerebro KB — 8 tasks

T4.1 · WebSocket Streaming · Mateo · #2 • 2d · deps: T1.7, T3.3
  • Client connects to wss://api-staging.shopilot.ai/ws successfully
  • Server emits all 8 events: thinking, tool_start, tool_result, text_delta, suggestion, confirmation_required, error, done
  • text_delta events render progressively in UI (words appear as they arrive)
  • 4 client→server events work: send_message, confirm_action, reject_action, cancel
  • Session restores after disconnect: in-progress conversation recoverable
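The event names listed for T4.1 lend themselves to literal union types, so that an unknown event name is caught at compile time or rejected at runtime. The sketch below is an assumption about how this could be typed; the payload shape (`text` on `text_delta`) and all function names are hypothetical.

```typescript
// Event names come from the criteria above; payload fields are assumptions.
const SERVER_EVENTS = [
  "thinking", "tool_start", "tool_result", "text_delta",
  "suggestion", "confirmation_required", "error", "done",
] as const;
const CLIENT_EVENTS = ["send_message", "confirm_action", "reject_action", "cancel"] as const;

type ServerEventName = (typeof SERVER_EVENTS)[number];
type ClientEventName = (typeof CLIENT_EVENTS)[number];

interface ServerEvent {
  type: ServerEventName;
  text?: string; // hypothetical payload for text_delta
}

// Runtime guard for names arriving over the wire.
function isServerEventName(name: string): name is ServerEventName {
  return (SERVER_EVENTS as readonly string[]).includes(name);
}

function isClientEventName(name: string): name is ClientEventName {
  return (CLIENT_EVENTS as readonly string[]).includes(name);
}

// Returns the successive strings a chat bubble would show as deltas arrive.
function renderProgressively(events: ServerEvent[]): string[] {
  const frames: string[] = [];
  let text = "";
  for (const ev of events) {
    if (ev.type === "done") break;
    if (ev.type === "text_delta" && ev.text !== undefined) {
      text += ev.text;
      frames.push(text);
    }
  }
  return frames;
}
```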
T4.2 · SystemPromptComposer L3 · Mateo · #4 • 1d · deps: T1.5, T3.3
  • writeCapable=true → WRITE guardrail block injected into system prompt
  • writeCapable=false → WRITE block absent from prompt
  • Total system prompt ≤ 1200 tokens (verified by estimatedTokens field)
  • WRITE guardrail text includes irreversibility warning
T4.3 · OutputGuard · Mateo · #7 • 1d · deps: T3.5
  • Response containing another user’s data → alert logged and response sanitized
  • Response with dangerous instructions → blocked and replaced with safety message
  • Guard exception → response passes through with error logged (not swallowed silently)
  • Data leak detection → critical alert sent to #engineering Slack
T4.4 · WRITE Tools Remaining (circuit breaker) · Mateo · #3 • 3d · deps: T3.3
  • At least update_product_images and update_stock handlers implemented and working
  • Circuit breaker applied: remaining deferred WRITE tools tracked in Linear as S12 scope
  • Implemented tools pass confirmation flow end-to-end
  • S12 deferred tools have Linear tasks with ‘circuit-breaker’ label
T4.5 · Performance Optimization · Mateo · #2 • 2d · deps: T4.1
  • p95 response time ≤ 3000ms measured in Artillery test (excluding LLM time)
  • Context compaction reduces token count by >20% for conversations > 30 messages
  • Prompt cache hit rate ≥ 80% (Anthropic API metrics)
  • At least 1 bottleneck identified, fixed, and documented in Linear
T4.5a · FeedbackCapture Hook · Mateo · #2 • 1d · deps: T2.4, T4.13
  • after_tool hook fires for every successful WRITE execution
  • POST /feedback/capture called with sellerId, toolName, conversationId, actionId
  • Fire-and-forget: hook does not block tool response (async, not awaited in critical path)
  • Failed POST logged but does not affect tool result or user experience
T4.5b · ActionLog Entity + DynamoActionLogRepository · Mateo · #2 • 1d · deps: T2.4, T3.3
  • DynamoActionLogRepository.save(actionLog) writes to DynamoDB correctly
  • pk = User#{userId}, sk = Action#{ULID} — format verified in table
  • findByConversation(convId) uses GSI1 Conv#{convId} and returns correct entries
  • ActionLog created for every WRITE tool executed (verified in integration test)

Andrés — #14 DevOps · #10 Data Sync — 5 tasks

T4.6 · Staging Deploy Full Stack · Andrés · #14 • 3d · deps: T2.15, T3.18
  • CDK deploy completes: Lambda, API GW v2, DynamoDB, ElastiCache Redis, RDS PostgreSQL, Secrets Manager
  • Terraform apply completes: Cloud Run (Data API), BigQuery, GCS, Airflow on GCP
  • GET https://api-staging.shopilot.ai/health → 200 OK
  • All health checks green in monitoring dashboard
T4.7 · Load Testing 50 Users · Andrés · #14 • 2d · deps: T4.6
  • Artillery/k6 simulates 50 concurrent users for 10 minutes
  • p95 API response time < 2000ms (excluding LLM latency, measured separately)
  • 0 5xx errors under sustained load
  • Bottleneck report identifies which component is limiting (Redis / DynamoDB / API GW)
T4.8 · CloudWatch Dashboard + Alerts · Andrés · #14 • 2d · deps: T4.6
  • Dashboard shows: API latency p50/p95, error rate %, LLM cost/conversation, tool execution count, credits deducted
  • PagerDuty alert fires when p95 > 2s (tested with synthetic traffic spike)
  • Slack alert fires in #deploys when daily LLM cost > $50
  • All alerts have runbook links attached
T4.9 · Data Sync Silver + Gold (circuit breaker) · Andrés · #10 • 3d · deps: T3.13
  • SilverNormalizer transforms Bronze → Silver for at least MeLi dataset
  • transform_to_silver_dag runs without errors in Airflow
  • If circuit breaker applied: Gold + Brand Health deferred to S12 with Linear tasks created
  • IBrandHealthCalculator interface defined even if implementation deferred
T4.9a · API GW v2 WebSocket in CDK · Andrés · #14 • 1d · deps: T4.6
  • cdk deploy creates WebSocket API with $connect/$disconnect/$default routes
  • DynamoDB connection-ids table populated on $connect, cleaned on $disconnect
  • Lambda authorizer validates JWT before allowing connection
  • IAM policies grant Lambda correct permissions to post to connections

Sergio — #1 Native Shell · #15 Feedback Loop · #13 Billing — 11 tasks

T4.10 · WebSocket Client Progressive · Sergio · #1 • 2.5d (+0.5d T3.BB) · deps: T2.18, T4.1
  • text_delta events render text progressively in chat bubble (words appear as streamed)
  • tool_start event shows spinner with tool name
  • suggestion event renders suggestion card in sidebar
  • confirmation_required event triggers WRITE confirmation modal
  • Reconnection attempt with backoff on WebSocket disconnect
T4.11 · EnrollmentView Standalone · Sergio · #1 • 1d · deps: T2.21
  • BrowserWindow opens for OAuth redirect URL (not WebContentsView)
  • After OAuth: auth code extracted from redirect URL
  • Auth code sent to backend via IPC; window closes automatically on success
  • WebContentsView is never navigated to OAuth URL during the flow
  • Step 5: Success screen shown; skip available from step 3 onwards
T4.12 · Sentry Crash Reporting · Sergio · #1 • 4h · deps: T1.16
  • Main process crash → Sentry event created with full stack trace
  • Renderer crash → Sentry event created with source map resolved (not minified)
  • Source maps uploaded during build (not committed to repo)
  • Crash dialog with ‘Send Report’ button appears on unhandled error
T4.13 · Feedback Loop Scaffold · Sergio · #15 • 1d · deps: T0.8
  • npm run build passes with 0 TypeScript errors
  • IFeedbackRepository, IFeedbackGate, IDataSyncClient interfaces exist in domain/
  • FeedbackEntry, ExplicitFeedbackEntry, ImplicitFeedbackEntry models compile
  • CDK stack skeleton created (Lambdas and tables defined, not yet deployed)
T4.14 · calculateImpactScore + DynamoFeedbackRepository · Sergio · #15 • 2d · deps: T4.13
  • calculateImpactScore({ sales:1, conversion:1, visits:1, position:1 }) returns correct value using formula (sales×0.4 + conversion×0.3 + visits×0.2 + position×−0.1)
  • DynamoFeedbackRepository.save(entry) writes to DynamoDB correctly
  • findPendingEntries() queries GSI1 status=pending and returns correct entries
  • findByUser(userId) returns all entries for user across feedback types
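The weighted formula in the first T4.14 criterion translates directly into code. A minimal sketch, assuming each input is a plain number; the `MetricDeltas` type name and the `IMPACT_WEIGHTS` constant are hypothetical, while the weights themselves come from the criterion.

```typescript
// Weights from the criterion: sales 0.4, conversion 0.3, visits 0.2, position -0.1.
interface MetricDeltas {
  sales: number;
  conversion: number;
  visits: number;
  position: number;
}

const IMPACT_WEIGHTS: MetricDeltas = { sales: 0.4, conversion: 0.3, visits: 0.2, position: -0.1 };

function calculateImpactScore(d: MetricDeltas): number {
  return (
    d.sales * IMPACT_WEIGHTS.sales +
    d.conversion * IMPACT_WEIGHTS.conversion +
    d.visits * IMPACT_WEIGHTS.visits +
    d.position * IMPACT_WEIGHTS.position
  );
}
```

With every input set to 1, the result is 0.4 + 0.3 + 0.2 − 0.1 = 0.8, up to floating-point rounding, so a test should compare with a small tolerance rather than strict equality.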
T4.15 · FeedbackMeasurerService + Lambdas · Sergio · #15 • 2d · deps: T4.14
  • processPendingEntries() finds entries > 7 days old and measures impact score
  • DataSyncClient fetches current metrics from Fast Data Layer endpoint
  • EventBridge triggers FeedbackMeasurer Lambda every 6h (CloudWatch logs confirm)
  • GET /feedback/{userId}/summary returns aggregated score; /history returns paginated entries
T4.15a · FeedbackGate Anti-Fatigue · Sergio · #15 • 4h · deps: T4.13
  • GET /feedback/{userId}/should-prompt → false if user had explicit feedback prompt today
  • Returns false if session has < 3 interactions
  • 24h cooldown enforced after explicit feedback given
  • Returns true correctly for eligible users (no cooldown, 3+ interactions)
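The three anti-fatigue rules combine into one predicate. The sketch below is illustrative: `shouldPrompt` and `FeedbackGateInput` are hypothetical names, and interpreting "today" as the same UTC calendar day is an assumption, since the criteria do not name a timezone.

```typescript
// Hypothetical sketch of the should-prompt decision.
interface FeedbackGateInput {
  nowMs: number;
  sessionInteractions: number;
  lastPromptAtMs: number | null; // last time an explicit feedback prompt was shown
  lastExplicitFeedbackAtMs: number | null; // last time the user actually gave feedback
}

const DAY_MS = 24 * 60 * 60 * 1000;
const MIN_INTERACTIONS = 3;

// Assumption: "today" means the same UTC calendar day.
function isSameUtcDay(aMs: number, bMs: number): boolean {
  return Math.floor(aMs / DAY_MS) === Math.floor(bMs / DAY_MS);
}

function shouldPrompt(input: FeedbackGateInput): boolean {
  if (input.sessionInteractions < MIN_INTERACTIONS) return false;
  if (input.lastPromptAtMs !== null && isSameUtcDay(input.lastPromptAtMs, input.nowMs)) return false;
  if (
    input.lastExplicitFeedbackAtMs !== null &&
    input.nowMs - input.lastExplicitFeedbackAtMs < DAY_MS
  ) {
    return false; // 24h cooldown after explicit feedback
  }
  return true;
}
```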
T4.15b · Explicit Feedback Endpoint · Sergio · #15 • 4h · deps: T4.14
  • POST /feedback/{userId}/explicit with { rating: 5, conversationId } → 201 Created
  • Rating validation: 1–5 accepted; 0 or 6 → 422 Unprocessable
  • ExplicitFeedbackEntry stored in DynamoDB with timestamp and conversationId
T4.15c · Implicit Feedback Endpoint · Sergio · #15 • 4h · deps: T4.14
  • POST /feedback/{userId}/implicit with { action: ‘accepted’, toolName } → 201 Created
  • action values accepted: accepted, rejected, edited; other values → 422
  • When action=edited: originalValue and editedValue both stored
  • ImplicitFeedbackEntry linked to conversationId correctly
T4.15d · Grace Period 7d Billing · Sergio · #13 • 4h · deps: T3.24b
  • customer.subscription.deleted webhook → grace_period_end = now + 7 days
  • Seller retains Pro access and Pro credit limits during 7-day grace window
  • Cron job (daily): finds sellers with grace_period_end expired → downgrades to Free
  • grace_period_end field visible in backend admin view
T4.24 · Gate 2 signed build · Sergio · #1 • 0.5d · deps: T2.40
  • Full feature build: all S1-8 features including WS streaming, WRITE confirmation, billing
  • .dmg notarized + .exe signed, zero security warnings on clean install
  • Team smoke test passes: onboarding → chat → tool execution → billing view
T4.MK1 · Mockup EnrollmentView: integrate EnrollmentCard organism from T3.BB · Sergio · #1 • 0.5d · deps: T3.BB, T4.11
  • EnrollmentCard organism renders correctly in EnrollmentView
  • OAuth redirect flow works with integrated components
T4.MK2 · Mockup complete WRITE flow: integrate ReActStream + ConfirmDialog + ToolAccordion in real WS flow · Sergio · #1 • 1d · deps: T3.BB, T4.10
  • ReActStream, ConfirmDialog, ToolAccordion render correctly in real WebSocket flow
  • Complete WRITE flow works end-to-end with Figma design system components

UX/UI — #18 Design System — 1 task

T4.BB · Figma Quality Audit + Corrections: AutoLayout, variables, naming, states. All frames ‘Ready for development’ · UX/UI · #18 • 3d · deps: T3.BB
  • Figma QA checklist passed: AutoLayout, variables, naming, states verified
  • All frames marked ‘Ready for development’ in Figma
  • Pablo approves delivery

Pablo — #16 Eval Suite · #17 Beautonomous — 10 tasks

T4.16 · KB Batch v2 · Mateo · #9 • 2d · deps: T2.22, T2.23
  • Vertex AI batch calls: 250 texts per call verified in API logs
  • If pipeline duration > 5min: incremental mode activates automatically
  • Semantic search on 20 eval queries: expected chunk in top-5 results for >80%
T4.17 · Eval CI Automated · Pablo · #16 • 2d · deps: T3.26, T3.27
  • GitHub Action eval-automated.yml triggers on every push to main
  • 50 golden cases evaluated per run; JSON report generated
  • CI fails if overall score < 0.70 OR any critical-tagged case fails
  • Eval results posted to #engineering Slack channel via Beautonomous
T4.18 · Testing Proactive Suggestions · Pablo · #16 • 2d · deps: T3.4
  • After get_product with degraded listing: ProactiveSuggestionService generates appropriate suggestion
  • After get_product with healthy listing: no suggestion generated
  • Dedup verified: same suggestion not offered twice within 7 days in test session
  • Prompt iteration: at least 1 improvement made based on test results
T4.19 · Beta User Selection + Prep · Pablo · #17 • 2d · deps: T2.21
  • 10–15 Sellerfy sellers identified (mix: small/medium/large by GMV)
  • 2-minute video walkthrough recorded and shareable
  • Setup doc prepared: download link, steps, support contact
  • Feedback form created; at least 1 confirmed 1-on-1 call scheduled
T4.19a · Eval Contract Testing Pipeline · Pablo · #16 • 2d · deps: T3.26
  • Consumer-driven contract tests exist for ToolRegistry → Data Sync integration
  • ToolRegistry → Marketplace Provider and ToolRegistry → Enrichment contracts verified
  • Contract failure in any integration → CI blocked
T4.19b · KB Quality Eval Pipeline · Pablo · #16 • 1d · deps: T4.16
  • precision@5 and hit rate computed for 20 eval queries with expected chunks
  • Hit rate > 80% → CI passes; ≤ 80% → CI fails with failing queries listed
  • Metric report posted as PR comment
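The two retrieval metrics and the CI threshold in T4.19b can be sketched as below. The names (`EvalQuery`, `precisionAtK`, `hitRate`, `ciPasses`) are hypothetical, and defining a "hit" as at least one expected chunk appearing in the top 5 is an assumption consistent with the T4.16 criterion.

```typescript
// Hypothetical sketch of precision@5 and hit rate over a set of eval queries.
interface EvalQuery {
  query: string;
  expectedChunkIds: string[];
  retrievedChunkIds: string[]; // ranked, best first
}

// Fraction of the top-k retrieved chunks that are expected chunks.
function precisionAtK(q: EvalQuery, k = 5): number {
  const top = q.retrievedChunkIds.slice(0, k);
  const relevant = top.filter((id) => q.expectedChunkIds.includes(id)).length;
  return relevant / k;
}

// Assumption: a query is a hit if any expected chunk appears in the top k.
function isHit(q: EvalQuery, k = 5): boolean {
  return q.retrievedChunkIds.slice(0, k).some((id) => q.expectedChunkIds.includes(id));
}

function hitRate(queries: EvalQuery[], k = 5): number {
  if (queries.length === 0) return 0;
  return queries.filter((q) => isHit(q, k)).length / queries.length;
}

// CI passes only when the hit rate is strictly above 80%.
function ciPasses(queries: EvalQuery[]): boolean {
  return hitRate(queries) > 0.8;
}

// Queries to list in the CI failure message.
function failingQueries(queries: EvalQuery[]): string[] {
  return queries.filter((q) => !isHit(q)).map((q) => q.query);
}
```

Note the strict inequality: a hit rate of exactly 80% fails, as the second criterion states.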
T4.25 · Code signing secrets setup · Pablo · #16 • 1d · deps: T0.9, T0.10
  • Apple cert + provisioning profile stored as GitHub Actions encrypted secrets
  • Windows code signing cert stored as GitHub Actions encrypted secret
  • Secrets documented in team runbook; no plaintext credentials in codebase
T4.26 · DesktopBuildRunner + core checks: binary_exists, sign_valid, launch_ok · Pablo · #16 • 3d · deps: T3.40, T4.25
  • Runner builds macOS .app and Windows .exe artifacts without error
  • sign_valid: macOS codesign --verify and Windows signtool verify both pass
  • launch_ok: app launches headlessly and exits with code 0 in CI environment
  • Structured BuildReport returned with per-check pass/fail and artifact paths
T4.27 · Secondary desktop checks: bundle_size, update_channel, notarization · Pablo · #16 • 1d · deps: T4.26
  • bundle_size: .app ≤ 200 MB; check fails with clear message if exceeded
  • update_channel: auto-update endpoint reachable and returns valid version manifest
  • macOS notarization check passes (Apple notary service confirms ticket)
T4.28 · GitHub Actions: desktop-build-eval.yml workflow · Pablo · #16 • 1.5d · deps: T4.26
  • desktop-build-eval.yml workflow exists in core-product-desktop-client repo
  • PR → CI runs DesktopBuildRunner; PR blocked if any core check fails
  • macOS (macos-latest) and Windows (windows-latest) runners both complete successfully
T4.29 · GitHub Actions: figma-quality-eval.yml workflow · Pablo · #16 • 0.5d · deps: T3.42, T3.43, T3.44
  • figma-quality-eval.yml workflow exists in core-product-design-system repo
  • Workflow triggers on schedule (daily) and on manual dispatch
  • Quality report posted as workflow summary; Slack notification sent on score < 0.80
Sprints 9-10 — Launch — T5.1–T5.MK1 (18 tasks)

Mateo — #7 Guard · #2 Orchestrator · #4 Personality — 3 tasks

T5.1 · LLMGuardChecker · Mateo · #7 • 1d · deps: T3.5, T4.3
  • Haiku classifier identifies injection attempt that passes regex pattern matching
  • Off-scope query (e.g., weather forecast) → Haiku returns out-of-scope classification
  • Haiku API error → input passes through with warning logged (fail-open)
  • Adds < 500ms p95 latency to request processing
T5.2 · Bug Fixes Backend · Mateo · #2, #3 • 4d · deps: T4.1
  • All P1/P2 bugs from beta report closed in Linear
  • Empty tool result → LLM receives ‘no data available’ message, no crash
  • LLM refuses tool → AgentLoop re-prompts once then provides partial answer
  • Concurrent WRITE: second WRITE blocked until first confirmation resolved
  • Token expired mid-conversation → ITokenManager auto-refreshes; conversation continues
T5.3 · System Prompt v3 Final · Mateo · #4 • 1d · deps: T5.10
  • Tone issues from beta feedback addressed (verified by Pablo review + re-eval)
  • Tool selection pattern issues fixed (tested with 5 edge case prompts from beta)
  • Eval score with v3 prompt ≥ v2 score
  • v3 deployed to staging and tested by at least 2 beta users

Andrés — #14 DevOps · #10 Data Sync — 4 tasks

T5.4 · Deploy Production · Andrés · #14 • 3d · deps: T4.6
  • cdk deploy --all succeeds on production AWS account
  • terraform apply succeeds on production GCP project
  • GET https://api.shopilot.ai/health → 200 OK
  • SSL certificate valid; HTTP → HTTPS redirect active
T5.5 · IaC Production Complete · Andrés · #14 • 2d · deps: T5.4
  • DynamoDB PITR enabled with 35-day recovery window
  • Lambda concurrency limits configured per function
  • IAM roles follow least-privilege principle (reviewed by Mateo)
  • Redis backup enabled; PostgreSQL daily backup confirmed in RDS console
T5.6 Rollback Testing | Andrés | #14 • 1d | deps: T5.4
  • Lambda version rollback via aws lambda update-alias completes in < 60s
  • Cloud Run revision rollback via gcloud run services update-traffic completes in < 60s
  • Runbook documented with step-by-step commands; tested successfully in staging
T5.6a Data Sync Phase 4 — OpenMetadata + Embeddings | Andrés | #10 • 2d | deps: T4.9
  • Amazon and Fast Data FQNs visible in OpenMetadata UI
  • embed_fast_dag runs without errors (Bronze → Cerebro KB embedding)
  • embed_health_dag runs without errors (Gold → KB embedding)
  • Data lineage visible: source → transformation → KB in OpenMetadata

Sergio — #1 Native Shell · #13 Billing — 5 tasks

T5.7 Code Signing + .dmg + Auto-Updater | Sergio | #1 • 2d | deps: T4.12
  • Apple Developer certificate installed and valid
  • electron-builder produces signed .dmg; notarization via notarytool passes
  • App installs and runs on clean Mac without developer tools installed
  • Auto-updater fetches update from S3 releases.shopilot.ai bucket and applies successfully
T5.8 Electron Security Hardening | Sergio | #1 • 1d | deps: T5.7
  • CSP headers verified in devtools (no unsafe-inline, no unsafe-eval)
  • nodeIntegration: false and sandbox: true in all BrowserWindow configs
  • webSecurity: true enforced (no local CORS bypass)
  • Opt-out telemetry: settings toggle works and persists after restart
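The T5.8 window and CSP criteria lend themselves to an automated check in addition to the manual devtools review. The sketch below is one possible shape. `hardeningViolations` and `WebPrefs` are hypothetical names; the three option names match Electron's `webPreferences` fields. Treating an unset option as a violation is a design choice made here so that the configuration is explicit, not something the blueprint requires.

```typescript
// Subset of Electron's webPreferences that the T5.8 criteria mention.
interface WebPrefs {
  nodeIntegration?: boolean;
  sandbox?: boolean;
  webSecurity?: boolean;
}

// Returns a list of human-readable violations; an empty list means the config passes.
function hardeningViolations(prefs: WebPrefs, csp: string): string[] {
  const violations: string[] = [];
  // Options must be set explicitly; undefined counts as a violation in this sketch.
  if (prefs.nodeIntegration !== false) violations.push("nodeIntegration must be false");
  if (prefs.sandbox !== true) violations.push("sandbox must be true");
  if (prefs.webSecurity !== true) violations.push("webSecurity must be true");
  // CSP keywords are single-quoted in the header value.
  for (const token of ["'unsafe-inline'", "'unsafe-eval'"]) {
    if (csp.includes(token)) violations.push(`CSP must not contain ${token}`);
  }
  return violations;
}
```

Running this over every window configuration in a unit test would catch a regression before the manual devtools check.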
T5.9 UI/UX Beta Bug Fixes | Sergio | #1 • 3.5d (+0.5d post-audit T4.BB) | deps: T4.19
  • All P1/P2 UI bugs from beta report closed in Linear
  • RAM usage < 500MB after 30 minutes of active use (Chrome DevTools Memory profiler)
  • All animations complete in < 300ms (no visible jank)
  • Loading states visible for all async operations (> 200ms)
T5.10 Billing Stripe Live | Sergio | #13 • 1d | deps: T3.23, T5.4
  • Stripe environment switched from test to live keys (live keys in Secrets Manager)
  • End-to-end: real card checkout → credits granted → tool executes successfully
  • Webhook signature verification passes on live endpoint
  • SSL certificate valid on billing endpoint
T5.MK1 Mockup Dashboard view: integrate final post-audit components in a dashboard-style view | Sergio | #1 • 1d | deps: T4.BB, T5.9
  • Dashboard view integrates final post-audit design system components
  • All views use consistent Figma-derived design tokens

Pablo — #17 Beautonomous · #16 Eval Suite — 6 tasks

T5.11 Beta Onboarding 10–15 Sellers | Pablo | #17 • 3d | deps: T5.7, T5.4
  • 10+ sellers download and install .dmg successfully
  • Each seller connects at least 1 marketplace via OAuth
  • Each seller executes at least 1 tool in first session (activation metric)
  • Each seller attends a 1-on-1 30-minute call
T5.12 Feedback Calls + Iteration | Pablo | #17 • 2d | deps: T5.11
  • 15min structured call completed with each beta user; notes documented
  • Top 5 issues identified; Linear tasks created via Beautonomous
  • At least 3 sellers provide NPS score
T5.13 OWASP Top 10 Security Review | Pablo | #16 • 1d | deps: T5.4, T5.7
  • Injection: all user inputs sanitized (manual test + linting pass)
  • Auth: JWT validation present on all protected endpoints
  • Data exposure: seller data not visible to other sellers (cross-account test)
  • All P1 OWASP findings fixed before Go/No-Go
T5.14 System Prompt v2 Beautonomous | Pablo | #17 • 1d
  • Prompt updated to reflect 10 weeks of real usage patterns
  • Governance rules updated for new workflows discovered during sprint
  • Technical docs (dev plans) indexed in OpenClaw KB
  • Test: ask Beautonomous about Shopilot architecture → returns accurate, up-to-date answer
T5.15 Go/No-Go | Pablo | #17 • 4h | deps: T5.1–T5.14
  • 60-minute sync call with all 4 engineers completed
  • All checklist items green: tools responding, Stripe live, 10+ beta, .dmg signed, OWASP P1s resolved, p95 < 3s, LLM cost guard active, eval ≥ 0.70
  • Pablo explicitly signs off with ‘Go’ in #engineering Slack
  • Any ‘No’ item → owner and ETA documented in Linear before re-vote
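The eight checklist items named in T5.15 can be evaluated mechanically before the sync call, leaving the explicit sign-off to Pablo. A minimal sketch follows. `LaunchMetrics` and `goNoGo` are hypothetical names; the thresholds (10+ beta sellers, p95 < 3s, eval ≥ 0.70) are the ones stated in the checklist.

```typescript
// Hypothetical snapshot of the inputs to the T5.15 checklist.
interface LaunchMetrics {
  toolsResponding: boolean;
  stripeLive: boolean;
  betaSellers: number;
  dmgSigned: boolean;
  openOwaspP1s: number;
  p95Seconds: number;
  llmCostGuardActive: boolean;
  evalScore: number;
}

function goNoGo(m: LaunchMetrics): { decision: "Go" | "No-Go"; failing: string[] } {
  // One entry per checklist item: [label, is it green?]
  const checks: Array<[string, boolean]> = [
    ["tools responding", m.toolsResponding],
    ["Stripe live", m.stripeLive],
    ["10+ beta sellers", m.betaSellers >= 10],
    [".dmg signed", m.dmgSigned],
    ["OWASP P1s resolved", m.openOwaspP1s === 0],
    ["p95 < 3s", m.p95Seconds < 3],
    ["LLM cost guard active", m.llmCostGuardActive],
    ["eval >= 0.70", m.evalScore >= 0.7],
  ];
  const failing = checks.filter(([, ok]) => !ok).map(([label]) => label);
  return { decision: failing.length === 0 ? "Go" : "No-Go", failing };
}
```

The `failing` list maps directly onto the last criterion: each entry is an item that needs an owner and ETA in Linear before a re-vote.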
T5.15a E2E Eval Pipeline | Pablo | #16 • 2d | deps: T4.17
  • Pipeline executes 10+ end-to-end scenarios (distinct from LLM Judge quality eval)
  • Each scenario: user query → tool selected → tool executes → response generated
  • Scenarios cover: READ, WRITE (with confirmation), ANALYSIS, proactive, off-scope rejection
  • All scenarios pass before Go/No-Go
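The T5.15a criteria combine a size requirement (10+ scenarios), a coverage requirement (five scenario kinds), and a pass requirement. A sketch of a suite-level check is shown below. `Scenario`, `ScenarioKind`, and `suiteProblems` are hypothetical names, and the kind labels are taken from the coverage bullet above.

```typescript
type ScenarioKind = "READ" | "WRITE" | "ANALYSIS" | "proactive" | "off-scope";

const REQUIRED_KINDS: ScenarioKind[] = ["READ", "WRITE", "ANALYSIS", "proactive", "off-scope"];
const MIN_SCENARIOS = 10; // "10+ end-to-end scenarios"

interface Scenario {
  id: string;
  kind: ScenarioKind;
  passed: boolean; // query → tool selected → tool executes → response generated
}

// Returns every reason the suite would block Go/No-Go; empty means the suite is acceptable.
function suiteProblems(scenarios: Scenario[]): string[] {
  const problems: string[] = [];
  if (scenarios.length < MIN_SCENARIOS) {
    problems.push(`only ${scenarios.length} scenarios, need ${MIN_SCENARIOS}+`);
  }
  for (const kind of REQUIRED_KINDS) {
    if (!scenarios.some((s) => s.kind === kind)) problems.push(`no ${kind} scenario`);
  }
  for (const s of scenarios) {
    if (!s.passed) problems.push(`scenario ${s.id} failed`);
  }
  return problems;
}
```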

9.9 Daily Execution Blueprint — Day-by-Day Schedule per Engineer

6 sprint cycles × 10 days (Pre-Sprint = 5 days). Every task from 70-EXEC-BACKLOG-CORREGIDO v2.0. Color = engineer lane.

Pre-Sprint — W0 (5 days) — CORE Setup. Gate: Beautonomous operational. All 4 members validated.
Day | Mateo | Andrés | Sergio | Pablo
D1
T0.2 OAuth GitHub (30m)
T0.4 OAuth Slack (30m)
T0.1 Create OpenClaw project (30m)
T0.3 OAuth Linear (30m)
D2
T0.5 System Prompt v1 Beautonomous (4h — 1/2)
D3
T0.5 System Prompt v1 (2/2)
T0.6 Role mapping (1h)
T0.7 Linear structure: 17 projects + 6 cycles (2h)
D4
T0.8 CORE validation (1h)
T0.8 CORE validation (1h)
T0.8 CORE validation (1h)
T0.8 CORE validation — Beautonomous operational
D5
Beautonomous operational. Linear tracking starts. All 17 projects bootstrapped.
Sprint 1-2 — W1-W2 (10 days) — Walking Skeleton. Gate: AgentLoop unblocked (T1.6 starts W3). Electron loads marketplace. KB indexed.
Day | Mateo | Andrés | Sergio | Pablo
D1
T1.4 ILLMClient: chat() accepts toolDefs + thinkingBudget (1/2)
T1.9 Scaffold Marketplace Provider: DDD + DI + VOs + Errors
T1.16 Scaffold Electron 28+ + builder + hot reload + preload
T1.21 KB duplicates fix: TRUNCATE, embedded_at, CI Go 1.24 (1/2)
D2
T1.4 ILLMClient update (2/2)
T1.10 IMarketplaceAdapter 23 methods + ISKUResolver (4h)
T1.15c IOAuth2Flow interface — generic OAuth2 port (4h)
T1.17 MainWindow + WebContentsView — NOT BrowserView (1/2)
T1.21 KB duplicates fix (2/2)
D3
T1.1 DynamoDB: UUID→ULID, Trace SK, GSI fix, dead code (1/3)
T1.11 AES256GCM + ITokenManager + DDB marketplace-credentials (1/2)
T1.17 MainWindow + WebContentsView (2/2)
T1.22 KB Contextual Retrieval: prefix + Markdown chunking (1/2)
D4
T1.1 DynamoDB fix (2/3)
T1.11 AES256GCM + ITokenManager (2/2)
T1.20 Auth Memberstack: JWT in OS-encrypted electron-store
T1.22 KB Contextual Retrieval (2/2)
D5
T1.1 DynamoDB fix (3/3) — CDK Stack + KeyBuilders + tests
T1.15a SellerConnection aggregate: 5-state state machine (1d)
T1.15b MarketplaceAction entity + IRepository (4h)
T1.19 Tabs + Sidebar container: IPC + Toggle Cmd+B (1/3, dep T0.BB tokens)
T1.24 Eval Phase 0: interfaces + models + golden 15-20 YAML (1/3)
D6
T1.2 UserProfile entity + IUserProfileRepository + DDB impl
T1.14 Verify Terraform GCP: GCS, Cloud Run, Airflow, BigQuery
T1.15 Request external deps E1-E5: Amazon SP-API, MeLi, Shopify (4h)
T1.19 Tabs + Sidebar container (2/3)
T1.24 Eval Phase 0 (2/3)
D7
T1.5 SystemPromptComposer L1+L2: cache_control ephemeral (1/2)
T1.12 MeLiOAuth2Flow + MeLiAdapter: REST API + errors (1/3)
T1.18 MarketplaceDetector: URL patterns MeLi/Amazon/Shopify
T1.19 Sidebar token setup T0.BB (3/3 — 0.5d)
T1.24 Eval Phase 0 (3/3) — 15 golden cases YAML
D8
T1.5 SystemPromptComposer L1+L2 (2/2)
T1.12 MeLiAdapter (2/3)
T1.MK1 Mockup shell container (0.5d)
T1.25 10 READ tool specs: name, LLM description, JSON schema, risk, credits (1/2)
D9
T1.3 History in prompt: findWindowForPrompt, budget 200K (1/2)
T1.12 MeLiOAuth2Flow + MeLiAdapter (3/3) — reuses context/
T1.25 10 READ tool specs (2/2)
D10
T1.3 History in prompt (2/2)
T1.13 Amazon LWA scaffold: SP-API SDK + rate limit (1/2, dep E1)
T1.23 KB content: 15-20 docs MeLi/Amazon/Shopify (starts, cont. W3)
S1-2: T1.3+T1.4+T1.5 done → T1.6 unblocked for W3. MeLiAdapter live. KB populated.
Sprint 3-4 — W3-W4 (10 days) — Core Engines. Gate: ToolRegistry + 10 READ stubs. Shell chat basic. KB in BigQuery. Eval runs 15 cases.
Day | Mateo | Andrés | Sergio | Pablo
D1
T1.6 AgentLoopOrchestrator: ReAct loop, MAX_ROUNDS=10, cost guard (1/3)
T1.13 Amazon LWA scaffold (2/2) — full SP-API rate limit families
T2.17 Chat UI: bubbles + markdown + thinking/tool indicators (1/3, +T1.BB integration)
T1.23 KB content 15-20 docs (1/4 — cont. from S1-2)
D2
T1.6 AgentLoop (2/3)
T2.10 ShopifyOAuth2Flow + ShopifyAdapter: GraphQL Admin API (1/3)
T2.17 Chat UI (2/3)
T1.23 KB content (2/4)
D3
T1.6 AgentLoop (3/3) — retry + is_error tool_result
T1.7 RestResponseEventEmitter: REST mode without streaming (4h)
T2.10 ShopifyAdapter (2/3)
T2.18 CoachWebSocketService: WS client main process + backoff (dep T1.7)
T1.23 KB content (3/4)
D4
T1.8 Verify Observability with ReAct: multi-step traces compatible
T2.10 ShopifyAdapter (3/3) — Shopify cost-points throttling
T2.19 Context injection URL→metadata via MarketplaceDetector
T1.23 KB content (4/4) — index pipeline
D5
T2.1 ToolRegistry + ToolDefinition: register/getDefinitions/getHandler (1/2)
T2.13 DataSync Phase 0.5: Clean Arch API — IDataReader, VOs, DI (1/2)
T2.20 Navigation: /chat /profile /billing /enrollment /onboarding
T2.22 KB incremental processing: SHA-256 hash + is_current flag (1/2)
D6
T2.1 ToolRegistry (2/2)
T2.13 DataSync Clean Arch (2/2)
T2.21 OnboardingWizard 5 steps (+T1.BB integration): Welcome + OAuth + Profile + Query + Success (1/3)
T2.22 KB incremental (2/2)
D7
T2.2 IToolExecutor + ToolExecutor: execute(name, args, ctx)→ToolResult
T2.11 Full AmazonAdapter (if E1 ok): SP-API Reports+Catalog (1/3)
T2.17 Chat UI (3/3 — T1.BB tokens 0.5d)
T2.21 OnboardingWizard (2/3)
T2.23 KB batch embeddings: 250 texts/call Vertex AI (1/2)
D8
T2.3 ToolPolicyFilter: risk gate + marketplace gate (1d)
T2.5a ToolResult domain model: immutable value (4h)
T2.9 Tool result caching in-memory: per-session Map READ/ANALYSIS (4h)
T2.11 AmazonAdapter (2/3)
T2.21 OnboardingWizard (3/3 — T1.BB 0.5d)
T2.23 KB batch (2/2) — goroutine pool + retry 429/5xx
D9
T2.4 HookLifecycle: before_tool → execute → after_tool (1d)
T2.11 AmazonAdapter (3/3) — backoff 5 req/s
T2.MK1 Mockup ChatView (1d)
T2.24 Eval Phase 1: AnthropicLLMJudge + EvalRunner + CLI (1/3)
D10
T2.5 10 READ tool stubs: mock HTTP handlers in handlers/read/ (1/2)
T2.14 Existing DAGs verified: MeLi+Shopify @hourly + Bronze schemas
T2.MK2 Mockup Onboarding (0.5d)
T2.24 Eval LLM Judge (2/3)
S3-4: ToolRegistry + 10 READ stubs. Shopify+Amazon adapters. KB BigQuery. Eval runner.
Sprint 5-6 — W5-W6 (10 days) — WRITE Tools + Billing + Enrichment. Gate: WRITE tool executes in marketplace. Billing charges credits. Enrichment returns data.
Day | Mateo | Andrés | Sergio | Pablo
D1
T2.5 10 READ stubs (2/2) — handlers/read/ structure
T2.5b update_user_profile SYSTEM tool: LLM invokes it on detecting seller info (4h)
T2.12 TokenRefreshCron: EventBridge 5min + DDB mutex + alert on 3 failures (1d)
T2.15 CDK base AWS: DDB conv-api, Lambda+APIGW v2, VPC, DDB credentials (1/2)
T3.19 BillingView: plan + credits + stats + Stripe buttons (+T2.BB integration, 1/3)
T2.24 Eval LLM Judge (3/3) — minimum 20 golden cases
D2
T2.5c contextSummary: auto summary when history exceeds token threshold (1d)
T2.5d 17 WRITE tool stubs: register in ToolRegistry with ConfirmationRequired (4h)
T2.15 CDK base AWS (2/2)
T3.19 BillingView (2/3)
T2.25 E2E Playground testing: real Sellerfy flows → Linear issues
T2.26 Bootstrap 147 tasks in Linear via Beautonomous (4h)
D3
T2.6 IContextAssembler: KB + BrandHealth RAG in parallel, graceful degradation (1/2)
T2.16 GitHub Actions CI multi-repo: lint+typecheck+tests on PR (1d)
T2.16a marketplace-actions DDB table in CDK: pk sellerId, sk actionId (4h)
T3.19 BillingView (3/3 — T2.BB tokens 0.5d)
T3.20 WRITE confirmation dialogs: red/green diff + 35min timeout (dep T3.2)
T2.26a Quality gate 5-step Beautonomous: structure→lint→tests→arch→conv (1d)
T3.25 KB BigQuery indexing: verify top-5 semantic for 5 queries
D4
T2.6 IContextAssembler (2/2)
T2.16b AmazonAds OAuth2 dual: flow separate from LWA (1d)
T2.16c ISKUResolver: MeLi/Amazon/Shopify bidirectional mapping (1d)
T3.21 Suggestion cards + tool progress (+T2.BB 0.5d): spinner + contextualized click (1/2)
T3.26 Eval CI: eval-on-pr.yml, coach staging→LLM Judge→PR block (1/2)
D5
T2.7 BrandHealthContextService.getHealthSummary: always in system prompt (1d)
T2.8 Anthropic prompt caching: SystemPromptBlock[] cache_control ephemeral (1d)
T3.13 Fast Data Layer: 11 FastAPI endpoints GET /data/{uid}/fast/{tool} (1/3)
T3.21 Suggestion cards (2/2 — T2.BB integration)
T3.22 ProfileView: marketplaces + stats + preferences + useProfile hook
T3.26 Eval CI (2/2) — <10 min for 20-30 cases
D6
T3.1 10 real READ handlers: Zod→HTTP→ToolResult (1/3)
T3.13 Fast Data Layer (2/3)
T3.23 Stripe Checkout Pro $49/month + self-service Customer Portal (1/3)
T3.27 Golden dataset 50 cases: 15 product + 10 pricing + 8 WRITE + 17 edge (1/3)
D7
T3.1 10 real READ handlers (2/3)
T3.13 Fast Data Layer (3/3) — pyarrow GCS Parquet <500ms
T3.23 Stripe Checkout (2/3)
T3.27 Golden dataset 50 cases (2/3)
D8
T3.1 10 real READ handlers (3/3)
T3.14 GCS snapshots for ConfirmationFlow: pre-write state + cleanup DAG (1d)
T3.23 Stripe Checkout (3/3) — webhook checkout.session.completed
T3.27 Golden dataset (3/3)
D9
T3.2 ConfirmationFlow: pause→diff→A/R→execute, TTL 35min (1/2)
T3.15 DAG Amazon: IExtractor+ILoader+AmazonAuthMgr+AmazonExtractor (1/3)
T3.24 ICreditsGate: POST /internal/gate + conditional DDB write (1/2)
T3.MK1 Mockup BillingView (0.5d)
T3.28 QA conversation flows, 3 marketplaces, Sellerfy data (1/2)
D10
T3.2 ConfirmationFlow (2/2) — OrchestrationSession DDB
T3.15 DAG Amazon (2/3)
T3.24 ICreditsGate (2/2) — Free 50cr, Pro 500cr, packs
T3.24a Billing schema migration: idempotent + credit_transactions (1d)
T3.MK2 Mockup ProfileView (0.5d)
T3.MK3 Mockup ConfirmDialog (0.5d)
T3.28 QA flows (2/2) — issues → Linear via Beautonomous
S5-6: WRITE handlers unblocked (T3.3 starts W7). ICreditsGate live. Fast Data 11 endpoints.
Sprint 7-8 — W7-W8 (10 days) — Hardening + Staging. Gate (G2): Staging full stack. Load test 50 users. WS streaming. Proactive. Eval ≥0.70.
Day | Mateo | Andrés | Sergio | Pablo
D1
T3.3 4 WRITE handlers: update_product/price/pause/activate (1/3)
T3.15 DAG Amazon (3/3) — Bronze schemas MeLi+Shopify verified
T3.24b SubscriptionLifecycleService: activate/cancel grace/upgrade (1d)
T4.16 KB batch v2 + v3: pipeline >5min → enable incremental (1/2)
D2
T3.3 4 WRITE handlers (2/3)
T4.9a API GW v2 WebSocket in CDK: $connect/$disconnect/$default + DDB conn-ids
T3.24c Monthly credit reset cron: EventBridge 1st of month + pack credits 12m
T4.16 KB batch v2 (2/2) — target >80% retrieval hit rate
D3
T3.3 4 WRITE handlers (3/3) — pre-write snapshot + verify
T3.16 IRateLimiter x3: MeLi token bucket + Amazon burst + Shopify leaky
T4.10 Progressive WebSocket client (+T3.BB 0.5d): 8 server→client events + backoff (1/3)
T4.17 Eval auto CI: 50 golden on push to main, fails if <0.70 (1/2)
D4
T3.4 ProactiveSuggestionService: afterTool hook + dedup 7d + max 2/turn (1/2)
T3.17 Onboarding trigger: first sync after marketplace connection (1d)
T3.18 CI/CD multi-repo: auto deploy to staging on merge to main + Org Secrets (2d start)
T4.10 Progressive WS client (2/3) — dep T4.1 (infra ready D6)
T4.17 Eval auto CI (2/2) — results → #engineering Slack
D5
T3.4 ProactiveSuggestion (2/2) — no hardcoded rules, LLM evaluates
T3.18 CI/CD multi-repo (2/2) — 11 repos
T4.10 WS client (3/3 — T3.BB tokens 0.5d)
T4.11 EnrollmentView standalone: BrowserWindow OAuth redirect per marketplace (1d)
T4.18 Proactive suggestions testing with real Sellerfy data: triggers + dedup + prompt (1/2)
D6
T3.5 IGuardService + InputGuard: injection patterns + out of scope (1d)
T3.5a HttpCreditGate: POST /internal/gate pre-tool, fail-open (1d, dep T3.24)
T4.6 Staging deploy full stack AWS+GCP: CDK + Terraform + health checks (1/3)
T4.11 EnrollmentView standalone — complete + tests
T4.18 Proactive suggestions testing (2/2) — iterate on prompt
D7
T4.1 WebSocket streaming: replace REST, 8 server→client events (1/2)
T4.6 Staging deploy (2/3)
T4.12 Sentry crash reporting: source maps + error grouping (4h)
T4.MK1 Mockup EnrollmentView (0.5d)
T4.19 Select 10-15 Sellerfy beta users + 2min video walkthrough (1/2)
D8
T4.1 WS streaming (2/2) — restore session on reconnect
T4.6 Staging deploy (3/3) — api-staging.shopilot.ai green
T4.13 Feedback Loop scaffold: interfaces + FeedbackEntry models + CDK (1d)
T4.MK2 Mockup full WRITE flow (1d start)
T4.19 Beta users prep (2/2) — feedback form + 30min calls
D9
T4.2 SystemPromptComposer L3: writeCapable write block, cap 1200 tok
T4.3 OutputGuard: data leak prevention + dangerous output filtering (1d)
T4.5a FeedbackCapture hook: after_tool WRITE → POST /feedback/capture (1d)
T4.7 Load testing 50 users: Artillery/k6, p95 <2s (1/2)
T4.14 calculateImpactScore + DynamoFeedbackRepository (1/2)
T4.19a Eval contract testing pipeline: Tool Registry→DataSync/MP/Enrichment (1/2)
D10
T4.5 Performance optimization: p95 <3s, cache, parallelization (1/2)
T4.5b ActionLog entity + DynamoActionLogRepository: GSI1 Conv#{convId} (1d)
T4.7 Load testing (2/2)
T4.8 CloudWatch dashboard + PagerDuty/Slack alerts p95+error+LLM cost (1/2)
T4.14 calculateImpactScore + repo (2/2)
T4.19a Contract testing (2/2)
T4.19b KB quality eval: precision@5 + recall + hit rate, fails CI <80% (1d)
G2 — Staging full stack. Load test p95 <2s. WS streaming live. Eval ≥0.70. Pablo approves.
Sprint 9-10 — W9-W10 (10 days) — Launch. Gate (G3 Go/No-Go): 10+ beta users. .dmg signed. Billing live. Eval ≥0.70. p95 <3s. 0 P0 bugs.
Day | Mateo | Andrés | Sergio | Pablo
D1
T4.5 Performance optimization (2/2) — bottleneck profiling
T4.8 CloudWatch dashboard (2/2) — Slack alerts on LLM cost/day >$50
T4.15 FeedbackMeasurerService + Lambdas: processEntries >7d + EventBridge (1/2)
T5.10 Billing Stripe live: switch test→live + verify checkout+webhooks (dep T3.23 + T5.4)
D2
T5.1 LLMGuardChecker: Haiku classifier for suspicious inputs + fallback (1d)
T5.4 Deploy production: CDK prod + Terraform apply + SSL + health checks (1/3)
T4.15 FeedbackMeasurer (2/2) — CDK stack
T5.11 Beta onboarding 10-15 sellers: .dmg → marketplace → query (1/3)
D3
T5.2 Backend bug fixes P1/P2: empty edge cases, LLM refuses tool, concurrent WRITEs (1/4)
T5.4 Deploy production (2/3)
T4.15a FeedbackGate anti-fatigue: max 1 prompt/day + 24h cooldown (4h)
T4.15b Explicit feedback endpoint: POST /feedback/:userId/explicit (4h)
T5.11 Beta onboarding (2/3)
D4
T5.2 Backend bug fixes (2/4)
T5.4 Deploy production (3/3) — api.shopilot.ai
T4.15c Implicit feedback endpoint: POST /feedback/implicit + edited/rejected (4h)
T4.15d Grace period 7d billing: webhook subscription.deleted + cron (4h)
T5.11 Beta onboarding (3/3) — activation: 1+ tool in first session
D5
T5.2 Backend bug fixes (3/4)
T5.5 IaC prod complete: DDB PITR 35d + Secrets Mgr + IAM + concurrency (1/2)
T5.7 Code signing + .dmg + auto-updater: notarytool + S3 releases (1/2)
T5.12 Feedback calls: 15min/beta user + top 5 issues → Linear (1/2)
D6
T5.2 Backend bug fixes (4/4) — tokens expired mid-conversation
T5.5 IaC prod (2/2) — GCS lifecycle + Redis backup + PG backups
T5.7 .dmg + auto-updater (2/2) — test on clean Mac without dev tools
T5.12 Feedback calls (2/2) — document what works and what does not
D7
T5.3 System Prompt v3 final: tone adjustment + tool selection + edge cases (1d)
T5.6 Rollback testing: Lambda revert <1min + Cloud Run revision <1min (1d)
T5.8 Electron hardening: CSP + sandbox + nodeIntegration=false (1d)
T5.13 OWASP Top 10 review: injection/auth/XSS/SSRF + fix P1s (1d)
D8
T5.6a Data Sync Phase 4: OpenMetadata FQNs + embed_fast_dag + embed_health_dag (1/2)
T5.9 UI/UX beta bug fixes (+T4.BB post-audit 0.5d): RAM profiling <500MB + animations + loading (1/4)
T5.14 System Prompt v2 Beautonomous: 10 weeks of real usage + index technical docs (1d)
D9
T5.6a Data Sync OpenMetadata (2/2) — lineage visible
T5.9 UI/UX bug fixes (2/4)
T5.15a E2E eval pipeline: 10+ full-flow scenarios query→tool→response (1/2)
D10
T5.9 UI/UX bug fixes (3/4 + 4/4 — T4.BB post-audit)
T5.MK1 Mockup Dashboard view (1d start)
T5.15a E2E eval pipeline (2/2)
G3 Go/No-Go — T5.15: Pablo signs off. Checklist: tools OK, Billing live, 10+ beta, .dmg signed, OWASP P1s, p95 <3s, eval ≥0.70.

S11-12 — Buffer (Weeks 11-12) — Circuit Breaker Scope

Absorbs beta bugs, hardening, and deferred scope cut by circuit breakers in S7-8.

Engineer | S11 — Hardening | S12 — Deferred Scope
Mateo | Intelligence P1/P2 bug fixes. p95 optimization if target not met. WRITE tools cut in S7-8 (T4.4 circuit breaker) | Advertising tools Phase 5: 4 WRITE (create/update/pause/activate_campaign). Enrichment Rainforest API adapter (Amazon market intel). ProactiveSuggestions v2. LLMGuardChecker Phase 2
Andrés | Production hardening: alerts, runbooks, rollback drills. Bug fixes for Amazon/Shopify adapters | DAG Silver→Gold (if cut in T4.9). Rate limiters with real data. Expanded monitoring
Sergio | Pending UI/UX beta bug fixes. RAM profiling. .dmg hotfix if needed | Auto-updater S3. Windows build (if time allows). FeedbackThrottle anti-fatigue refinement. Improved feedback UI
Pablo | Eval iteration on real conversations. Golden dataset expansion for edge cases | KB v3: docs for beta questions that v2 did not cover. Eval score target 0.80

9.9.2 Cross-Engineer Handoff Schedule — 17 Critical Dependencies

Every task that blocks a different engineer. The day the “From” task completes is the earliest start for “Unblocks”. Extracted from doc 70 “Depende” column — cross-owner only.

# | Handoff (what transfers) | From → To | Done by | Unblocks | Earliest start | Risk
1 | T1.25 — 10 READ tool specs: names, schemas, risk levels | Pablo → Mateo | C1-D9 | T2.1 ToolRegistry registers definitions from specs | C2-D1 | MED
2 | T1.1 — DynamoDB fix: stable OrchestrationSession schema | Mateo → Andrés | C1-D3 | T2.15 CDK base AWS needs the correct DDB table schema | C2-D1 | MED
3 | T1.7 — RestResponseEventEmitter: REST endpoint live | Mateo → Sergio | C1-D7 | T2.18 CoachWebSocket needs the REST fallback endpoint | C2-D3 | HIGH
4 | T1.12 — MeLiAdapter: OAuth + typed IMarketplaceAdapter | Andrés → Sergio | C1-D9 | T2.21 OnboardingWizard requires working MeLi OAuth | C2-D5 | HIGH
5 | T1.6 + T2.1 — ReAct loop + ToolRegistry: 10 tools operational | Mateo → Pablo | C2-D5 | T2.25 E2E playground needs loop + tools to test | C2-D6 | HIGH
6 | T2.13 — Fast Data Layer: 11 FastAPI endpoints live | Andrés → Mateo | C2-D6 | T3.1 READ handlers connect to Fast Data (contract defined) | C3-D1 | HIGH
7 | T3.2 — ConfirmationFlow backend: hold queue + DDB TTL 35min | Mateo → Sergio | C3-D4 | T3.20 Confirmation UI dialogs integrate with ConfirmationFlow API | C3-D5 | HIGH
8 | T3.1 + T3.3 — Real READ handlers + operational WRITEs | Mateo → Pablo | C3-D9 | T3.28 End-to-end QA flows on already operational tools | C3-D10 | HIGH
9 | T3.24 — ICreditsGate: POST /internal/gate contract | Sergio → Mateo | C3-D9 | T3.5a HttpCreditGate calls POST /internal/gate before each tool | C3-D10 | HIGH
10 | T3.4 — ProactiveSuggestionService: afterTool hook events | Mateo → Pablo | C3-D5 | T4.18 Proactive suggestions testing requires ProactiveSuggestion event format | C4-D8 | MED
11 | T4.1 — WS streaming server: 8-event contract defined | Mateo → Sergio | C4-D7 | T4.10 Progressive WS client handles 8 server event types | C4-D8 | CRIT
12 | T4.5a — FeedbackCapture hook: after_tool writes FeedbackEntry | Mateo → Sergio | C4-D9 | T4.13+ Feedback Loop scaffold: IFeedbackRepository + models | C4-D10 | HIGH
13 | T4.6 — Staging full stack: api-staging.shopilot.ai | Andrés → Andrés + Pablo | C4-D8 | T4.7 load test 50 users + T4.17 Eval CI gate on staging | C4-D9 | HIGH
14 | T5.4 — Production deployed: api.shopilot.ai live | Andrés → Pablo | C5-D4 | T5.11 Beta onboarding uses real production URL (not staging) | C5-D5 | CRIT
15 | T5.7 — .dmg code-signed + notarized: installable on a clean Mac | Sergio → Pablo | C5-D6 | T5.11 Beta users install .dmg during 1-on-1 onboarding calls | C5-D7 | CRIT
16 | T5.10 — Billing Stripe live: real transactions working | Sergio → Mateo | C5-D8 | T5.3 System Prompt v3 enables monetization in production prompts | C5-D9 | HIGH
17 | T5.1–T5.14 — Whole team: all S5 deliverables ready | All → Pablo | C5-D9 | T5.15 Go/No-Go final checklist: 11 criteria must be ✓ | C5-D10 | CRIT
⚠ Any delay in a CRITICAL handoff delays the entire project. Daily Beautonomous standup tracks these 17 handoffs automatically.
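The "Done by" and "Earliest start" markers in the handoff table (for example C4-D7 and C4-D8) can be compared numerically to see how much slack each handoff has. The sketch below assumes that C1 is the first 10-day sprint cycle, that the Pre-Sprint week is not counted, and that every cycle has exactly 10 working days, as the schedule states. `toDayIndex` and `handoffGap` are hypothetical helper names.

```typescript
const DAYS_PER_CYCLE = 10; // from the schedule: each sprint cycle is 10 working days

// Converts a marker such as "C4-D7" into a 1-based working-day index counted from C1-D1.
function toDayIndex(marker: string): number {
  const m = /^C(\d+)-D(\d+)$/.exec(marker);
  if (!m) throw new Error(`bad marker: ${marker}`);
  const cycle = Number(m[1]);
  const day = Number(m[2]);
  if (cycle < 1 || day < 1 || day > DAYS_PER_CYCLE) {
    throw new Error(`out of range: ${marker}`);
  }
  return (cycle - 1) * DAYS_PER_CYCLE + day;
}

// Working days from the "Done by" day to the "Earliest start" day.
// A gap of 1 is a next-day handoff with no slack.
function handoffGap(doneBy: string, earliestStart: string): number {
  return toDayIndex(earliestStart) - toDayIndex(doneBy);
}
```

Under these assumptions, handoff 11 (C4-D7 to C4-D8) has a gap of 1, while handoff 10 (C3-D5 to C4-D8) has a gap of 13, which is consistent with their CRIT and MED risk labels.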

9.9.3 Task Summary by Engineer

4 engineers • 183 tasks • 50d each

Aggregate view of tasks, estimated days, and projects per engineer across the full 10-sprint MVP. Identifies overloads early and informs circuit-breaker decisions.

Mateo — CTO

53 tasks
79d / 50d — 1.58× [RISK]

Primary owner: 2-INTELLIGENCE layer (projects #2–#8), 3-KNOWLEDGE (#9 Cerebro KB). Secondary: #11 Enrichment, #17 CORE system prompt, #18 Design System token pipeline. Highest cognitive load — ReAct orchestration + all WRITE tool handlers + KB architecture Go 1.24 + Vertex AI.

Sprint | Key Tasks | Est.
S1–2 | T1.1 DynamoDB, T1.3 UserProfile, T1.6 ILLMClient, T1.7 REST, T1.8 ReAct scaffold, T1.9–T1.12 system prompt v1, context window, T1.21–T1.23 KB fix+retrieval+docs, T1.25 10 READ specs | 14d
S3–4 | T2.1 HookLifecycle, T2.2 ToolRegistry, T2.3–T2.8 10 READ stubs, T2.9 IContextWindowManager, T2.11 system prompt v1, T2.22 KB incremental, T2.23 batch embeddings | 15d
S5–6 | T3.1 Fast Data Layer wiring, T3.2–T3.7 READ real handlers, T3.4 ProactiveSuggestionService, T3.8 ConfirmationFlow backend, T3.11–T3.14 WRITE handlers (4 tools), T3.25 KB BigQuery indexing, T3.32 DS token pipeline (dep T0.BB) | 20d
S7–8 | T4.1 WebSocket upgrade, T4.2–T4.6 remaining 13 WRITE handlers, T4.3 ActionLog, T4.4 ToolPolicyFilter, T4.5 system prompt v2, T4.16 KB batch v2 | 20d
S9–10 | T5.1 system prompt v3, T5.2 Personality Engine, T5.3 ProactiveSuggestions v2, T5.4 Context Aggregator, T5.5 bug fixes P0/P1 | 10d

Overload mitigation: Mateo (1.58×) is overloaded. Sergio reduced to 1.00× after #18 DS Figma moved to external UX/UI team. MANDATORY: reassign Enrichment CDK (T3.10) to Andrés in S5–6. If Mateo slips ≥3d, defer remaining WRITE tools to S11–12. Mateo is the highest-risk SPOF — owns Intelligence + KB + DS pipeline.
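The load figures in this summary follow from the per-sprint estimates: the days are summed and divided by the 50-day capacity, so Mateo's 14 + 15 + 20 + 20 + 10 = 79d gives 79 / 50 = 1.58×. A small sketch of that arithmetic is below; `totalDays`, `loadRatio`, and `slackDays` are hypothetical helper names, and rounding to two decimals is an assumption matching how the ratios are printed here.

```typescript
const CAPACITY_DAYS = 50; // "50d each" across the 10-sprint MVP

// Sum of per-sprint estimates in days.
function totalDays(perSprint: number[]): number {
  return perSprint.reduce((sum, d) => sum + d, 0);
}

// Estimated days divided by capacity, rounded to two decimals.
function loadRatio(estimatedDays: number, capacityDays: number = CAPACITY_DAYS): number {
  return Math.round((estimatedDays / capacityDays) * 100) / 100;
}

// Positive when the engineer has spare days, negative when overloaded.
function slackDays(estimatedDays: number, capacityDays: number = CAPACITY_DAYS): number {
  return capacityDays - estimatedDays;
}

// Per-sprint estimates as listed in the tables of this section.
const mateo = totalDays([14, 15, 20, 20, 10]);
const pablo = totalDays([4, 8, 7, 7, 8, 5]);
console.log(mateo, loadRatio(mateo), slackDays(mateo));
console.log(pablo, loadRatio(pablo), slackDays(pablo));
```

The same calculation reproduces Pablo's 11 days of slack (50 minus 39) and Mateo's 29-day overrun.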

Andrés — Data+BE

36 tasks
52.5d / 50d — 1.05×

Primary owner: 4-ACTION (#12 Marketplace Provider) + 5-PLATFORM (#14 DevOps). Secondary: 3-KNOWLEDGE (#10 Data Sync). Float in S1–4, blocker from S5 (Fast Data Layer is the critical dependency).

Sprint | Key Tasks | Est.
S1–2 | T1.14 MeLi OAuth2, T1.15 AmazonAdapter scaffold + SP-API request Day 1, T1.16 IMarketplaceAdapter, T1.17 Terraform VPC/CDK base, T1.33 electron-builder CI | 8.5d
S3–4 | T2.10 ShopifyOAuth2Flow + AmazonAdapter OAuth, T2.12 TokenRefreshCron, T2.13 DynamoDB credentials, T2.14 CDK Cloud Run, T2.15 Secret Manager | 12d
S5–6 | T3.13 Fast Data Layer Cloud Run, T3.15 Redis ElastiCache, T3.16 IRateLimiter, T3.17 CI/CD GitHub Actions, T3.18 Billing DynamoDB, T3.19 ICreditsGate | 14d
S7–8 | T4.7 load test k6, T4.8 CloudWatch dashboards, T4.9 PagerDuty, T4.9a WebSocket CDK, Silver+Gold circuit breaker, Staging deploy | 10d
S9–10 | T5.6 Production Terraform deploy, T5.6a rollback + OpenMetadata, OWASP scan infra, prod monitoring | 8d

Float buffer: S1–4 has 2d float per sprint. Critical from S5: Fast Data Layer (T3.13) must be ready before Mateo can wire real handlers. Amazon SP-API approval: requested Day 1. If it is not approved by S3–4, scaffold with mocks and defer to S5.

Sergio — Full-stack

45 tasks
50d / 50d — 1.00×

Primary owner: 1-PRODUCT (#1 Native Shell). Secondary: 5-PLATFORM (#13 Billing), 6-QUALITY (#15 Feedback Loop), 3-KNOWLEDGE (#11 Enrichment). #18 DS Figma now owned by external UX/UI team — Sergio CONSUMES Figma components and creates integration Mockups (T*.MK*). Load reduced from 1.50× to 1.00×.

Sprint | Key Tasks | Est.
S1–2 | T1.16 Scaffold Electron, T1.17 MainWindow, T1.18 MarketplaceDetector, T1.20 Auth Memberstack, T1.32 canary build. W2 with T0.BB: T1.19 sidebar UI (2.5d), T1.MK1 Mockup shell container (0.5d) | 9d
S3–4 | T2.17 Chat UI + Markdown (2.5d), T2.18 CoachWebSocketService, T2.19 URL context injection, T2.20 react-router, T2.21 OnboardingWizard (2.5d), T2.MK1 Mockup ChatView (1d), T2.MK2 Mockup Onboarding (0.5d), T2.40 Gate 1 signed build | 10.5d
S5–6 | T3.19 BillingView (2.5d), T3.20 IMarketIntelligenceAdapter + MeLi Search, T3.21 Rainforest API (1.5d), T3.9 EnrichmentCache TTL Redis, T3.22 Billing Stripe Checkout, T3.MK1 Mockup BillingView (0.5d), T3.MK2 Mockup ProfileView (0.5d), T3.MK3 Mockup ConfirmDialog (0.5d) | 11.5d
S7–8 | T4.10 WebSocket 8 events (2.5d), T4.11 EnrollmentView, T4.12 Sentry, T4.13–T4.15 FeedbackLoop, T4.MK1 Mockup EnrollmentView (0.5d), T4.MK2 Mockup full WRITE flow (1d), T4.24 Gate 2 signed build | 10.5d
S9–10 | T5.7 .dmg code signing + notarization, T5.8 Electron hardening, T5.9 beta bugs + RAM <500MB (3.5d), T5.10 Stripe live, T5.MK1 Mockup Dashboard view (1d) | 8.5d

#18 DS Figma tasks moved to the external UX/UI team; Sergio now consumes Figma components and creates integration Mockups (T*.MK*). Load reduced from 1.50× to 1.00×. Pablo is the approval gate for all T*.BB deliverables. MANDATORY: T3.10 Enrichment CDK → Andrés.

Pablo — CEO

32 tasks
39d / 50d — 0.78× [SLACK]

Primary owner: 6-QUALITY (#16 Eval Suite), 7-INTERNAL (#17 CORE Beautonomous). Decision-maker for all 3 Go/No-Go gates. 11d of slack is reserved for beta user recruitment, feedback calls, and strategic decisions.

Sprint | Key Tasks | Est.
S0 | T0.1–T0.8: OpenClaw project, Linear structure, System Prompt v1 Beautonomous, role mapping, validation. T0.9–T0.10: Apple Developer + Windows cert. T0.11: Brand Book delivery | 4d
S1–2 | T1.24 Eval scaffold + golden dataset, T1.26 brand registration, T1.27 store auth (Apple+Windows). #18 UX/UI: approve T0.BB + T1.BB | 8d
S3–4 | T2.24 LLM Judge + EvalRunner, T2.25 E2E test suite, T2.26 ~150 Linear tasks, T2.26a quality gate. #18 UX/UI: approve T2.BB | 7d
S5–6 | T3.26 Eval CI automation, T3.27 golden dataset 50 examples, T3.28 QA 3 marketplaces. #18 UX/UI: approve T3.BB | 7d
S7–8 | T4.17 Eval automated CI (score ≥0.70), T4.18 proactive suggestions testing, T4.19 beta prep, T4.19a contract testing, T4.19b KB quality eval. #18 UX/UI: approve T4.BB | 8d
S9–10 | T5.11 onboarding beta 10+ users, T5.12 feedback calls, T5.13 OWASP review, T5.14 Beautonomous SP v2, T5.15 Go/No-Go, T5.15a E2E eval pipeline. #18 UX/UI: no BB (pipeline closed, point queries only) | 5d

Gate authority: Pablo is the sole Go/No-Go decision-maker for Gate 1 (S4), Gate 2 (S8), and Gate 3 (S10). The eval score is the objective metric: if it is below 0.70 at Gate 2, Pablo blocks the launch regardless of feature completeness.

UX/UI — External Design Team

EXTERNAL

#18 Design System Figma. Delivers the T*.BB Brand Book milestones. Pablo is the approval gate for all deliverables. Sergio consumes Figma components for integration Mockups.

Sprint | Deliverables | Est.
S1–2 W1 | T0.BB: Figma Foundations + Tokens + Iconography + Core Components (partial) | 4d
S1–2 W2 | T1.BB: Atoms + Molecules base + Chat Organisms | 6d
S3–4 | T2.BB: Molecules remaining + Data/Flow Organisms | 6d
S5–6 | T3.BB: Advanced Organisms + [LIB] Pattern Components | 5d
S7–8 | T4.BB: Figma Quality Audit + Corrections | 3d
S9–10 | No BB: pipeline closed, point queries only | -

The external team works in parallel. Pablo approves each T*.BB milestone before Sergio can integrate it. T0.BB must be ready by the end of S1–2 W1 so that Sergio can use the tokens in T1.19.

Capacity Overview — All Engineers

Engineer | S1–2 | S3–4 | S5–6 | S7–8 | S9–10 | Total | Cap. | Load
Mateo | 14d | 15d | 20d | 20d | 10d | 79d | 50d | 1.58×
Andrés | 8.5d | 12d | 14d | 10d | 8d | 52.5d | 50d | 1.05×
Sergio | 9d | 10.5d | 11.5d | 10.5d | 8.5d | 50d | 50d | 1.00×
Pablo | 8d | 7d | 7d | 8d | 5d | 35d + S0: 4d | 50d | 0.78×
UX/UI ext. | 10d | 6d | 5d | 3d | - | 24d | ext. | EXT

Estimates include design, implementation, and review. They exclude the async standup and ceremonies (<2h/week). Circuit breaker: any task not done at its sprint deadline is cut to S11–12 (Shape Up rule).
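The Load column can be reproduced from the per-sprint estimates. The following is a minimal sketch, assuming load is simply total estimated days divided by the 50d capacity shown in the table; the names `ESTIMATES`, `CAPACITY_DAYS`, and `load_factor` are illustrative and not part of the blueprint.

```python
# Per-sprint estimates in days, copied from the capacity table.
# Pablo's list starts with his 4d of S0 work.
ESTIMATES = {
    "Mateo": [14, 15, 20, 20, 10],
    "Andrés": [8.5, 12, 14, 10, 8],
    "Sergio": [9, 10.5, 11.5, 10.5, 8.5],
    "Pablo": [4, 8, 7, 7, 8, 5],
}
CAPACITY_DAYS = 50  # capacity per engineer, as listed in the Cap. column


def load_factor(days):
    """Return total estimated days divided by capacity, rounded to 2 places."""
    return round(sum(days) / CAPACITY_DAYS, 2)


loads = {name: load_factor(days) for name, days in ESTIMATES.items()}

# Anyone above 1.0 has more estimated work than available days.
overloaded = [name for name, load in loads.items() if load > 1.0]
```

Under these assumptions, Mateo and Andrés are the two engineers above 1.00×, which is consistent with the mandatory reassignment of T3.10 noted for Mateo.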

168 internal tasks (T0.1–T5.MK1) + 5 external UX/UI milestones (T0.BB–T4.BB) across 6 phases, 6 sprint cycles (10+2 weeks), and 19 projects. All are tracked in Linear via Beautonomous (#17 CORE).

9.9.4 Project #17 CORE — Governance Across All Projects

Every project in the MVP must comply with Beautonomous governance. This is not optional: it is the foundation that lets 4 engineers operate like a team of 10-15.

Proj | CORE Governance Requirements
#12 | All PRs reviewed via Beautonomous (El Mago). OAuth token changes require 🍠 approval. Adapter code changes logged in Audit Log.
#8 | Alert configurations reviewed by El Mago. Dashboard access: read-only for El Artesano, full for El Mago.
#13 | Stripe configuration changes: 🔴 irreversible (El Mago approval). Pricing changes: El Capitan proposes, El Mago approves.
#10 | DAG configuration changes: 🍠 requires approval. Beautonomous monitors Airflow pipeline health. Data deletion: 🔴 irreversible.
#2 | LLM model changes: 🍠 El Mago approval. System prompt updates: tracked in Beautonomous with version history. Cost guard threshold: El Capitan + El Mago.
#3 | Adding/removing tools: 🍹 reversible, El Mago reviews. Tool risk level assignments: El Mago only.
#9 | KB document updates: 🍹 reversible, all roles. KB schema changes: 🍠 El Mago approval.
#15 | Feedback data access: read-only for all. Feedback rules changes: El Mago.
#4 | Personality/tone changes: El Capitan proposes (product), El Mago reviews (technical). Prompt template versioning via Beautonomous.
#5 | Context source changes: 🍹 reversible. Stale threshold changes: El Mago.
#6 | Proactive rule changes: El Capitan defines (product), El Mago implements (code). New rules: 🍹 reversible.
#1 | UI text/color changes: 🍹 reversible, all roles can propose via Beautonomous PR. Code signing: 🔴 El Mago only. Feature flags: 🍠 El Mago.
#14 | Infrastructure changes: 🍠 all require El Mago approval. Production deploy: 🔴 El Mago explicit confirmation via Beautonomous. Secrets rotation: 🔴 irreversible.
#16 | Eval threshold changes: El Capitan + El Mago. Test case additions: 🍹 reversible, all roles.
#7 | Guard rule changes: 🍠 El Mago (security critical). Data leak alert: 🔴 immediate notification to entire team.
#11 | External API credential changes: 🍠 El Mago. New adapter integration: 🍹 reversible. API cost monitoring via Beautonomous alerts.
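The three risk markers in the table (🍹 reversible, 🍠 requires approval, 🔴 irreversible) can be read as a small approval policy. The sketch below is one possible reading, not the Beautonomous implementation: the `Risk` enum, the `may_proceed` function, and the rule that 🔴 changes need both El Mago approval and an explicit confirmation are assumptions made for illustration.

```python
from enum import Enum


class Risk(Enum):
    REVERSIBLE = "reversible"          # 🍹 in the table
    NEEDS_APPROVAL = "needs_approval"  # 🍠 in the table
    IRREVERSIBLE = "irreversible"      # 🔴 in the table


def may_proceed(risk, approved_by_el_mago=False, explicitly_confirmed=False):
    """Decide whether a change may go ahead under the assumed policy."""
    if risk is Risk.REVERSIBLE:
        # Reversible changes go ahead; review can happen afterwards.
        return True
    if risk is Risk.NEEDS_APPROVAL:
        return approved_by_el_mago
    # Assumption: irreversible changes need approval plus explicit confirmation.
    return approved_by_el_mago and explicitly_confirmed


# Example: a KB document update versus a production deploy.
kb_update_ok = may_proceed(Risk.REVERSIBLE)
deploy_ok = may_proceed(Risk.IRREVERSIBLE, approved_by_el_mago=True)
```

Here `deploy_ok` is false because the deploy was approved but never explicitly confirmed, which mirrors the "explicit confirmation" wording in the #14 row.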

9.10 Critical Path & Dependencies

Critical Path (blocks everything downstream)

T0.8 CORE (S0) → T1.1 DynamoDB fix (S1) → T1.6 AgentLoopOrchestrator (S2) → T2.1 ToolRegistry (S3-4) → T3.1 10 READ handlers (S5-6) → T4.1 WS + T4.4 remaining WRITE (S7-8) → GATE 2 (S8) → T5.4 Production (S9) → T5.7 .dmg (S9) → T5.11 Beta (S9) → T5.15 Go/No-Go (S10)

If ANY item on the critical path slips, the launch date slips. The critical path runs through Mateo's work (S1-8) and then Pablo's beta onboarding (S9).

Cross-Track Dependencies

S0
Pablo → ALL: Project #17 CORE must be operational before Sprint 1. All engineers use Beautonomous for task management.
S0
ALL → ALL: Pre-Sprint technical alignment session. Decisions document signed before any code is written.
S1
ALL → ALL: Repos already exist (11 repos in the org). Verify access, branch protection, and permissions before writing any code.
S1
UX/UI → Sergio: T0.BB (Figma Foundations + Tokens + Core Components partial) must be delivered by the end of week 1 so Sergio can set up tokens and build Tabs+Sidebar (T1.19) in week 2.
S2
UX/UI → Sergio: T1.BB (Atoms + Molecules + Chat Organisms) must be delivered by the end of S2. Sergio uses them in S3-4 for Chat UI (T2.17) and OnboardingWizard (T2.21).
S2
Mateo → Sergio: REST endpoint (T1.7 RestResponseEventEmitter) must be ready by the end of S2 so Sergio can connect the WebSocket client with REST polling fallback.
S5
Andrés → Mateo: Fast Data Layer (T3.13) or marketplace adapters must be ready to connect the 10 real READ handlers (T3.1). S3-4 stubs don't need real adapters.
S3-4
UX/UI → Sergio: T2.BB (Molecules remaining + Data/Flow Organisms) must be delivered by the end of S4. Sergio uses them in S5-6 for BillingView (T3.19), ProfileView (T3.22), and Confirmation Dialogs (T3.20).
S5
Mateo → Sergio: WRITE tool confirmation protocol (T3.2 ConfirmationFlow) must be defined for the confirmation UI (T3.20).
S5
Mateo → Sergio: ProactiveSuggestionService (T3.4) must be ready for the suggestion cards UI (T3.21).
S5
Sergio → Mateo: ICreditsGate backend (T3.24) must be ready for the HttpCreditGate middleware (T3.5a).
S7
Andrés → Mateo: API Gateway v2 WebSocket CDK routes (T4.9a) must be ready for WebSocket streaming in production (T4.1).
S7
Sergio → Mateo: Feedback Loop scaffold (T4.13) must be ready for FeedbackCapture in HookLifecycle (T4.5a).
S5-6
UX/UI → Sergio: T3.BB (Advanced Organisms + [LIB] Pattern Components) must be delivered by the end of S6. Sergio uses them in S7-8 for EnrollmentView (T4.11) and the WS WRITE flow (T4.10).
S7
Mateo → Sergio: WebSocket streaming server (T4.1) must be ready for the progressive WebSocket client (T4.10).
S7-8
UX/UI → Sergio: T4.BB (Figma Quality Audit) must be delivered by the end of S8. Sergio applies post-audit corrections during S9-10 bug fixes (T5.9, +0.5d).
S9
ALL → ALL: All tracks converge for beta bug fixes.

Non-Critical Path: Andrés

Andrés has float in S1-4 (stubs don't depend on real adapters, so Mateo can use mock data). From S5-6 Andrés is a blocker: Fast Data Layer (T3.13) blocks Mateo's real READ handlers (T3.1). No float from S5 onward.

Non-Critical Path: Sergio (partially)

The Electron shell and billing have ~1 week of float for S1-6. New dependency: UX/UI delivers Figma components (T0.BB–T4.BB) each sprint; Sergio consumes them and creates Mockups to validate integration. If a UX/UI delivery slips, Sergio can continue with placeholder tokens. S7-10 (onboarding + shipping) becomes critical: the .dmg for beta must ship by S9.

What If the Critical Path Slips?

ReAct Loop slips 1 week: Sergio builds the chat UI against a REST mock. Pablo tests via the API directly. Recoverable.

Tool Registry slips 1 week: Gate 1 is delayed. Use the S5-6 buffer. The WRITE tools phase must be tighter (1 week instead of 2).

WRITE Tools slip 1 week: Ship beta with read-only tools. WRITE tools are added in S10 as a hotfix. Acceptable degradation.

Multiple things slip: Cut scope. Launch with MeLi only (defer Amazon + Shopify), or launch READ + ANALYSIS only (defer WRITE tools). Letting the Shopify adapter (T2.10) slip to S11 is a major delay, so a scope cut is preferable.

9.11 Cross-Project Dependency Map

Three complementary views: blocking table, critical path DAG, and key task chains. Hard deps block execution; soft deps degrade quality if delayed.

A — Project Blocking Table

Blocking Project | Blocked Projects | Type | Critical? | Earliest Unblock
#17 Beautonomous | ALL 19 active projects: governance, task tracking, code review | Hard | Yes (Day 0) | W0 (T0.8)
#2 Orchestrator | #3 Tool Registry (ReAct executes tools) • #6 Proactive (afterTool hook) • #7 Guardrails (pre/post-LLM) • #4 Personality (prompt composition) | Hard | Yes | W2 (T1.6)
#12 Marketplace | #10 Data Sync (adapters needed for DAGs) • #5 Context Agg. (reads via adapters) • #11 Enrichment (MeLi Search adapter) • #3 Tools (adapter dispatch) | Hard | Yes | W2 (T1.12)
#14 DevOps | ALL projects: infra must exist before any service deploys to staging/production | Hard | Yes | W3 (T2.15)
#3 Tool Registry | #6 Proactive (evaluates tool results) • #15 Feedback (captures write executions) • #13 Billing (credits gate pre-tool) | Hard | Yes | W3 (T2.1)
#1 Shell | #13 Billing UI (BillingView in sidebar) • #6 Proactive (suggestion cards UI) | Soft | No | W2 (T1.19)
#8 Observability | #16 Eval Suite (eval metrics sourced from traces) • #14 DevOps (alerts in monitoring) | Soft | No | W2 (T1.8)
#10 Data Sync | #9 Cerebro KB (embeddings from Gold data) • #5 Context Agg. (fresh data for context) | Soft | No | W3 (T2.13)
#13 Billing | #2 Orchestrator (credits gate blocks tool calls) | Hard | No (fail-open) | W6 (T3.24)
#18 Design System | #1 Native Shell (tokens + components consumed by the desktop client UI via Mockups) | Soft | No | T0.BB (W1) → T4.BB (W8)

B — Critical Path DAG (Project Level)

#17 CORE ──────────────────────────────────────────────────────────── governance (all sprints)
    │
    ├──► #2 Orchestrator (W1-2) ──► #3 Tool Registry (W3) ──► #6 Proactive (W5) ──► BETA
    │         │                           │                          │
    │         │                      #7 Guardrails              #15 Feedback
    │         │                      (W5, pre/post LLM)          (W7, write hooks)
    │         │
    │         └──► #4 Personality (W1, system prompt)
    │
    ├──► #12 Marketplace (W1-2)
    │         ├──► #10 Data Sync (W3) ──► #9 Cerebro KB (W3-4) ──► #5 Context Agg. (W3)
    │         │         │                     │
    │         │    #11 Enrichment (W5)    RAG quality
    │         │
    │         └──► #13 Billing (W6) ──► Ship (W10)
    │                   │
    │              #1 Shell (W1-10, parallel UI) ◄── #18 Design System (UX/UI team, T0.BB→T4.BB + Sergio Mockups)
    │
    ├──► #14 DevOps (W3 CDK IaC) ──► Staging (W7-8) ──► Production (W9-10) ──► Ship
    │
    └──► #8 Observability (W1-2) ──► #16 Eval Suite (W3-8) ──► CI/CD Quality Gate
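
The project-level DAG above can be checked mechanically for cycles and turned into a valid start order. The sketch below uses Python's standard `graphlib` module; the `DEPENDS_ON` mapping encodes only the arrows that read unambiguously from the diagram (projects hanging under a branch without an arrow, such as #7, #11, #15, and #1, are left out), and `blocked_if_slips` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter

# Maps each project to the projects it depends on (arrows read off the DAG).
DEPENDS_ON = {
    "#2 Orchestrator": {"#17 CORE"},
    "#12 Marketplace": {"#17 CORE"},
    "#14 DevOps": {"#17 CORE"},
    "#8 Observability": {"#17 CORE"},
    "#3 Tool Registry": {"#2 Orchestrator"},
    "#4 Personality": {"#2 Orchestrator"},
    "#6 Proactive": {"#3 Tool Registry"},
    "#10 Data Sync": {"#12 Marketplace"},
    "#9 Cerebro KB": {"#10 Data Sync"},
    "#5 Context Agg.": {"#9 Cerebro KB"},
    "#13 Billing": {"#12 Marketplace"},
    "#16 Eval Suite": {"#8 Observability"},
}

# static_order() raises graphlib.CycleError if the dependencies contain a cycle.
order = list(TopologicalSorter(DEPENDS_ON).static_order())


def blocked_if_slips(project):
    """Return every project that transitively depends on `project`."""
    blocked, frontier = set(), {project}
    while frontier:
        frontier = {p for p, deps in DEPENDS_ON.items() if deps & frontier} - blocked
        blocked |= frontier
    return blocked
```

For example, `blocked_if_slips("#2 Orchestrator")` returns the Tool Registry, Personality, and Proactive projects, which matches the #2 row of the blocking table apart from #7 Guardrails, which is omitted from this mapping.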

C — Critical Task Chains (Cross-Engineer)

Main Product Path (blocking: if any step slips, W10 slips)

T0.8 → T1.6 (AgentLoop) → T2.1 (ToolRegistry) → T3.3 (WRITE handlers) → T4.1 (WS Streaming) → T5.7 (.dmg) → T5.10 (Billing live) → T5.15 (Go/No-Go)

Owners: All → Mateo → Mateo → Mateo → Mateo → Sergio → Sergio → Pablo

Shell + WebSocket Integration Path

T0.8 → T1.16 (Electron scaffold) → T1.17 (MainWindow+WCV) → T1.19 (Tabs+Sidebar) → T2.17 (Chat UI) → T2.18 (CoachWebSocket) → T4.10 (WS client) → T5.7 (.dmg)

Owner: Sergio (all steps)

Marketplace Data Path

T1.9 (scaffold) → T1.10 (IAdapter) → T1.12 (MeLiAdapter) → T2.13 (DataSync arch) → T3.13 (Fast Data Layer) → T2.6 (IContextAssembler) → T4.6 (Staging) → T5.4 (Prod)

Owners: Andrés → Andrés → Andrés → Andrés → Andrés → Mateo → Andrés → Andrés

Billing Path

T3.23 (Stripe Checkout) → T3.24 (ICreditsGate backend) → T3.24a (schema migration) → T3.5a (HttpCreditGate in API) → T5.10 (Billing live)

Owners: Sergio → Sergio → Sergio → Mateo → Sergio

UX/UI → Sergio Figma Component Path (every sprint)

T0.BB (W1) → T1.19+T1.MK1 (W2) → T1.BB (W2) → T2.17+T2.MK1+T2.21+T2.MK2 (W3-4) → T2.BB (W4) → T3.19+T3.MK1+T3.22+T3.MK2+T3.20+T3.MK3 (W5-6) → T3.BB (W6) → T4.10+T4.MK2+T4.11+T4.MK1 (W7-8) → T4.BB (W8) → T5.9+T5.MK1 (W9-10)

Owners: UX/UI → Sergio → UX/UI → Sergio → UX/UI → Sergio → UX/UI → Sergio → UX/UI → Sergio. Pablo approves each T*.BB delivery.

Cross-Engineer Critical Handoffs

Mateo T1.7 (REST emitter) → Sergio T2.18 (CoachWebSocket needs REST fallback endpoint)

Andrés T1.12 (MeLiAdapter) → Mateo T2.6 (IContextAssembler reads via adapters)

Sergio T3.24 (ICreditsGate backend) → Mateo T3.5a (HttpCreditGate calls POST /internal/gate)

Mateo T4.1 (WS streaming server) → Sergio T4.10 (WS client renders 8 event types)

Mateo T3.2 (ConfirmationFlow backend) → Sergio T3.20 (confirmation dialogs UI)

Mateo T3.4 (ProactiveSuggestionService) → Sergio T3.21 (suggestion cards UI)

9.12 Integration Milestones & Risk Gates

S4 — Integration Gate 1: "It Talks"

GO / NO-GO

Go Criteria (ALL must pass)

  • ReAct Orchestrator (#2) ↔ Tool Registry (#3) working E2E
  • Native Shell (#1) ↔ REST/WebSocket connection with fallback verified
  • MeLi + Amazon adapters (#12) operational. 10 READ tool stubs returning mock data
  • DevOps IaC scaffold (#14) ready for staging
  • Unit test coverage ≥70% on critical paths
  • Eval runner executes 15+ golden cases (T2.24)
  • Beautonomous (#17) used for all task management

If Gate Fails

  • Decision maker: Pablo (CEO)
  • Option A: Extend S3-4 by 1 week, compress S5-6
  • Option B: Ship Gate 1 with mock data, fix in S5
  • Option C: Reassign work if one track is blocked

Gate 1 Demo Script

1. Open Electron app → MeLi loads in WebContentsView

2. Open sidebar → type "How are my sales this week?"

3. Watch full REST response → mock data from stubs

4. Ask about product MLA123456 → READ stub execution → response with KB context

5. Switch tab to Amazon → context changes automatically
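
Step 5 of the demo relies on the MarketplaceDetector (T1.18) recognising which marketplace the active tab belongs to. A minimal sketch of that lookup follows, assuming the patterns arrive as a JSON document of host suffixes; the payload shape, the `detect_marketplace` function, and the specific hosts listed are illustrative assumptions, not the blueprint's actual configuration.

```python
import json
from urllib.parse import urlparse

# Hypothetical remote-config payload; in practice it would be fetched, not inlined.
REMOTE_CONFIG_JSON = """
{
  "meli": ["mercadolibre.com", "mercadolibre.com.ar", "mercadolibre.com.mx"],
  "amazon": ["amazon.com"],
  "shopify": ["myshopify.com", "admin.shopify.com"]
}
"""

PATTERNS = json.loads(REMOTE_CONFIG_JSON)


def detect_marketplace(url, patterns):
    """Return the marketplace key whose host suffix matches `url`, else None."""
    host = (urlparse(url).hostname or "").lower()
    for marketplace, suffixes in patterns.items():
        # Match the exact host or any subdomain of a configured suffix.
        if any(host == s or host.endswith("." + s) for s in suffixes):
            return marketplace
    return None


active = detect_marketplace("https://sellercentral.amazon.com/home", PATTERNS)
```

Keeping the patterns in data rather than code is what allows them to be updated without shipping a new build when a marketplace changes its URLs.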

S8 — Integration Gate 2: "It Acts"

GO / NO-GO

Go Criteria (ALL must pass)

  • All 3 marketplaces (#12) + Billing (#13) flow complete
  • Enrichment (#11) ANALYSIS tools returning cached data
  • CI/CD pipeline (#14) auto-deploying to staging
  • Eval Suite (#16) LLM-as-Judge pipeline running in CI
  • WebSocket streaming working (T4.1)
  • Proactive suggestions active (T3.4)
  • Eval score ≥0.70 (T4.17)
  • Load test with 50 users passes p95 <2s (T4.7)
  • WRITE tools + confirmation flow tested with real data
  • E2E test count ≥30 passing

If Gate Fails

  • Decision maker: Pablo (CEO)
  • Option A: Use S11-12 buffer for fixing (that's what it's there for)
  • Option B: Ship beta with read-only (no WRITE tools)
  • Option C: Cut Shopify, ship MeLi + Amazon only
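
Because every Gate 2 criterion must pass, the decision reduces to listing which checks failed. The sketch below covers only the criteria with explicit thresholds or a clear yes/no outcome; `Gate2Metrics` and `gate2_failures` are hypothetical names, and how the metrics are collected is outside its scope.

```python
from dataclasses import dataclass


@dataclass
class Gate2Metrics:
    eval_score: float
    load_test_p95_s: float
    e2e_passing: int
    websocket_streaming_ok: bool
    write_tools_tested: bool


def gate2_failures(m):
    """Return the list of failed criteria; an empty list means GO."""
    checks = {
        "eval score >= 0.70": m.eval_score >= 0.70,
        "load test p95 < 2s": m.load_test_p95_s < 2.0,
        "E2E tests >= 30 passing": m.e2e_passing >= 30,
        "WebSocket streaming working": m.websocket_streaming_ok,
        "WRITE tools tested with real data": m.write_tools_tested,
    }
    return [name for name, passed in checks.items() if not passed]


# Example: everything passes except the eval score, so the gate is NO-GO.
failures = gate2_failures(Gate2Metrics(0.68, 1.4, 34, True, True))
decision = "GO" if not failures else "NO-GO"
```

Returning the failed criteria rather than a bare boolean gives the decision maker the information needed to choose between options A, B, and C.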

S10 — Launch Gate: "It Ships"

LAUNCH

Go Criteria (ALL must pass)

  • All 4 tracks converge: full E2E with real Sellerfy data
  • Guardrails (#7) input/output filtering active
  • Security review (OWASP top 10) passed
  • Production deploy + monitoring + rollback tested
  • 10+ beta users onboarded and active
  • Stripe billing live and tested
  • .dmg signed, download page live
  • Eval score ≥0.70
  • API p95 <3s
  • 0 P0 bugs

Rollback Plan

  • API: Lambda version revert (<1 min) / Cloud Run revision rollback for Data API
  • App: auto-updater pushes hotfix
  • Data: DynamoDB point-in-time recovery
  • If critical: disable WRITE tools server-side via ToolPolicyFilter
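
The last rollback option, disabling WRITE tools server-side, amounts to filtering the tool list before it reaches the model. The following is a minimal sketch of that idea; the `Tool` type, the `filter_tools` function, and the tool names are invented for illustration and do not describe the actual ToolPolicyFilter interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    kind: str  # "READ", "ANALYSIS", or "WRITE"


def filter_tools(tools, write_enabled):
    """Return the tools the agent may use; drop WRITE tools when disabled."""
    if write_enabled:
        return list(tools)
    return [tool for tool in tools if tool.kind != "WRITE"]


# Hypothetical registry contents.
REGISTRY = [
    Tool("get_sales_summary", "READ"),
    Tool("analyze_competitors", "ANALYSIS"),
    Tool("update_listing_price", "WRITE"),
]

# With the server-side flag off, only READ and ANALYSIS tools remain.
safe_tools = filter_tools(REGISTRY, write_enabled=False)
```

Because the filter runs on the server, flipping the flag takes effect without requiring users to update the desktop app.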

Quality Bar at Each Gate

Testing

• S4: Unit ≥70%, E2E ≥10

• S7: Unit ≥80%, E2E ≥30

• S10: Unit ≥80%, E2E ≥50

Performance

API p95: S4 <5s · S7 <3s · S10 <3s

Electron RAM <500MB

First token: S4 N/A (REST) · S7+ <1s

Reliability

Error rate <1%

OAuth refresh 100% success

Target SLA: 99.9%
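
The p95 budgets above can be checked against a batch of measured latencies. The sketch below uses the nearest-rank definition of a percentile, which is one of several common definitions and is an assumption here; `p95` and `meets_budget` are illustrative names.

```python
def p95(samples):
    """Return the 95th percentile of `samples` using the nearest-rank method."""
    if not samples:
        raise ValueError("p95 needs at least one sample")
    ordered = sorted(samples)
    # ceil(0.95 * n) computed with integers to avoid float rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_budget(samples, budget_s):
    """True when the p95 latency is strictly below the budget in seconds."""
    return p95(samples) < budget_s


# Example: one slow request out of twenty does not move the p95,
# but two slow requests out of twenty do.
one_slow = [0.5] * 19 + [4.0]
two_slow = [0.5] * 18 + [4.0, 4.0]
s10_ok = meets_budget(one_slow, 3.0)
s10_fail = meets_budget(two_slow, 3.0)
```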

9.13 Risk Register

Risk | Prob. | Impact | Mitigation | Owner
Marketplace API rate limits block functionality | MED | HIGH | Redis cache from S5-6 (T3.16 IRateLimiter per marketplace) + batch sync + incremental queries + EnrichmentCache TTL (T3.9) | Andrés
Electron app consumes too much RAM (>500MB) | LOW | MED | Target 400MB, monitor from S3. Optimize WebContentsView if needed | Sergio
ReAct loop too slow (>5s per turn) | MED | HIGH | Prompt caching, context pruning, faster model for routing, parallel subtasks | Mateo
OAuth2 token refresh failure (any marketplace) | MED | HIGH | Manual fallback, user notification, retry with backoff, Secret Manager | Andrés
Proactive suggestions are noisy / low value | HIGH | MED | LLM inference via afterTool hook (no hardcoded rules), max 2/turn, 7-day dedup, iterate with beta feedback | Mateo
LLM uses tools incorrectly | MED | HIGH | Precise descriptions in ToolDefinition, MAX_ROUNDS=10, #16 Eval Suite, iterate system prompt v2/v3 | Mateo
WebContentsView has unexpected limitations | LOW | HIGH | Validate in Pre-Sprint session. If critical limitations: evaluate webview tag or sandboxed iframe | Sergio
MeLi Search API insufficient for competitor analysis | MED | MED | IMarketIntelligenceAdapter with MeLi Search API. Rainforest API for Amazon (Enrichment #11 dev plan). Redis cache with TTL | Mateo
#1 Native Shell SPOF (1 engineer, 35% effort) | MED | CRIT | Pablo cross-trains on React/Electron by S4. If Sergio is blocked, Pablo covers basic UI fixes | Sergio
#11 Enrichment performance too slow for real-time | MED | MED | Redis TTL-based cache. Pre-compute in DAGs. Accept async for heavy analysis | Mateo
Multi-marketplace OAuth complexity (3 different flows) | MED | MED | IOAuth2Flow generic interface (T1.15c). Each marketplace implements its own flow (Andrés T2.10). Test each flow independently | Andrés
MarketplaceDetector breaks on URL changes | MED | MED | Remote config patterns (JSON in GCS) updatable without re-deploy | Sergio
Amazon SP-API approval takes >4 weeks (E1) | MED | HIGH | Request Day 1 (T1.15). AmazonAdapter scaffold with mocks. If not approved by S3-4, defer real Amazon to S5 | Andrés
Mateo overloaded (79d estimated against 50d available, 1.58× capacity). Owns Intelligence + KB + DS pipeline | HIGH | CRIT | MANDATORY: reassign Enrichment CDK (T3.10) to Andrés in S5–6. If it slips ≥3d, defer remaining WRITE tools to S11–12 | Mateo
UX/UI external team delivery delays block Sergio's Mockup tasks | MED | MED | UX/UI delivers T0.BB–T4.BB each sprint. If delayed, Sergio continues with placeholder tokens and Mockups shift to the next sprint. Pablo manages UX/UI as approval gate. Soft dependency: shell code doesn't block on Figma | UX/UI
Apple notarization rejection delays signed .dmg builds | MED | MED | Enroll Apple Developer in S0 (T0.9) to surface issues early. First unsigned canary in S1-2 (T1.32) to catch packaging problems. Signed build at Gate 1 (T2.40) with time to iterate | Pablo
Windows SmartScreen blocks .exe without sufficient reputationSmartScreen Windows bloquea .exe sin suficiente reputación MED LOW Procure EV code signing certificate in S0 (T0.10) — EV certs bypass SmartScreen. If OV only, build reputation via internal installs. Windows is secondary to macOS for MVPAdquirir certificado EV code signing en S0 (T0.10) — certificados EV omiten SmartScreen. Si solo OV, construir reputación vía instalaciones internas. Windows es secundario a macOS para MVP Pablo
Stripe integration more complex than expected (webhooks, idempotency)Integración Stripe más compleja de lo esperado (webhooks, idempotencia) LOW MED Use Stripe Checkout hosted (no custom flow). Sergio starts T3.23 in S5 with time. Fail-open if billing doesn't respondUsar Stripe Checkout hosted (sin flujo custom). Sergio empieza T3.23 en S5 con tiempo. Fail-open si billing no responde Sergio
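
The limits named in the first mitigation row (max 2 suggestions per turn, 7-day dedup) can be sketched as a small filter that an afterTool hook might call. This is an illustration only, not the actual ProactiveSuggestionService: the `Suggestion` shape, the `key` field, and the in-memory `lastShown` map are assumptions, and a real implementation would presumably persist dedup state per user.

```typescript
// Hypothetical shape: the blueprint only specifies the limits, not the data model.
interface Suggestion {
  key: string;  // stable identifier used for dedup (assumed), e.g. "reprice:item-1"
  text: string;
}

const MAX_PER_TURN = 2;
const DEDUP_WINDOW_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

// lastShown maps a suggestion key to the timestamp (ms) when it was last surfaced.
function selectSuggestions(
  candidates: Suggestion[],
  lastShown: Map<string, number>,
  now: number,
): Suggestion[] {
  const picked: Suggestion[] = [];
  for (const s of candidates) {
    if (picked.length >= MAX_PER_TURN) break;
    const seenAt = lastShown.get(s.key);
    // Skip anything already surfaced within the dedup window.
    if (seenAt !== undefined && now - seenAt < DEDUP_WINDOW_MS) continue;
    picked.push(s);
    lastShown.set(s.key, now);
  }
  return picked;
}
```

Candidates are taken in the order given, so any ranking (for example by LLM-assigned relevance) would have to happen before this filter runs.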

9.14 Infrastructure, DevOps & Cost Model

Stack Architecture (Layered)

1-PRODUCT: Native Shell (#1)

Electron 28+ • React 18+ • TypeScript • WebContentsView • WebSocket client • electron-builder/updater

2-INTELLIGENCE: Coach (#2–#8)

Claude Sonnet 4 (Anthropic API) • ReAct Orchestrator • Tool Registry (tool_use) • Prompt caching • IContextWindowManager • Node.js 22+ TypeScript (Lambda) • AWS API Gateway v2 (HTTP + WebSocket) • DynamoDB (conversations/traces/UserProfile) • PostgreSQL (AgentExecution, credit triggers) • AWS Secrets Manager

3-KNOWLEDGE: Cerebro KB + Data Sync + Enrichment (#9–#11)

Python/FastAPI (GCP Cloud Run) • GCS Parquet (Data Lake) • BigQuery (embeddings) • Vertex AI text-embedding-004 (768 dims) • Airflow (DAGs) • GCP Secret Manager

4-ACTION: Marketplace Provider (#12)

MeLi REST API • Amazon SP-API • Shopify GraphQL • DynamoDB (marketplace-credentials + marketplace-actions) • Redis (ElastiCache, from S5-6) • AWS Secrets Manager

5-PLATFORM: Billing + DevOps (#13–#14)

PostgreSQL/RDS (billing/credits) • Stripe (Checkout + webhooks) • AWS CDK • Terraform (GCP) • GitHub Actions CI/CD

6-QUALITY: Feedback Loop + Eval Suite (#15–#16)

Sentry • PagerDuty • CloudWatch (dashboards, alerts) • LLM-as-Judge (eval pipeline) • DynamoDB (FeedbackEntry + FeedbackThrottle) • Golden datasets (versioned in git)

7-INTERNAL: Beautonomous (#17)

#17 CORE Beautonomous (OpenClaw) • Linear • GitHub

CI/CD Pipeline (GitHub Actions)

Cross-cutting concern owned by Andrés (#14 DevOps IaC + pipeline config) and Pablo (quality gates via Beautonomous). Ready by S7.

Push to PR → Lint + Type-check → Unit Tests → #16 Eval (LLM-as-Judge) → Merge to main → Integration Tests → Auto-deploy Staging → Manual Approve → Production

Environments

  • dev: npm run dev (Coach) / docker compose up (Data) / mocks for external APIs
  • staging: Lambda (auto-deploy on merge) / Cloud Run Data API
  • production: Lambda + API Gateway (manual promote via CDK)
  • Rollback: Lambda version revert (<1 min)

[CORRECTED] Coach → Lambda/API Gateway (AWS), not Cloud Run

Monitoring

  • CloudWatch dashboards (latency, errors, RPS, LLM cost/conversation, tool executions, credits)
  • PagerDuty: p95 >2s, error rate >1%
  • Slack alerts: LLM cost/day >$50, OAuth failure, DAG failure
  • DynamoDB point-in-time recovery (35 days)
  • Target SLA: 99.9% (8.76h/year downtime)
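
As a quick check on the SLA figure, the downtime budget implied by an availability target is simply the unavailable fraction of a year. The sketch below assumes a 365-day year; the function name is illustrative.

```typescript
const HOURS_PER_YEAR = 365 * 24; // 8760, ignoring leap years

// Maximum downtime (hours/year) permitted by an availability target given as a fraction.
function allowedDowntimeHours(availability: number): number {
  return (1 - availability) * HOURS_PER_YEAR;
}

console.log(allowedDowntimeHours(0.999).toFixed(2)); // "8.76"
```

A 99.9% target therefore leaves roughly 8.76 hours per year, or about 44 minutes per month.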

Security

  • AWS Secrets Manager (backend: #2,#3,#12) / GCP Secret Manager (data: #9,#10,#11) [CORRECTED]
  • AES-256-GCM for marketplace tokens in DynamoDB
  • Memberstack JWT (safeStorage in Electron) + OAuth2 per marketplace (MeLi, Amazon LWA, Shopify)
  • Encryption at rest + in transit (TLS 1.3)
  • OWASP top 10 checklist at S10
  • Electron: CSP, sandbox, no nodeIntegration
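
To make the AES-256-GCM item concrete, here is a minimal sketch of encrypting a marketplace token with Node's built-in `node:crypto` module before it is written to DynamoDB. The function names and the storage layout (IV, auth tag, and ciphertext concatenated and base64-encoded) are assumptions of this example, not the blueprint's actual scheme; in practice the 32-byte key would presumably be loaded from AWS Secrets Manager.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Encrypts a token with AES-256-GCM.
// Output layout (assumed for this sketch): base64(iv[12] || authTag[16] || ciphertext).
function encryptToken(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // 96-bit nonce; must never repeat under the same key
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return Buffer.concat([iv, tag, ciphertext]).toString("base64");
}

// Reverses encryptToken; throws if the key is wrong or the data was modified.
function decryptToken(encoded: string, key: Buffer): string {
  const raw = Buffer.from(encoded, "base64");
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const ciphertext = raw.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Because GCM is authenticated, a tampered record fails loudly at `decipher.final()` instead of yielding a corrupted token.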

Cloud Cost Estimate (Monthly, at 100 users)

Infrastructure

Lambda + API Gateway (Coach API) ~$50
Redis (ElastiCache) ~$30
DynamoDB (AWS) ~$25
RDS PostgreSQL (Billing) ~$30
GCS + BigQuery ~$15
Cloud Run (Data API) ~$20
Secrets Manager + misc ~$5
Infra subtotal ~$175/mo

AI + Services

Anthropic API (100 users) ~$400
Stripe (2.9% + $0.30) ~$150
Sentry + PagerDuty ~$30
Apple Developer ($99/yr) ~$8
Services subtotal ~$588/mo
Total monthly cost (100 users) ~$763/mo
Revenue (50 Pro × $49) $2,450/mo
Gross Margin ~69%
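
The subtotals and margin above can be reproduced from the line items. The sketch below copies the estimates as given; the object keys are illustrative names, not identifiers from the codebase.

```typescript
// Monthly line items in USD, copied from the estimate above (100 users).
const infra: Record<string, number> = {
  lambdaApiGateway: 50,
  redis: 30,
  dynamoDb: 25,
  rdsPostgres: 30,
  gcsBigQuery: 15,
  cloudRun: 20,
  secretsMisc: 5,
};
const services: Record<string, number> = {
  anthropic: 400,
  stripe: 150,
  sentryPagerDuty: 30,
  appleDeveloper: 8,
};

const sum = (items: Record<string, number>): number =>
  Object.values(items).reduce((a, b) => a + b, 0);

const infraSubtotal = sum(infra);                   // 175
const servicesSubtotal = sum(services);             // 588
const totalCost = infraSubtotal + servicesSubtotal; // 763
const revenue = 50 * 49;                            // 2450 (50 Pro users at $49)
const grossMargin = (revenue - totalCost) / revenue;

console.log(totalCost, revenue, (grossMargin * 100).toFixed(1) + "%"); // 763 2450 68.9%
```

Keeping the model in code makes it easy to re-run when a line item or the number of Pro users changes.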

LLM Cost Model (Per-User Breakdown)

Free Tier

  • 50 credits/month ≈ 50 tool calls
  • ~$0.15-0.25/user/mo (with caching)
  • Read-only tools, lower token avg
  • When credits run out: everything blocked until monthly reset

Pro Tier

  • 500 credits/month ≈ 500 tool calls
  • ~$2.50-4.00/user/mo (with caching)
  • Read+Write + proactive (+20% cost)
  • When credits run out: buy Credit Packs or wait for monthly reset

Assumptions

  • Claude Sonnet 4 @ $3/$15 per 1M tokens (input/output)
  • ~2K input + ~500 output tokens/call avg
  • Prompt caching reduces input cost 60-80%
  • Margin: $49 - ~$4 = ~$45/Pro user/mo (~92%)
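
The per-call arithmetic behind these assumptions can be written out as a short sketch. It treats prompt caching as a flat discount on input cost, which is a simplification of how cache writes and reads are actually billed; the function names are illustrative.

```typescript
const INPUT_USD_PER_MTOK = 3;   // Claude Sonnet 4 input price, per the assumptions above
const OUTPUT_USD_PER_MTOK = 15; // output price

// Cost of one call. cacheSavings is the fraction of input cost removed by caching (0 to 1).
function costPerCall(inputTokens: number, outputTokens: number, cacheSavings: number): number {
  const input = (inputTokens / 1e6) * INPUT_USD_PER_MTOK * (1 - cacheSavings);
  const output = (outputTokens / 1e6) * OUTPUT_USD_PER_MTOK;
  return input + output;
}

// Monthly cost for a user making `calls` average-sized calls (2K input, 500 output tokens).
function monthlyCost(calls: number, cacheSavings: number): number {
  return calls * costPerCall(2000, 500, cacheSavings);
}

console.log(costPerCall(2000, 500, 0).toFixed(4)); // "0.0135"
```

Under these inputs, a Pro user who consumes all 500 calls comes to about $4.35 (80% savings) to $4.95 (60% savings) per month, and a Free user at 50 full-size calls to about $0.44 to $0.50. The lower ranges quoted above would therefore correspond to partial credit usage or smaller average calls, which the blueprint does not break down.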

Infrastructure Provisioning Timeline

S0-1: Local dev (Node.js 22+, Docker, repos cloned). AWS Secrets Manager + GCP Secret Manager (data)
S3-4: CDK base (AWS: DynamoDB, Lambda, API Gateway) + Terraform (GCP: Airflow DAGs, GCS, BigQuery)
S5-6: ElastiCache Redis (#11 Enrichment cache + #12 rate limiting) + Fast Data Layer (Cloud Run)
S7: Lambda staging + Cloud Run Data API staging + CI/CD pipeline + Sentry + PagerDuty
S9-10: Lambda production (CDK deploy) + Cloud Run production (Terraform) + SSL + domain + Stripe live

9.15 Ops Playbook & Launch Readiness

Runbooks (6 Scenarios)

1. API Latency Spike

Check Lambda scaling (concurrency, cold starts)

Check DynamoDB throttling (read/write capacity)

Check Anthropic API latency (their status page)

If Anthropic: enable context pruning, reduce tool count

2. OAuth Token Refresh Failure

Verify Secret Manager access + token state

Check adapter logs for the specific marketplace error

Manual re-auth: notify user to reconnect marketplace

If systemic: check marketplace API status page

3. Credit Deduction Mismatch

Audit Stripe webhook events vs clients table

Check credit_transactions logs for double deduction

Manual credit adjustment via admin API
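
The double-deduction check in runbook 3 could look roughly like the sketch below. The `CreditTransaction` fields (including `toolCallId` and the convention that deductions are negative amounts) are hypothetical; the blueprint names the credit_transactions log but not its schema.

```typescript
// Hypothetical record shape for rows exported from credit_transactions.
interface CreditTransaction {
  id: string;
  userId: string;
  toolCallId: string; // assumed: identifies the tool call that was charged
  amount: number;     // assumed: negative for deductions, positive for top-ups
}

// Returns groups of deductions that charged the same user for the same tool call more than once.
function findDoubleDeductions(rows: CreditTransaction[]): CreditTransaction[][] {
  const groups = new Map<string, CreditTransaction[]>();
  for (const row of rows) {
    if (row.amount >= 0) continue; // ignore top-ups and refunds
    const groupKey = `${row.userId}:${row.toolCallId}`;
    const group = groups.get(groupKey) ?? [];
    group.push(row);
    groups.set(groupKey, group);
  }
  return Array.from(groups.values()).filter((group) => group.length > 1);
}
```

Each returned group is a candidate for a manual adjustment: all but one of its rows would be refunded.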

4. LLM Hallucination / Wrong Tool

Pull trace from ConversationTrace (DynamoDB) + AgentExecution (PostgreSQL)

Analyze: which tool was called, with what params, in what context

Fix: adjust tool description or add few-shot example

Log in #16 Eval Suite as regression test

5. Data Sync DAG Failure

Check Airflow logs for the specific DAG

Verify marketplace API accessibility

Manual DAG re-run via Airflow UI

Beautonomous alerts the #deploys channel automatically

6. Electron App Crash

Check Sentry for crash report + stack trace

Check memory usage at crash time

If memory: optimize WebContentsView, limit tab count

Push hotfix via auto-updater (GitHub Releases)

Alert Configuration & On-Call

Alert Thresholds

p95 latency > 2s → PagerDuty (Mateo)

Error rate > 1% → PagerDuty (Andrés)

LLM cost/day > $50 → Slack (Pablo)

OAuth refresh failure → Slack (Andrés)

DAG failure → Beautonomous → Slack #deploys
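
The threshold rules above can be expressed as a small routing function. This is a sketch of the logic only: in the described setup the alarms would be configured in CloudWatch, PagerDuty, and Slack rather than in application code, and the `Metrics` shape is an assumption. The DAG failure path, which goes through Beautonomous, is left out.

```typescript
type Channel = "PagerDuty" | "Slack";

// Hypothetical snapshot of the metrics the thresholds refer to.
interface Metrics {
  p95LatencyMs: number;
  errorRate: number;        // fraction, e.g. 0.01 for 1%
  llmCostUsdPerDay: number;
  oauthRefreshFailed: boolean;
}

interface Alert {
  rule: string;
  channel: Channel;
  owner: string;
}

// Returns the alerts that should fire for a metrics snapshot, per the thresholds above.
function evaluateAlerts(m: Metrics): Alert[] {
  const alerts: Alert[] = [];
  if (m.p95LatencyMs > 2000) {
    alerts.push({ rule: "p95 latency > 2s", channel: "PagerDuty", owner: "Mateo" });
  }
  if (m.errorRate > 0.01) {
    alerts.push({ rule: "error rate > 1%", channel: "PagerDuty", owner: "Andrés" });
  }
  if (m.llmCostUsdPerDay > 50) {
    alerts.push({ rule: "LLM cost/day > $50", channel: "Slack", owner: "Pablo" });
  }
  if (m.oauthRefreshFailed) {
    alerts.push({ rule: "OAuth refresh failure", channel: "Slack", owner: "Andrés" });
  }
  return alerts;
}
```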

On-Call Rotation

Week A: Mateo (BE + orchestrator + performance)

Week B: Andrés (data + APIs + infrastructure)

Sergio/Pablo: secondary (Electron + product)

Escalation: 15min ack → 1h resolution target

Launch Readiness Checklist (Week 10)

Technical

All unit tests passing (coverage ≥80%)

All E2E tests passing (≥50 tests)

API p95 latency <3s under load

Electron RAM <500MB with 3 tabs

OAuth2 refresh working for all 3 marketplaces

CI/CD pipeline green on main branch

Lambda production deployed and healthy (CDK deploy)

Cloud Run Data API production deployed (Terraform)

Rollback tested and verified (<1 min recovery)

#16 Eval Suite running on every PR

Product

Beta feedback incorporated (top 5 issues fixed)

Onboarding flow tested with 3+ non-technical users

System prompt iterated based on real conversations

ProactiveSuggestionService (afterTool hook) delivering value, verified via test cases

28+ tools working (10 READ, 8 ANALYSIS operational; 4+ real WRITE in 3 marketplaces). 17 WRITE registered, implemented per circuit breaker

#7 Guardrails active (input/output filtering)

Business

Stripe billing live (Free + Pro + Credit Packs)

Download page with .dmg link live

Support channel ready (email or Slack community)

10+ beta users onboarded and active

Legal & Security

Privacy policy published

Terms of service published

OWASP top 10 security review passed

Marketplace API compliance verified (MeLi, Amazon, Shopify TOS)

Apple code signing + notarization

Beta User Selection & Onboarding

Selection Criteria

Current Sellerfy users (existing relationship)

Active on MeLi (primary marketplace)

Mix: 5 small sellers + 5 medium + 5 large

Willing to give feedback (15-min calls)

Mac users (Electron is Mac-only for MVP)

Onboarding Plan

2-min video walkthrough + setup document

1-on-1 Zoom call for first 5 users (Pablo)

Async Slack channel for beta group

Feedback form after 48h of usage

Follow-up calls at day 3, 7, 14

Week 10 Deliverable: The Complete Picture

A seller on MercadoLibre, Amazon, or Shopify downloads Shopilot.app, connects their account, and can:

1. Talk to a copilot that knows their business (sales, inventory, competitors)

2. Ask smart questions ("How do I improve this product?", "Who is my competition?")

3. Execute real actions (edit title, change price, toggle listing), with confirmation

4. Receive proactive suggestions during conversation ("Your competitor dropped prices. Want to reprice?", "5 products need attention")

5. Do everything from a native app where they also browse their marketplace with automatic context detection

That is Shopilot MVP. 4 engineers, 10 weeks, AI-augmented. From 200 paying Sellerfy users to a new product with real sellers using it.

9.16 Workflow & Meetings

Philosophy

Async-first. Code speaks louder than meetings.

Daily communication: Linear task updates + Slack messages. No standups.

If it can be a Slack message, it IS a Slack message.

Meetings (called “Coffee” internally) only when they add value.

Sprint Coffee

1 Coffee at the start of each sprint cycle

6 total: S1-2, S3-4, S5-6, S7-8, S9-10, S11-12

Plus 1 Kick-off Coffee in S0 (Pre-Sprint alignment)

30 minutes max

Agenda

What you’ll do this cycle

What you need from others

What blocks you

Format: Google Meet

Total: 7 Sprint Coffees

Gate Coffee

1 Coffee at each Go/No-Go gate

3 total: Gate 1 (end of S4), Gate 2 (end of S8), Launch Gate (end of S10)

Pablo is the decision maker

Format

Demo + Go/No-Go vote

45 minutes max

Gate 2 Coffee can overlap with S9-10 Sprint Coffee

Ad-hoc Coffee

Any engineer can request one at any time

No rules, but a reason is expected

Typical Reasons

Blocked >4h

Cross-project dependency needs coordination

Scope change

Architectural decision that affects another engineer

No bureaucracy: just post in #engineering on Slack and schedule.

Coffee Calendar: Complete Timeline

Week Coffee Type Participants Duration
W0 Kick-off Sprint All 4 30min
W1 S1-2 Start Sprint All 4 30min
W3 S3-4 Start Sprint All 4 30min
W4 Gate 1: “It Talks” Gate All 4 (Pablo decides) 45min
W5 S5-6 Start Sprint All 4 30min
W7 S7-8 Start Sprint All 4 30min
W8 Gate 2: “It Acts” Gate All 4 (Pablo decides) 45min
W9 S9-10 Start Sprint All 4 30min
W10 Launch Gate: “It Ships” Gate All 4 (Pablo decides) 45min

Total scheduled Coffees: ~9 (7 Sprint + 3 Gate, minus 1 overlap Gate 2 / S9-10). Ad-hoc Coffees are additional.

Golden Rule

“If a Coffee can be a Slack message, it’s a Slack message. Respect everyone’s deep work time.”

9.17 Linear Workspace: Project Management

What Changed: Specs → Linear

Before (v6): Tasks lived as T-codes in markdown files (sprints.md, engineers.md). No real-time tracking, no dependency visualization, no automated workflow.

Now (v7): All 192 tasks are Linear issues (AUT-22 to AUT-213). Linear is the single source of truth for execution tracking. This blueprint remains the architectural reference.

The 19 architectural projects remain in this blueprint for technical reference. In Linear, execution is organized into 6 time-bound Projects that group tasks by sprint phase.

Workspace Structure

Organization

Workspace: beautonomous

Team: Shopilot (AUT)

Issues: 192 (AUT-22 → AUT-213)

Initiative: MVP Shopilot — Cursor for eCommerce

Team Members

Mateo — CTO (#2-#9, #11, #18)

Andrés — Data+BE (#10, #12, #14)

Sergio — Full-Stack (#1, #13, #15)

Pablo — CEO (#16, #17, #18, #19)

6 Time-Bound Projects

Each Project groups all tasks from a sprint pair. The 19 architectural modules are tracked via Layer/ labels.

# Project Sprints Dates Issues Pts Lead Gate
P1 Walking Skeleton S0 + S1-2 Mar 11 → Mar 28 48 124 Pablo Gate 0
P2 Core Engines S3-4 Mar 31 → Apr 11 38 94 Mateo Gate 1
P3 WRITE + Billing + Design S5-6 Apr 14 → Apr 25 42 116 Mateo WRITE+Billing
P4 Integration + Polish S7-8 Apr 28 → May 9 37 93 Mateo Gate 2
P5 Production + Launch S9-10 May 12 → May 23 18 52 Pablo Go/No-Go
P6 Buffer S11-12 May 26 → Jun 6 9 34 Pablo Buffer
Total 192 513

How 19 Architectural Modules Map to 6 Projects

The 19 modules in this blueprint (Section 8) are architectural: they define what each component does. The 6 Linear projects are temporal: they define when work gets done. A single architectural module contributes tasks to multiple projects across sprints. The bridge is the Layer/ label on each issue.

P1: Walking Skeleton (48 issues, 124 pts)

All 7 layers bootstrapped. Electron scaffold (#1), AgentLoop ReAct (#2), KB fix (#9), MeLi/Amazon adapters (#12), OAuth2 + Redis (#14), Beautonomous bootstrap (#17), Figma Foundations (#18), Eval setup (#16). First canary build .dmg+.exe.

Team: Pablo 14, Mateo 14, Andrés 13, Sergio 7

P2: Core Engines (38 issues, 94 pts)

Intelligence layer deepens. ToolRegistry + ToolPolicyFilter (#3), ContextAggregator (#5), 10 READ handlers, ConfirmationFlow (#2), KB incremental + batch (#9), ShopifyAdapter (#12), Data Sync DAGs (#10), Chat UI + OnboardingWizard (#1), Gate 1 signed build.

Team: Mateo 15, Andrés 10, Sergio 8, Pablo 5

P3: WRITE + Billing + Design (42 issues, 116 pts)

Write capabilities + monetization. WRITE tools phase 1 (#3), InputGuard (#7), Enrichment scaffold (#11), Fast Data Layer + Amazon DAG (#10), BillingView + Stripe + credits (#13), Token pipeline Style Dictionary (#18), Figma Quality Eval extension (#16), KB BigQuery (#9).

Team: Mateo 15, Sergio 12, Pablo 9, Andrés 6

P4: Integration + Polish (37 issues, 93 pts)

Full integration + quality. WebSocket streaming (#2), OutputGuard (#7), Feedback Loop scaffold (#15), WRITE tools remaining (#3), Staging deploy (#14), Load testing 50 users (#14), WS Electron client (#1), Desktop Build Eval extension (#16), Figma audit (#18), Gate 2 signed build.

Team: Sergio 13, Pablo 11, Mateo 8, Andrés 5

P5: Production + Launch (18 issues, 52 pts)

Ship it. Production deploy AWS+GCP (#14), code signing .dmg + auto-updater (#1), security hardening CSP (#1), LLMGuardChecker + system prompt v3 (#7), Stripe live (#13), beta onboarding 10-15 sellers, OWASP review, Go/No-Go decision.

Team: Pablo 6, Sergio 5, Andrés 4, Mateo 3

P6: Buffer (9 issues, 34 pts; all Circuit-Breaker scope)

Deferred + hardening. WRITE tools deferred (#3), p95 optimization (#2), prod hardening + monitoring (#14), adapter bug fixes (#12), beta UI bug fixes (#1), auto-updater S3 pipeline (#1), Windows build (#1), eval expansion (#16), postmortem (#17).

Team: Sergio 3, Pablo 2, Andrés 2, Mateo 2

Total capacity: 513 estimated pts across 192 issues. At ~80 pts/cycle theoretical capacity (4 engineers × 80h), the ratio is ~1.5×, so the Circuit-Breaker label identifies tasks that can be deferred to the P6 Buffer.
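
To see where the load concentrates, the per-project points can be compared against the stated ~80 pts/cycle capacity. The sketch below copies the points from the project list; `loadRatio` is an illustrative helper, and reading the ~1.5× figure as a peak-cycle ratio is an interpretation rather than something the blueprint states.

```typescript
// Points per Linear project, copied from the project list above.
const projectPoints: Record<string, number> = { P1: 124, P2: 94, P3: 116, P4: 93, P5: 52, P6: 34 };
const CAPACITY_PTS_PER_CYCLE = 80; // theoretical capacity stated in the blueprint

// Planned points divided by theoretical capacity for one cycle.
function loadRatio(points: number, capacity: number = CAPACITY_PTS_PER_CYCLE): number {
  return points / capacity;
}

const totalPoints = Object.values(projectPoints).reduce((a, b) => a + b, 0); // 513
const cycles = Object.keys(projectPoints).length;                            // 6

for (const [name, pts] of Object.entries(projectPoints)) {
  console.log(`${name}: ${loadRatio(pts).toFixed(2)}x`);
}
console.log(`average: ${loadRatio(totalPoints / cycles).toFixed(2)}x`);
```

With these inputs the heaviest cycles are P1 (about 1.55×) and P3 (about 1.45×), while the six-cycle average is about 1.07×, so Circuit-Breaker deferrals matter most in those two cycles.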

6 Cycles (2 Weeks Each)

Each Cycle maps 1:1 to a Project with identical dates. Issues are assigned to the cycle matching their sprint.

C1: Mar 11-28 (S0 + S1-2)
C2: Mar 31-Apr 11 (S3-4)
C3: Apr 14-25 (S5-6)
C4: Apr 28-May 9 (S7-8)
C5: May 12-23 (S9-10)
Cooldown: May 26-Jun 6 (S11-12)

Config: 2-week cadence • Auto-create OFF • Started/Completed auto-add OFF • Cooldown for deferred tasks only

5 Milestones (Gates)

Milestone Target Criteria
Gate 0: APIs Connected Mar 28 MeLi + Amazon OAuth2 working, /context endpoint returns data
Gate 1: “It Reads” Apr 11 /conversation returns analysis with real MeLi+Amazon data. Signed build: .dmg notarized + .exe signed
WRITE + Billing Functional Apr 25 WRITE tools phase 1 working with confirmation flow. Stripe Checkout integrated. Credits gate enforcing
Gate 2: “It Acts” May 9 WRITE action executed with confirmation + rollback. Full .dmg+.exe with all S8 features
Go/No-Go May 23 0 P0 bugs, uptime 99.9%, <50ms p95 adapters. Production build ready for distribution

Label System Sistema de Labels

Layer/ (7 labels)

Layer/1-Product • Layer/2-Intelligence

Layer/3-Knowledge • Layer/4-Action

Layer/5-Platform • Layer/6-Quality

Layer/7-Internal

Scope/ (3 labels)

MVP-Critical: must ship for Gate 2

Important: should ship, can defer 1 sprint

Circuit-Breaker: only if capacity allows

Type/ (7 labels)

• Feature • Build • Gate • Eval

• Mockup • Design-Delivery • Setup

Special (3 labels)

figma-dependency: blocked until Figma delivery

cross-team: needs coordination between engineers

needs-ac: acceptance criteria pending

Workflow & Automations

Issue Workflow

Backlog → Todo → In Progress → In Review → Ready to Merge → Done

Won’t Do: cancelled/out of scope

Deferred: moved to S11-12 buffer

GitHub Automations

PR opened with AUT-XX in branch → issue moves to In Progress

Review requested → issue moves to In Review

PR merged → issue moves to Done

Branch naming: AUT-XX-short-description
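The branch convention and the three automations can be made concrete with a small sketch. Linear's GitHub integration applies these rules itself; the code below only illustrates them. The event names and the helper functions are hypothetical and do not mirror GitHub webhook payloads or the Linear API.

```typescript
type WorkflowState = "In Progress" | "In Review" | "Done";
type PrEvent = "opened" | "review_requested" | "merged";

// Hypothetical event names mapped to the workflow states described above.
const STATE_FOR_EVENT: Record<PrEvent, WorkflowState> = {
  opened: "In Progress",
  review_requested: "In Review",
  merged: "Done",
};

// Extracts the issue key from a branch such as "AUT-42-fix-oauth-refresh".
// Returns null when the branch does not start with an AUT-<number> prefix.
function issueKeyFromBranch(branch: string): string | null {
  const match = /^(AUT-\d+)(?:-|$)/.exec(branch);
  return match?.[1] ?? null;
}

// Resolves which issue should move to which state for a PR event on a branch.
function nextState(
  branch: string,
  prEvent: PrEvent,
): { issue: string; state: WorkflowState } | null {
  const issue = issueKeyFromBranch(branch);
  return issue === null ? null : { issue, state: STATE_FOR_EVENT[prEvent] };
}
```

A branch that does not follow the naming convention yields `null`, which is why the convention matters for the automations to fire.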

Fibonacci Estimates

Scale

Points · Effort
1 · 4h
2 · 1d
3 · 2d
5 · 3d
8 · 5d

Capacity

4 engineers × 80h/cycle = 320h available

~80 pts/cycle theoretical capacity (320h at 4h per point)

Ratio: planned load is ~1.5× capacity → circuit breaker needed
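The capacity arithmetic can be checked with a short sketch. It assumes a working day of 8h to convert the scale to hours; the function names are illustrative. Note that the scale is not linear (an 8-point issue is 40h, or 5h per point), so the ~80 pts figure holds only at the 1-point rate of 4h per point.

```typescript
// Hours per estimate, from the scale above. Converting days at 8h/day is an assumption.
const HOURS_PER_ESTIMATE: Record<number, number> = { 1: 4, 2: 8, 3: 16, 5: 24, 8: 40 };

const ENGINEERS = 4;
const HOURS_PER_ENGINEER_PER_CYCLE = 80;

// Total engineering hours available in one cycle.
function capacityHours(): number {
  return ENGINEERS * HOURS_PER_ENGINEER_PER_CYCLE;
}

// Sums the hours implied by a list of Fibonacci estimates.
function plannedHours(estimates: number[]): number {
  let total = 0;
  for (const points of estimates) {
    const hours = HOURS_PER_ESTIMATE[points];
    if (hours === undefined) throw new Error(`Estimate not on the scale: ${points}`);
    total += hours;
  }
  return total;
}

// Ratio of planned work to capacity. Values above 1 mean the cycle is over-committed.
function loadRatio(estimates: number[]): number {
  return plannedHours(estimates) / capacityHours();
}
```

A ratio above 1 for a cycle is the signal for the Circuit-Breaker scope label to apply.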

How Engineers Use Linear

1. Open Linear → My Issues to see your assigned tasks for the current cycle.

2. Move the issue to In Progress when you start working (or just open a PR with AUT-XX in the branch name).

3. Check blocked/blocking relations before starting: don’t start a task if its blocker isn’t Done.

4. When a PR is merged, the issue auto-moves to Done.

5. Use Cmd+K to navigate quickly. Use the Cycles view to see your sprint’s scope.

Architecture (Blueprint) vs Execution (Linear)

This Blueprint (19 Projects)

Architecture reference: what each module does, its components, APIs, data models

Technical specs, deep dives, acceptance criteria

Organized by architectural layer (7 layers, 19 projects)

Linear (6 Projects)

Execution tracking: who does what, when, status, blockers

Real-time progress, GitHub PR integration, Slack notifications

Organized by time phase (6 projects, 6 cycles, 5 gates)

Bridge: Every Linear issue has a Layer/ label mapping it back to its architectural layer, and through it to the blueprint projects in that layer. Filter by label to see all issues for a specific layer.

Critical Handoffs

These are modeled as blocked/blocking relations in Linear. The blocking issue must be Done before the blocked issue can start.

Engineer → Engineer

• Mateo T1.5 (REST) → Sergio T2.18 (WS client)

• Andrés T1.10 (/context) → Mateo T2.7 (ContextAgg)

• Mateo T3.2 (ConfirmationFlow) → Sergio T3.20 (confirmation UI)

• Sergio T3.24 (credits BE) → Mateo T3.5a (HttpCreditGate)

• Mateo T3.4 (ProactiveSuggestion) → Sergio T3.21 (suggestion cards)

• Mateo T4.1 (WS upgrade) → Sergio T4.10 (WS client)

UX/UI → Sergio

• T0.BB (W1) → T1.19 Tabs+Sidebar + T1.MK1

• T1.BB (W2) → T2.17 Chat UI + T2.MK1/MK2

• T2.BB (S4) → T3.19 BillingView + T3.MK1-3

• T3.BB (S6) → T4.10 WS client + T4.MK1/MK2

• T4.BB (S8) → T5.9 Bug fixes + T5.MK1
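The rule behind these handoffs (a blocked issue may start only when every blocker is Done) can be sketched as a small check. The `Issue` shape and function names are illustrative assumptions, not the Linear data model.

```typescript
type IssueStatus = "Backlog" | "Todo" | "In Progress" | "In Review" | "Ready to Merge" | "Done";

// Illustrative shape: each issue lists the ids of the issues that block it.
interface Issue {
  id: string;
  status: IssueStatus;
  blockedBy: string[];
}

// Returns the ids of blockers that are not Done yet.
// A blocker missing from the map is treated as still open.
function openBlockers(issueId: string, issues: Map<string, Issue>): string[] {
  const issue = issues.get(issueId);
  if (issue === undefined) throw new Error(`Unknown issue: ${issueId}`);
  return issue.blockedBy.filter((id) => issues.get(id)?.status !== "Done");
}

// An issue can start only when it has no open blockers.
function canStart(issueId: string, issues: Map<string, Issue>): boolean {
  return openBlockers(issueId, issues).length === 0;
}
```

For example, with T1.5 (REST) still In Progress, `canStart("T2.18", issues)` is false until T1.5 reaches Done.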

10. Full Product Roadmap

MVP

Weeks 1-10: Core Product

MercadoLibre + Amazon + Shopify. ~25 primitive tools + autonomous agent. Proactive intelligence. Native shell (Mac). Freemium (Free + Pro $49/mo) + Credit Packs. Beta with 10-15 Sellerfy users.

EXPANSION

Weeks 11-16: More Tools + Billing + Feedback

15+ additional tools (campaigns, inventory reports, image generation). Feedback loop (#15). Business plan ($149/mo, 5K credits). More marketplace regions. 50+ active users.

INTELLIGENCE

Weeks 17-22: AI-Powered Features

24/7 trend agents. Multi-agent parallel execution. Advanced proactive strategies. Windows build. Public API. 200+ users.

SCALE

Week 23+: Platform

Multi-language. Marketplace Hub (WooCommerce, eBay, Alibaba). Tool marketplace (third-party tools). Enterprise features. WhatsApp channel. 1,000+ users.

11. Monetization & Pricing

The Credit Model: Agent Interaction Costs

Every autonomous interaction has a cost based on tools used + tokens consumed + human-hours equivalent. The agent composes primitive tools into workflows of varying complexity. The goal: the user perceives enough value that, when credits run out, they want to buy more.

Example: Agent Interaction Costs

Agent Interaction · Credits
Quick question (get_metrics, simple lookup) · 1
Autonomous action (write operation with confirmation) · 3
Product audit (analyze, compare competitors, propose improvements) · 5
Competitor deep-dive (search, compare 10+ competitors, generate report) · 10
Full store optimization (audit all products, prioritize, propose changes) · 25

Value Perception

$49/mo = ~500 credits = hundreds of autonomous agent interactions, equivalent to $2,000-7,000 of manual analyst work.

  • 500 quick questions or
  • 100 product audits or
  • 20 full store optimizations

$49 input → $2,000-$7,000 value output = 40-140x ROI
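The figures above follow from two divisions, sketched here with illustrative helper names. The inputs are the Pro price and allowance and the per-interaction credit costs stated in this section.

```typescript
const PRO_PRICE_USD = 49;
const PRO_CREDITS = 500;

// How many interactions of a given credit cost fit into a monthly allowance.
function interactionsPerPlan(planCredits: number, creditsPerInteraction: number): number {
  return Math.floor(planCredits / creditsPerInteraction);
}

// Value delivered per dollar spent, expressed as a multiple.
function roiMultiple(valueUsd: number, priceUsd: number): number {
  return valueUsd / priceUsd;
}

const quickQuestions = interactionsPerPlan(PRO_CREDITS, 1); // 1 credit each
const storeOptimizations = interactionsPerPlan(PRO_CREDITS, 25); // 25 credits each
const roiLow = roiMultiple(2000, PRO_PRICE_USD); // about 40.8
const roiHigh = roiMultiple(7000, PRO_PRICE_USD); // about 142.9
```

The quoted 40-140x range is these two ratios rounded to convenient figures.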

Pricing Tiers v2.1

Freemium model: low-friction onboarding, value-driven upgrade, credit packs for expansion revenue.

$0/mo

Free

  • 50 credits/month
  • 3 marketplaces (MeLi, Amazon, Shopify)
  • 5 skills (read-only)
  • No proactive alerts
  • No credit packs

At 100%: everything blocked → upgrade CTA

MVP

$49/mo

Pro

  • 500 credits/month
  • 3 marketplaces (MeLi, Amazon, Shopify)
  • 8 tools (READ + WRITE)
  • Proactive suggestions (LLM-based)
  • Credit packs available
  • Chat support

At 80%: alert. At 100%: writes blocked, reads continue → buy pack CTA

PHASE 2

$149/mo

Business

  • 5,000 credits/month
  • 3 marketplaces + more regions
  • All tools + custom
  • Full proactive engine
  • Priority support + onboarding
  • API access

Credit Packs (Pro only, Stripe one-time payment)

Pack · Price · Per credit
100 cr · $5 · $0.050/cr
500 cr · $20 · $0.040/cr
1,000 cr · $35 · $0.035/cr

Pack credits are added to the current billing cycle and do not roll over to the next one.
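The usage thresholds described for the Free and Pro tiers can be sketched as a single gate function. This is a minimal illustration under stated assumptions: the type names, the `creditGate` signature, and the CTA identifiers are hypothetical and are not the HttpCreditGate implementation. Plan capabilities (Free being read-only) are a separate check and are not modeled here.

```typescript
type Plan = "free" | "pro";
type CreditOperation = "read" | "write";

interface GateDecision {
  allowed: boolean;
  alert: boolean;
  cta: "upgrade" | "buy_pack" | null;
}

// Applies the credit thresholds only:
//   Free: at 100% everything is blocked, with an upgrade CTA.
//   Pro:  alert from 80%; at 100% writes are blocked and reads continue, with a buy-pack CTA.
function creditGate(plan: Plan, used: number, allowance: number, op: CreditOperation): GateDecision {
  const usage = used / allowance;
  if (usage >= 1) {
    if (plan === "free") return { allowed: false, alert: true, cta: "upgrade" };
    return { allowed: op === "read", alert: true, cta: "buy_pack" };
  }
  return { allowed: true, alert: plan === "pro" && usage >= 0.8, cta: null };
}

// Unit price of a credit pack, e.g. $35 for 1,000 credits.
function pricePerCredit(priceUsd: number, credits: number): number {
  return priceUsd / credits;
}
```

Since pack credits are added to the current cycle, buying a pack would raise `allowance` for that cycle in this model, which reopens writes for a Pro user at 100%.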

Unit Economics

Gross margin · 85-92%
CAC payback · 2-3 mo
Cost per credit · $0.02-0.08
User ROI · 40-140x

MVP Pricing Strategy v2.1

Launch with Free ($0) + Pro ($49/mo). The Free tier lowers acquisition friction: users experience value before paying. Pro unlocks write skills + proactive alerts. Credit packs solve the "what happens when I run out" problem without forcing a plan upgrade. Revenue expansion via packs: an estimated 15-25% of Pro users buy 1+ pack/month.

12. Team & Execution

4 AI-augmented engineers operating at 4-6x each. Effectively 16-24 engineering equivalents. See full team page.

Pablo Estrada · CEO & Product Engineer

15-year ecommerce veteran. $15M+ in sales processed. Ships features end-to-end with Claude. Owns product vision, system prompt design, QA, eval suite, and go-to-market.

Mateo Quintero · CTO

Owns all architecture decisions. ReAct Orchestrator, Tool Registry, intelligence layer, Cerebro Knowledge Base (Go 1.24 + Vertex AI), caching and cost optimization. Uses AI for code review and rapid prototyping.

Andres Leon · Data + Backend

Built the Data Orchestrator from scratch. Owns marketplace adapters (MeLi + Amazon + Shopify), Context Aggregator, TokenManager, Data Sync, and production infrastructure.

Sergio Murillo · Full-Stack

Owns the Native Shell (Electron), React sidebar, all UI/UX, MeLi + Amazon + Shopify detection, WebSocket integrations, billing UI, and app distribution (packaging, signing, auto-updates).

Why this team can ship in 10+2 weeks

~65% of the backend already exists: not starting from zero. Production-tested code migrates directly.
Designs are documented: ReAct loop, tool catalog, risk taxonomy.
4 parallel tracks: each person owns an independent vertical.
200 users waiting: Sellerfy users validate demand and are day-1 beta testers.

13. Version History

CURRENT

v7 — Mar 10, 2026

Final blueprint with Linear workspace fully configured. 192 issues (AUT-22 to AUT-213) created with Fibonacci estimates, labels, dependencies. 6 time-bound Projects (Walking Skeleton through Buffer), 6 Cycles (2 weeks each), 5 Milestones/Gates, 1 Initiative. GitHub org-level integration with workflow automations (PR→In Progress, Review→In Review, Merge→Done). Slack integration. New Section 9.17 documenting the complete Linear methodology. Task count expanded from 147 to 192 with UX/UI pipeline, Mockups, Eval extensions, and code signing builds.

v6 — Mar 4, 2026

Complete rewrite as incremental technical blueprint. 8 deep sections (What is Shopilot, Big Players, What We Have, What We Reuse, Architecture with 7 layers, Project Map, Beautonomous 13-subsection guide, How We Build with 19 project cards). Projects reorganized by architecture layer (1-19), layer-group UX with collapse/filter. Bilingual EN/ES throughout. Dark-mode glass-card design.

v5 — Mar 2-3, 2026

MVP 10-week execution plan with 5 sprint phases, 4 engineer tracks (Mateo/Andres/Sergio/Pablo), week-by-week deliverables matrix, critical path analysis, 9 risk categories, infra cost breakdown, milestones & gates, ops playbook, and cross-project dependency matrix. Monetization model (Free/Pro/Credit Packs). Team & execution page.

v4 — Mar 2, 2026

13 projects rewritten with real implementation specs. 3 projects eliminated. 4 new projects (#14 DevOps, #16 Eval Suite, #7 Guardrails, #11 Enrichment Layer). Project Implementation Map added. 36 primitive tools documented. 4 QA audit rounds. Per-project changelogs. Sprint plan with integration milestones.

v3 — Feb 27-28, 2026

All 16 projects expanded to CORE level. CTO/PM audit: 7 critical fixes applied. Sidebar navigation + collapsible projects. Contradictions eliminated.

v2.1 — Feb 27, 2026

Business model update: Freemium ($0, 50cr) + Pro ($49/mo, 500cr) + Credit Packs. Billing project added. Plan-aware personality engine.

v2 — Feb 27, 2026

CTO technical review. Feedback Loop deferred. Redis from Day 1. GCP Secret Manager. WebContentsView. Simplified Skills Engine (direct tool_use). Added Amazon + Shopify to MVP scope.

v1 — Feb 26, 2026

Initial deep spec. 15 projects. MeLi-only MVP. Complete architecture and data models.

14. Design Guide: Brand Decisions + Build Plan

v1.0 · 2026-03

Shopilot has no brand identity yet

9 design decisions pending → Brand Book does not exist → Technical build blocked until decisions are made

0/9 decisions · Brand Book · Components

TRACK A · Pablo

9 Brand Decisions: must be made before anything else

0 / 9 decided

Each decision below has candidate options from the benchmark study. Once made, each answer goes directly into the Brand Book. The research that informs each decision is in the study sections below (§02 Benchmark, §21 Synthesis).

D1

Brand Emotion: what feeling does Shopilot own?

This is the single sentence that drives every visual decision. Every color, every radius, every animation derives from this answer.

Options: Trusted Control (I trust this agent with my money) • Competitive Edge (this makes me win) • Invisible Power (it just works, I don't see it)
PENDING
D2

Primary Brand Color: the one color that IS Shopilot

Appears on every button, active state, focus ring, and logo. Choose a color no dominant competitor owns in the Latin American e-commerce tools space.

Options: #F97316 Orange • #6366F1 Indigo • #0EA5E9 Sky • #10B981 Emerald • Custom
PENDING
D3

Background Mode: dark-first or light-first?

Study finding: 11/16 power tools are dark-first. Sellers use Shopilot during working hours alongside marketplaces (which are light). This affects eye fatigue and the "feel" of the sidebar next to the marketplace WebView.

Options: Dark-first (Cursor, Linear, Claude) • Light-first (Stripe, HubSpot, Shopify) • Both from day 1
PENDING
D4

Typography Pair: UI font + data font

2 fonts max. Rule: one sans for all text, one monospace for all numbers, prices, percentages, code. The mono font for numbers is functional, not stylistic: it keeps columns stable.

Options: Inter + JetBrains Mono • Geist + Geist Mono • IBM Plex Sans + Plex Mono • Custom (Phase 2+)
PENDING
D5

Logo: commission a designer, not AI-generated

Needs to work at 16px (macOS tray icon, favicon) AND at 512px (App Store). Outputs needed: icon.svg, wordmark.svg, logo-dark.svg, logo-light.svg, favicon.ico. Blocked until D1+D2 are decided.

Options: Wordmark only (Linear, Vercel) • Icon + Wordmark (Claude, HubSpot) • Abstract mark (Stripe, Arc)
BLOCKED by D1+D2
D6

Border Radius Style: sharp, standard, or rounded?

This single decision changes how the product FEELS more than any color. Sharp = technical precision. Rounded = approachable. The mini buttons in §02 Benchmark show the real difference.

Options: r:2–4px Sharp (Cursor) • r:6–8px Standard (Linear, HubSpot, Stripe) • r:12–20px Rounded (Arc)
PENDING
D7

Shadow Policy: none, minimal, or soft?

Shadows signal "weight". Linear and Vercel have zero shadows, which creates a flat, fast, technical feel. Stripe and HubSpot use soft shadows, which creates a layered, approachable feel. No middle ground works well.

Options: None (Linear, Vercel) • Minimal (Cursor, Claude) • Soft (Stripe, HubSpot)
PENDING
D8

Semantic Colors: success / warning / error / info

These are functional colors, used in the audit log, fraud alerts, status badges, and confirmation dialogs. They are NOT brand colors. The standard (green/amber/red/blue) works. The only question is the exact shade, which must contrast well on the decided background.

Success • Warning • Error • Info (exact shades TBD by D3 background)
BLOCKED by D3
D9

UI Voice: how does every label, tooltip, and message sound?

Every word in the UI is the brand speaking. "Save" vs "Save changes" vs "Apply". "Error" vs "Something went wrong" vs "Couldn't save, try again". This is not a technical decision; it's a brand voice decision.

Options: Direct & human (Notion, HubSpot) • Technical & precise (Linear, Cursor) • Empowering & confident (Stripe, Claude)
PENDING
BRAND BOOK

Where it lives & what goes in it

Does not exist yet

Created after the 9 decisions above are made

Will contain (6 outputs)

Brand position: 1 sentence (from D1)
Color palette: primary + backgrounds + semantic (from D2+D3+D8)
Typography scale: sizes, weights, line heights (from D4)
Logo files: icon.svg, wordmark.svg (dark+light) (from D5)
Component principles: radius + shadow policy (from D6+D7)
Voice guidelines: 10 do/don't examples (from D9)

Where to create it

Option A: Figma file → "Shopilot Brand" (recommended: visual, shareable)

Option B: docs/BRAND_BOOK.md in this repo (fast, version-controlled)

TRACK B · CTO

Technical Build: blocked until Track A is done

Cannot start without brand color + typography + background

Building tokens without brand decisions = wasted work

After Track A is done, CTO builds in this order

1. design-tokens.json → CSS vars → Tailwind config
2. Electron window shell: frameless, 70/30 split
3. Base atoms: Button, Badge, Input, Spinner
4. Coach screen: streaming, tool accordion, confirmation
5. Dashboard: KPI cards, data tables, status bar
→ Full technical checklist (15 items) in §22 below
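The first build step (design-tokens.json → CSS vars) can be sketched as a small flattening function. The token names and every value below are placeholders only, since the brand decisions are still pending; the `TokenTree` shape and function names are assumptions for illustration.

```typescript
// Nested token structure, as a design-tokens.json file might be parsed.
type TokenTree = { [key: string]: string | TokenTree };

// Placeholder values: D2, D3, D4 and D6 are undecided, so these are examples, not choices.
const tokens: TokenTree = {
  color: { brand: "#F97316", bg: "#faf9f5" },
  radius: { md: "8px" },
  font: { ui: "Inter", mono: "JetBrains Mono" },
};

// Flattens nested tokens into CSS custom property declarations,
// e.g. color.brand becomes "--color-brand: #F97316;".
function toCssVars(tree: TokenTree, prefix: string = "-"): string[] {
  const lines: string[] = [];
  for (const key of Object.keys(tree)) {
    const value = tree[key];
    const name = `${prefix}-${key}`;
    if (typeof value === "string") {
      lines.push(`${name}: ${value};`);
    } else if (value !== undefined) {
      lines.push(...toCssVars(value, name));
    }
  }
  return lines;
}

// Wraps the declarations in a :root block ready to write to a stylesheet.
function toStylesheet(tree: TokenTree): string {
  const body = toCssVars(tree).map((line) => `  ${line}`).join("\n");
  return `:root {\n${body}\n}`;
}
```

A Tailwind theme can then reference values such as `var(--color-brand)`, so changing a brand decision means editing one JSON value rather than every component.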
01

Philosophy: "Warm Precision"

The 6 most trusted software products of the AI era converge on a single unwritten visual language. The proposed name for it, a candidate and not yet a decision, is "Warm Precision": warm neutral backgrounds (not pure white), warm near-blacks (not #000), orange/coral accents, premium custom typography, and systematic spacing based on mathematical base units.

Warm Neutrals

Backgrounds avoid extremes. Not #fff, not #000. Cream whites (#faf9f5) and warm blacks (#141413) reduce eye fatigue and communicate human warmth, which is critical for AI products where "coldness" generates distrust.

Precision Spacing

All spacings derive from 1–2 mathematical base units (Cursor: --g=10px, --v=22px; Anthropic: clamp-based fluid scale). No arbitrary pixel values. This creates a visual rhythm the user feels but doesn't consciously notice.
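A minimal sketch of base-unit spacing: every spacing value is a multiple of one of two units, never a free-form pixel number. The 10px and 22px values are the Cursor figures quoted above; treating them as a general unit and a vertical unit, and the `space` helper itself, are assumptions for illustration.

```typescript
// Base units quoted for Cursor. Whether Shopilot adopts these values is undecided.
const G = 10; // --g, assumed here to be the general grid unit, in px
const V = 22; // --v, assumed here to be the vertical rhythm unit, in px

// Returns a CSS length that is a multiple of a base unit.
function space(multiple: number, base: number = G): string {
  return `${multiple * base}px`;
}

const cardPadding = space(2); // two grid units
const sectionGap = space(3, V); // three vertical units
```

Routing every margin and padding through one helper like this is what keeps arbitrary values out of the codebase.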

Functional Color

Color carries meaning. Only 3–4 semantic colors: success (green), warning (amber), error (red), info (blue). Brand accents are reserved for CTAs and critical emphasis, never decoration. Orange only when action is required.

Trust-First Typography

The best companies invest in custom or premium type. Legibility is non-negotiable. Using generic fonts (Arial, default Inter) communicates lack of care. Text is 70% of the UI in data-centric products, so it must earn trust at every size.

Why it matters for Shopilot

Shopilot is an AI agent that manages sellers' real money. The UI must communicate trust (not a side project), precision (data is clean and readable), and control (the user feels they can trust the agent's actions). A "Warm Precision" design achieves this better than any vibrant palette or extreme minimalism.

02

Benchmark: 8 Products, What Problem Each Solved With Design

Design decisions are never arbitrary. Every color, font, and layout pattern in a world-class product exists because someone was solving a specific problem. This section documents the real reasoning behind each brand: not just the output (hex codes), but the input (the problem and the why). For each brand: problem → key design decision → result.

AN

Anthropic.com + Claude.ai

Save changes
Confirm

#d97757 · r:8px · serif

THE PROBLEM

Anthropic needed to communicate "powerful AI" without triggering the fear response that "cold + blue + robotic" design creates. The major AI competitors (Google, Microsoft, OpenAI) were using blue, the tech-corporate default. The product needed trust, not awe.

KEY DESIGN DECISION

"Clay copper", inspired by unfired clay and the Anthropocene epoch. Explicitly rejected: blue-shifted darks, futuristic aesthetics, corporate sans. Chose organic warmth (#CC785C) over digital precision. The type pairing of Styrene (a geometric sans) with Tiempos (a serif) reinforces the "human intellectual" positioning.

RESULT

Claude.ai is perceived as "the thoughtful AI", distinct from GPT's sterile white and Gemini's corporate blue. The cream backgrounds (#faf9f5 light / #141413 dark) create a reading experience that feels more like a book than a software dashboard. Contrast between the two is roughly 17.5:1, well above the 7:1 AAA threshold.

Exact CSS values extracted

Dark primary: #141413 · warm undertone (R>B)

Light primary: #faf9f5 · faintly toasted cream

Brand copper: #CC785C · logo + selection highlight

UI orange: #d97757 · CTAs, interactive elements

Selection bg: rgba(204,120,92,.5)

Fluid type: clamp(3rem → 5rem) display · clamp(1.125 → 1.25rem) body

Fonts: Styrene A/B (display) · Tiempos Text (body) · JetBrains Mono (code)

Nav height: 4.25rem = 68px

Chat max-width: max-w-3xl (768px) · messages 75ch

Motion: menu 400ms · dropdown 200ms · cubic-bezier(0.4,0,0.2,1)

Cursor IDE

Button preview: "save_changes()" / "Run" · #f54e00 · r:2px · mono

THE PROBLEM

VS Code is functional but neutral: it has no strong personality. When Cursor launched as "the AI IDE", the UI had to communicate "this is the next generation of the editor" without alienating developers who are used to neutral chrome. Too flashy = distrust. Too similar to VS Code = no differentiation.

KEY DESIGN DECISION

Built a mathematical design system, not a visual one. Everything derives from 2 base units: --g ≈ 10px (grid) and --v ≈ 22px (vertical rhythm). The single bold accent (#f54e00 fire orange) appears on only 3 elements: the active tab, the streaming cursor, and hover states. Pure CSS vars, zero JS for theming. The UI disappears so the code (and AI) can emerge.

RESULT

Cursor feels simultaneously "familiar" (it inherits VS Code's neutral density) and "new" (the AI panel integrated into the right side, in warm dark #26251e, feels like a completely different layer). Because spacing is derived mathematically, layouts stay aligned even when user content is dynamic. Most important: developers don't feel like they're using a "designed" product.

Exact CSS values (main.css)

Base units: --g: calc(10rem/16) ~10px · --v: 1rem*1.4 ~22px

Opacity system: --fg-01 → --fg-100 (every 5% step as hex)

Duration: --duration: .14s · --duration-slow: .25s

Easing: --ease-out-spring: cubic-bezier(.25,1,.5,1)

Text scale: sm: 11px · base: 12px · lg: 13px

Shadows: ultra minimal — 0 0 1rem #00000005 (flyout only)

Breakpoints: 420 · 660 · 768 · 900 · 1140 · 1380px

OS detection: data-os=linux → system font fallback
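
A minimal sketch of how a two-unit system like the one listed above could be wired up in CSS. The --g and --v definitions mirror the listed values; the .panel and .panel__row class names and the specific multipliers are hypothetical and are not taken from Cursor's stylesheet.

```css
:root {
  --g: calc(10rem / 16);   /* grid unit: 10px at a 16px root */
  --v: calc(1rem * 1.4);   /* vertical rhythm unit: 22.4px at a 16px root */
}

/* Hypothetical component: every dimension is a multiple of --g or --v */
.panel {
  padding: var(--v) calc(var(--g) * 2);
}

.panel__row {
  display: flex;
  column-gap: var(--g);
  min-height: var(--v);
  margin-block-end: calc(var(--v) / 2);
}
```

Because both units are expressed in rem, changing the root font size rescales the whole layout without touching any component rule.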

Linear

Button preview: "Save changes" / "Save" · #5e6ad2 · r:6px · no shadow

THE PROBLEM

Jira was (and is) the default project management tool, and it feels like bureaucracy. Every action has friction. Every screen has visual noise. Loading spinners everywhere. Linear's founders decided that the product's design IS the product's value proposition: speed and clarity are not features, they are the brand.

KEY DESIGN DECISION

Optimistic UI on every action: no confirmation dialogs for reversible operations, no loading states for local actions. The entire design vocabulary is built on 3 rules: no gradients (gradients = effort = slowness), no decorative shadows (shadows = heavy = bureaucracy), and opacity over color (derive all grays from the brand color at different opacities, not from a separate gray scale). The indigo #5e6ad2 is deliberately calm: not exciting, not urgent, just authoritative.

RESULT

Linear is widely cited in design communities as a benchmark of "what great software feels like". The product grew primarily through word-of-mouth among developers because the experience is demonstrably different: engineers and PMs show it to colleagues as an example of quality. The lesson: design clarity is a growth strategy.

Arc Browser

Button preview: "Open tab" / "Go" · user color · r:20px pill · SF Pro

THE PROBLEM

Chrome's UI is among the most used interfaces in the world, and it has no identity. It is deliberately invisible. Arc wanted to make the browser a "personal space" that feels different for each user. The challenge: how do you build a browser with a strong personality without imposing one personality on everyone?

KEY DESIGN DECISION

User-owned accent color: the product's "brand color" is whatever the user picks, the opposite of the other products in this benchmark. The sidebar UI uses macOS system dark (#1c1c1e) as the only fixed base, and everything else derives from the user's color choice. The persistent element is the left sidebar, not the top bar, inverting 20 years of browser convention.

RESULT

Arc screenshots look different on every user's computer, which generates enormous organic social sharing. People screenshot their Arc setup the way they screenshot their iPhone homescreen. The product's design became its marketing. Relevance for Shopilot: the sidebar-as-primary-chrome pattern closely matches the 70/30 Shopilot split.

Stripe

Button preview: "Pay $49.00" / "Pay now" · #635bff · r:6px · shadow

THE PROBLEM

Stripe handles billions of dollars. Their design problem: financial products are traditionally austere, green (trust/money), and boring, because the assumption is that "serious = colorless". Stripe needed to feel trustworthy AND modern AND developer-friendly at the same time, for three completely different audiences: CTOs, developers, and CFOs.

KEY DESIGN DECISION

Split personality by surface. The marketing site uses deep navy (#0a2540) + gradient aurora effects + large editorial typography to signal "world-class and premium". The product dashboard is pure white + minimal, so CFOs feel "this is clean and trustworthy". The documentation uses monospace + code samples everywhere, so developers feel "this was built for us". Three audiences, three sub-designs, one brand color: purple #635bff.

RESULT

Stripe's landing page is widely considered the benchmark of "premium product marketing": it redefined what a fintech company's site should look like. Critical insight for Shopilot: handling money requires design that communicates both precision (clean numbers, clear status) AND trust (not too flashy, nothing decorative near financial data). White space around numbers is a feature.

HubSpot Canvas

Button preview: "Create contact" / "Create" · #FF7A59 · r:6px · sans

THE PROBLEM

HubSpot serves non-technical users: sales reps and marketers who are not designers and do not care about design. The incumbent (Salesforce) was notorious for dense, overwhelming UIs. The risk: building an "impressive" design system that designers love but sales reps find confusing. The audience is the person who hates software complexity.

KEY DESIGN DECISION

"Sprocket-right" principle: every component is evaluated by whether it helps the user accomplish the task, not by whether it looks good. Custom typography (HubSpot Sans + HubSpot Serif via Typekit), but only because generic fonts signal "we don't care". Orange (#FF7A59) is warm and approachable, the opposite of Salesforce's cold blue. 6px border radii everywhere: soft enough to not feel corporate, tight enough to not feel "cute".

RESULT / RELEVANCE FOR SHOPILOT

HubSpot's user is the closest analog to Shopilot's seller. Both are non-technical and results-oriented, and both use the software during their working day (not in a "power user" session). The key lesson: density is the enemy. Every piece of data a seller sees should be immediately interpretable. No scanning. No decoding. The number should tell the story.

Vercel Geist

Button preview: "Deploy project" / "Deploy" · #ffffff btn · r:6px · Geist

THE PROBLEM

Vercel competes against AWS, Google Cloud, and Heroku. The problem: every cloud platform feels the same (blue, corporate, dense). Vercel's audience (frontend developers) is design-literate and will immediately judge a product's quality by its visual craftsmanship. The product needed to feel like it was built by people who care about craft, because that's who their users are.

KEY DESIGN DECISION

Maximum constraint = maximum differentiation. Pure #000000 dark + #fafafa light: near-maximum contrast, no warm undertone, no "personality" color at all. The result is that Vercel looks like a luxury product, not a tech product, much like an Apple product page. They built a custom open-source font (Geist) to reinforce the "we invest in craftsmanship" signal. Zero decorative elements. Zero shadows. Negative space as the only visual tool.

RESULT

Vercel's design became a cultural signal in the frontend community. "Vercel-style" is now shorthand for "brutalist minimalism done premium". The open-source Geist font is downloaded thousands of times per week by developers who want to use it in their own products. Important caveat: this approach requires absolute discipline. One wrong design decision and "minimal" becomes "empty".

Shopify Polaris

Button preview: "Add product" / "Save" · #008060 · r:4px · neutral

THE PROBLEM

Shopify's admin is used by 2M+ merchants who range from solo entrepreneurs running their first store to enterprise brands managing thousands of SKUs. Their common trait: they are not designers, they are business owners who need to take a specific action (change a price, fulfill an order) in 10 seconds or less. Every second of confusion is a second of lost revenue.

KEY DESIGN DECISION

Every data visualization rule is merchant-first: show the total first (large, bold, row 1), not the chart. One insight per visualization, never two. Always provide multiple formats (number + percentage + delta). The color system has one rule: green #008060 = Shopify = money = growth, and it only appears when something positive happened. Every design decision is tested against: "does a first-time merchant understand this in under 5 seconds?"

RESULT / DIRECT LESSON FOR SHOPILOT

Polaris is among the most-studied design systems for e-commerce tools because it solved the "seller comprehension" problem at scale. Direct lesson for Shopilot: every KPI card, every data table, every chart must pass the "5-second merchant test". If a seller needs to think for more than 5 seconds to understand a data point, the design failed, regardless of how good it looks.

What Actually Differentiates Each Brand

Not the hex codes: the why behind the hex codes.

Brand | Unique differentiator | Emotion targeted | What Shopilot can learn
Anthropic | Clay copper that rejected "AI = cold + blue" | Intellectual trust | AI products can be warm. Warmth = trust for agents that handle real stakes.
Cursor | Mathematical system (2 base units) that makes the UI disappear | Invisible power | --g and --v base units. Sidebar chrome should never compete with the marketplace.
Linear | Design as the product's value proposition, not its wrapper | Speed as feeling | No confirmation dialogs for reversible coach actions. Optimistic UI where safe.
Arc | Persistent left sidebar as primary chrome (inverted the tab bar convention) | Personal ownership | The 70/30 split with a fixed right sidebar is the same pattern: the marketplace is the "browser", the sidebar is the "Arc panel".
Stripe | White space as a trust signal around financial data | Premium reliability | Nothing decorative near prices, inventory counts, or revenue figures. Data must breathe.
HubSpot | "Sprocket-right": function evaluated from the user's task, not the designer's taste | Approachable competence | Sellers are not power users. Every screen passes the "new seller in 10 seconds" test.
Shopify | The "5-second merchant test" applied to every data visualization | Clarity = profit | Total first. Number big. Delta visible. One chart = one insight. The dashboard is a decision tool, not a data dump.
03

Color: Palette & Semantic Tokens

Shopilot Dark Palette (proposed)

Backgrounds

--bg: #0f0e0d
--bg-2: #1a1917
--bg-3: #242220

Text

--text: #f5f3ef
--text-2: #c8c5be
--text-3: #8b8880
--text-4: #5a5855

Brand Accent

--orange: #f97316
--orange-2: #ea6c0a

Semantic

--green: Success
--amber: Warning
--red: Error
--blue: Info/Analysis
--purple: Ctx/Technical

Color usage rules

🟠 Orange: CTA only · AI actions · new notifications. Never decoration.

🟢 Green: Confirmations · price rising · OK · buy box winning.

🔴 Red: Errors · blocks · fraud alerts · price vs floor.

🟡 Amber: Warnings · pending · TTL expiring · intermediate states.

🔵 Blue: ANALYSIS tools (read-only) · contextual info.

🟣 Purple: Technical context · tokens · ctx window · system info.

Key Pattern: Opacity-based color architecture (from Cursor)

Instead of creating new colors for every state, derive all shades from the foreground color at varying opacity: rgba(var(--text-raw), 0.05) → 0.10 → 0.15 → 0.20 → ... This keeps all UI states harmonious and theme-compatible, because changing the foreground color updates every derived shade.
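
A minimal sketch of the pattern, assuming --text-raw stores the foreground color as a bare R, G, B triple (here the channels of --text, #f5f3ef). The --fg-* step names and the .row selectors are hypothetical.

```css
:root {
  /* Assumption: raw channels of --text (#f5f3ef), with no rgb() wrapper */
  --text-raw: 245, 243, 239;

  --fg-05: rgba(var(--text-raw), 0.05);
  --fg-10: rgba(var(--text-raw), 0.10);
  --fg-15: rgba(var(--text-raw), 0.15);
  --fg-20: rgba(var(--text-raw), 0.20);
}

/* Hypothetical usage: each state is one opacity step, not a new color */
.row:hover       { background: var(--fg-05); }
.row:active      { background: var(--fg-10); }
.row.is-selected { background: var(--fg-15); border-color: var(--fg-20); }
```

With this layout, a light theme would only need to redefine --text-raw; every derived shade follows.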

04

Typography: Scale & Pairing

Display / Headings

Styrene A

Geometric · slightly humanist · "squarish f,j,r,t": technical precision with personality.

Anthropic use: marketing headlines

Body / UI

Styrene B / Inter
The words that matter most.

Narrower · "gentle with words" · readable at 11–14px · handles dense data tables.

Mono / Data

$84.99 → $79.99
update_price B09XYZ
BSR: 654 · BB: 63%

JetBrains Mono: all prices, percentages, tool names, IDs. Tabular figures = scannable data.

Anthropic complete type system (confirmed data)

9 font families loaded: AnthropicSans · AnthropicSerif · AnthropicMono (proprietary, emerging) + Copernicus Book/Medium (Galaxie, Chester Jenkins+Kris Sowersby 2009) + StyreneA Regular/Medium + StyreneB Regular/Medium (Berton Hasebe, Commercial Type) + TiemposText Regular/Medium (Klim Type Foundry) + JetBrainsMono (variable TTF)

Marketing roles: Styrene A (headlines) · Styrene B (subheads, nav) · Tiempos Text (body long-form)

Claude.ai product roles: Galaxie Copernicus Book (UI headings) · Styrene B (input, UI labels, 400/500/700) · Tiempos Text (AI response prose — "the AI speaks in editorial serif, not system sans")

Pairing logic: Copernicus+Tiempos share a Kris Sowersby lineage → optical harmony. Styrene provides geometric contrast. Serif = human knowledge/warmth. Sans = the system speaking.

Chat input height: ~300px (deliberate — invites long-form composition, treats user as writer not command-line typist)

Fluid type (confirmed clamp values):

--font-size--display-xxl: clamp(3rem, 2.388rem + 2.612vw, 5rem)   /* 48px → 80px */
--font-size--display-xs:  clamp(1.125rem, 1.087rem + 0.163vw, 1.25rem) /* 18px → 20px */
--font-size--monospace:   clamp(0.875rem, 0.531rem + 1.469vw, 2rem)    /* 14px → 32px */
--site--margin:           clamp(2rem, 1.082rem + 3.918vw, 5rem)         /* 32px → 80px */
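
The middle term of each clamp() above is a straight line through two points: the minimum size at a minimum viewport width and the maximum size at a maximum viewport width. The derivation below is a sketch for the display-xxl token. The 375px and 1600px viewport bounds are not stated in the source; they are inferred because, at a 16px root, they reproduce the coefficients of all four tokens.

```latex
\begin{aligned}
\text{slope} &= \frac{s_{\max} - s_{\min}}{w_{\max} - w_{\min}}, \qquad
\text{intercept} = s_{\min} - \text{slope}\cdot w_{\min} \\[4pt]
w_{\min} &= 375\,\text{px} = 23.4375\,\text{rem}, \qquad
w_{\max} = 1600\,\text{px} = 100\,\text{rem} \\[4pt]
\text{slope} &= \frac{5 - 3}{100 - 23.4375} \approx 0.02612
\;\Rightarrow\; 2.612\,\text{vw} \\[4pt]
\text{intercept} &= 3 - 0.02612 \times 23.4375 \approx 2.388\,\text{rem}
\end{aligned}
```

The same two-point calculation can be reused to derive fluid tokens for a different viewport range.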

Shopilot Type Scale

24px / 700 · Page Title H1
20px / 700 · Section Title H2
16px / 600 · Card Title H3
14px / 400 · Body default: product descriptions, analysis text, standard prose
12px / 400 · Labels, metadata, secondary information, form helpers
11px / 400 · Badges, captions, timestamps, status indicators
10px / 700 · MICRO LABEL · UPPERCASE
mono / 13px · $84.99 · BSR:654 · update_price · B09XYZ123 · 63%
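
One way the scale above could be expressed as utility classes. This is a sketch: the .t-* class names are hypothetical, and line heights are left out because the source does not specify them.

```css
.t-h1      { font-size: 24px; font-weight: 700; }  /* Page Title H1 */
.t-h2      { font-size: 20px; font-weight: 700; }  /* Section Title H2 */
.t-h3      { font-size: 16px; font-weight: 600; }  /* Card Title H3 */
.t-body    { font-size: 14px; font-weight: 400; }  /* Body default */
.t-label   { font-size: 12px; font-weight: 400; }  /* Labels, metadata */
.t-caption { font-size: 11px; font-weight: 400; }  /* Badges, timestamps */
.t-micro   { font-size: 10px; font-weight: 700; text-transform: uppercase; }

/* Prices, percentages, tool names, IDs */
.t-mono {
  font-family: 'JetBrains Mono', ui-monospace, monospace;
  font-size: 13px;
  font-variant-numeric: tabular-nums;  /* equal-width digits for scannable columns */
}
```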
05

Spacing & Grid: Mathematical Base Units

Inspired by Cursor's two-unit base system (--g for grid, --v for vertical rhythm). Every measurement is a multiple of 4px, with no arbitrary values.

Spacing Scale

sp-1 = 4px · icon gap
sp-2 = 8px · badge pad
sp-3 = 12px · btn small
sp-4 = 16px · card pad
sp-6 = 24px · panel pad
sp-8 = 32px · section gap

Border Radii

r-xs 2px
r-sm 4px
r 6px
r-lg 8px
r-xl 12px
r-2xl 16px

Spacing Rules

  • Use 4, 8, 12, 16, 24, 32; never 7 or 13
  • More space = more conceptual separation
  • Dense tables: 5–6px row padding. Confirmation panels: 12–16px.
  • Never padding 2px with border-radius 8px
  • No arbitrary pixel values
06

UI Components: Live Preview

Buttons

Badges / Tags

✓ OK ⚠ Warning ⛔ Error ANALYSIS ctx: 34% REVERSIBLE EXPIRED

Cards

Standard Card

$84.99

BT Pro Headphones

Success State

Buy Box: 82%

↑ +19pts this week

Alert State

FraudDetector ⚠

Score: 0.73 · Action required

Data Table (Polaris-inspired, Seller-first)

# | Seller | Price | BSR | Buy Box
1 | TechStore MX | $75.00 | 312 | 82%
3 | 👤 Your store | $84.99 | 654 | 63%

Rules: headers 8px uppercase mono · data 9px mono · "me" row highlighted with brand opacity · winner in green · loser in red.
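
The table rules could be sketched in CSS as follows. The class names are hypothetical; the colors are the palette's orange, green, and red values, and the glyphs are an added assumption so that winner and loser are not distinguished by color alone.

```css
.seller-table {
  font-family: 'JetBrains Mono', ui-monospace, monospace;
  font-variant-numeric: tabular-nums;
  border-collapse: collapse;
}

.seller-table th {
  font-size: 8px;
  text-transform: uppercase;
  text-align: left;
}

.seller-table td { font-size: 9px; }

/* The seller's own row: brand orange at low opacity */
.seller-table tr.is-me { background: rgba(249, 115, 22, 0.15); }

/* Winner / loser: color plus a glyph */
.seller-table td.is-winner         { color: #22c55e; }
.seller-table td.is-winner::before { content: "▲ "; }
.seller-table td.is-loser          { color: #ef4444; }
.seller-table td.is-loser::before  { content: "▼ "; }
```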

07

Motion & Animation

Golden Rule (from Anthropic Frontend Cookbook)

One orchestrated page-load animation beats 10 scattered micro-interactions: staggered reveals driven by animation-delay create more delight than isolated effects.

Duration Tokens

--dur-instant: .08s

--dur-fast: .14s (Cursor)

--dur-normal: .25s (Cursor slow)

--dur-slow: .40s (Anthropic fade)

Menu open: 400ms (Anthropic nav)

Dropdown: 200ms (Anthropic)

Easing Curves

--ease-spring:

cubic-bezier(.25,1,.5,1)

--ease-out:

cubic-bezier(.4,0,.2,1)

↑ Material standard · used by Anthropic cookbook

Key Animations

fadeInUp: Y+4px → Y0 · opacity 0→1 · .2s ease-out

thinking-pulse: opacity .4→1→.4 · 1.2s infinite

stagger: animation-delay: 50ms per item

Always: prefers-reduced-motion support
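
A minimal CSS sketch of the animations listed above. The .reveal and .thinking selectors and the --i index variable are hypothetical; the durations, offsets, and opacity values come from the list.

```css
@keyframes fadeInUp {
  from { opacity: 0; transform: translateY(4px); }
  to   { opacity: 1; transform: translateY(0); }
}

@keyframes thinking-pulse {
  0%, 100% { opacity: 0.4; }
  50%      { opacity: 1; }
}

.reveal {
  animation: fadeInUp 0.2s ease-out both;
  /* --i is an assumed per-item index set inline, e.g. style="--i: 3" */
  animation-delay: calc(var(--i, 0) * 50ms);
}

.thinking { animation: thinking-pulse 1.2s infinite; }

/* Users who request reduced motion get the final state immediately */
@media (prefers-reduced-motion: reduce) {
  .reveal, .thinking { animation: none; }
}
```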

08

Accessibility: WCAG 2.2 AA Minimum

Contrast Ratios (Shopilot palette)

--text (#f5f3ef) / --bg (#0f0e0d) ~17.4:1 AAA
--text-2 (#c8c5be) / --bg ~11.2:1 AAA
--text-3 (#8b8880) / --bg ~5.4:1 AA
Anthropic #141413/#faf9f5 ~17.5:1 AAA
Anthropic #d97757/#faf9f5 ~3.0:1 · large text only, marginal

Requirements

:focus-visible with 2px orange outline on all interactive elements

Minimum touch target: 44×44px

prefers-reduced-motion → all animations disabled

Semantic HTML: <button> for actions, <a> for navigation

Never convey information via color alone

skip-to-content link (HubSpot pattern)
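
Several of these requirements map directly onto CSS. The sketch below is one possible baseline: the selector lists, the .skip-link class name, and the outline offset are assumptions; the 2px orange outline (#f97316), the 44×44px target, and the reduced-motion rule come from the list above.

```css
/* 2px orange ring on keyboard focus only */
:where(a, button, input, select, textarea, [tabindex]):focus-visible {
  outline: 2px solid #f97316;
  outline-offset: 2px;  /* assumption: offset is not specified in the source */
}

/* Minimum touch target */
button, [role="button"] {
  min-width: 44px;
  min-height: 44px;
}

/* Skip link: off-screen until it receives focus */
.skip-link {
  position: absolute;
  top: 0;
  left: 0;
  transform: translateY(-100%);
}
.skip-link:focus { transform: translateY(0); }

/* Disable all animations and transitions when the user asks for it */
@media (prefers-reduced-motion: reduce) {
  *, *::before, *::after {
    animation: none !important;
    transition: none !important;
  }
}
```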

09

Design Tokens: Shopilot CSS Variables

Complete, copy-paste-ready CSS variable system for all Shopilot projects. Dark-mode first.

:root {
  /* ── Typography ──────────────────────────────── */
  --font-display: 'Styrene A', 'Fraunces', Georgia, serif;
  --font-body:    'Inter', 'Space Grotesk', system-ui, sans-serif;
  --font-mono:    'JetBrains Mono', 'Fira Code', ui-monospace, monospace;

  /* ── Backgrounds (dark-mode first) ──────────── */
  --bg:   #0f0e0d;   /* warm near-black */
  --bg-2: #1a1917;   /* cards */
  --bg-3: #242220;   /* hover states */
  --bg-4: #2e2c29;   /* active states */

  /* ── Text ────────────────────────────────────── */
  --text:   #f5f3ef;   /* warm near-white */
  --text-2: #c8c5be;   /* secondary */
  --text-3: #8b8880;   /* muted */
  --text-4: #5a5855;   /* placeholder / disabled */

  /* ── Brand accent ────────────────────────────── */
  --orange:   #f97316;   /* primary CTA */
  --orange-2: #ea6c0a;   /* hover */
  --orange-3: rgba(249,115,22,.15);  /* tinted bg */

  /* ── Semantic ────────────────────────────────── */
  --green:  #22c55e;
  --amber:  #f59e0b;
  --red:    #ef4444;
  --blue:   #3b82f6;
  --purple: #a855f7;

  /* ── Borders ─────────────────────────────────── */
  --border:   rgba(255,255,255,.07);
  --border-2: rgba(255,255,255,.12);
  --border-3: rgba(255,255,255,.20);

  /* ── Opacity system (Cursor-inspired) ───────── */
  --fg-05:  rgba(245,243,239,.05);
  --fg-08:  rgba(245,243,239,.08);
  --fg-12:  rgba(245,243,239,.12);
  --fg-20:  rgba(245,243,239,.20);
  --fg-40:  rgba(245,243,239,.40);

  /* ── Radii ───────────────────────────────────── */
  --r-xs:  2px;  --r-sm: 4px;  --r: 6px;
  --r-lg:  8px;  --r-xl: 12px; --r-2xl: 16px;

  /* ── Spacing ─────────────────────────────────── */
  --sp-1: 4px;  --sp-2: 8px;  --sp-3: 12px;
  --sp-4: 16px; --sp-6: 24px; --sp-8: 32px;

  /* ── Motion ──────────────────────────────────── */
  --dur:      .14s;   /* Cursor base */
  --dur-slow: .25s;   /* Cursor slow */
  --ease:     cubic-bezier(.25,1,.5,1);   /* spring */
  --ease-std: cubic-bezier(.4,0,.2,1);    /* Material */

  /* ── Text scale ──────────────────────────────── */
  --text-3xs: 9px;  --text-2xs: 10px; --text-xs:   11px;
  --text-sm:  12px; --text-base: 13px; --text-md:  14px;
  --text-lg:  16px; --text-xl:  20px;  --text-2xl: 24px;
}
10

Shopilot-Specific Patterns

ReAct Pattern (Thought → Action → Observation)

Thought: purple left border · italic header
Action: blue left border · tool badge · toggle
Observation: green left border · summary text

Confirmation Card Hierarchy

1. Header: action type + REVERSIBLE/IRREVERSIBLE badge

2. Diff: from/to with arrows + semantic colors

3. Impact: bulleted consequences

4. Actions: orange confirm + neutral cancel

5. Footer: rollback_token if applicable

Marketplace Data Panel

1. Header: marketplace icon + name + status badge

2. Main metric: large mono number + semantic delta

3. Secondary: 2–3 col grid, smaller text

4. Action: link or button at footer

Status Indicator Language

Green: active · winning buy box · price OK · no incidents
Amber: warning · pending · TTL expiring · at risk
Red: error · fraud · hard block · losing buy box
Blue: ANALYSIS (read-only) · info · processing

Do ✓

  • Use mono for ALL numeric values (prices, %, BSR, tokens)
  • Keep high density in tables: sellers are power users
  • Always show the "reason" behind every agent action
  • Use opacity-based colors for panel backgrounds
  • More critical = more space + more contrast

Don't ✗

  • Brand color gradients as UI backgrounds
  • Decorative shadows (shadows are for flyouts/modals only)
  • More than 3 semantic colors in one panel
  • Font size < 9px for any interactive text
  • Animations > .4s in workflow flows
  • Orange accent on more than 1 element per screen
11

Design Stack & Toolchain

Shopilot's design stack is intentionally lean. The key insight: this HTML spec will be the foundation of the design system once brand decisions are made. No Figma required until Phase 2.

4-Level Stack

L1

Spec Layer: This HTML + Markdown

The spec is the design. Cursor AI reads this HTML and generates components that match exactly. Validated through code review. Zero ambiguity vs. Figma handoff.

L2

Token Layer: design-tokens.json → Style Dictionary → CSS :root

Single source of truth for all values. Style Dictionary transforms JSON → CSS custom properties + tailwind.config.js. One change propagates everywhere.
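To make the JSON → CSS custom properties step concrete, here is a minimal hand-rolled sketch. It does not use Style Dictionary's real API; the `TokenTree` type, the `toCssVars` and `renderRoot` helpers, and the sample token object are all assumptions made for illustration, with values copied from the token block in this spec.

```typescript
// Hypothetical token shape: nested groups that end in string values.
type TokenTree = { [key: string]: string | TokenTree };

// Walk the tree and emit one ["--path-to-token", value] pair per leaf.
function toCssVars(tree: TokenTree, prefix: string[] = []): Array<[string, string]> {
  const out: Array<[string, string]> = [];
  for (const key of Object.keys(tree)) {
    const value = tree[key];
    const path = prefix.concat(key);
    if (typeof value === "string") {
      out.push(["--" + path.join("-"), value]);
    } else {
      for (const pair of toCssVars(value, path)) out.push(pair);
    }
  }
  return out;
}

// Render the pairs as a CSS :root block.
function renderRoot(tree: TokenTree): string {
  const lines = toCssVars(tree).map(([name, value]) => `  ${name}: ${value};`);
  return [":root {"].concat(lines, ["}"]).join("\n");
}

// Sample tokens (structure assumed, values taken from the spec's :root block).
const sampleTokens: TokenTree = {
  bg: { "2": "#1a1917" },
  orange: { "2": "#ea6c0a" },
  dur: { slow: ".25s" },
};

console.log(renderRoot(sampleTokens));
```

In a real pipeline Style Dictionary would own this transform; the sketch only shows why a single change in the JSON reaches every consumer of the generated variables.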

L3

Component Layer: React + Tailwind + Figma MCP

All components are defined in Figma (#18 Design System) following Atomic Design. Claude reads the Figma file via Figma MCP and implements matching React components. No components are created outside of what is defined in Figma.

L4

QA Layer: Figma ↔ Code visual consistency + PR gates

Every PR is reviewed against the Figma source of truth. The visual diff between the Figma spec and the implemented component is verified during code review. Merge is blocked if a component deviates from its Figma definition.

Tool × Purpose × When to Use × Alternative

Tool | Purpose | When | Alternative
Style Dictionary | Token transform | Phase 1+ | Theo, vanilla-extract
Tailwind CSS | Utility styling | Always | UnoCSS, vanilla CSS
Figma MCP | Component source of truth | Phase 1+ | -
Radix UI | Accessible primitives | Modals, dropdowns | Headless UI, Ark
Figma | Component source of truth (Atomic Design) | Phase 1+ | Penpot (OSS)
This HTML spec | Source of truth v1 | Now → Phase 1 | -

Team Roles in Design System

UX/UI | Figma design execution (T0.BB–T4.BB each sprint)
Pablo | Approves UX/UI deliveries, design decisions, token naming
Sergio | Consumes Figma components, creates React integration mockups
Mateo | Token pipeline (Style Dictionary), Tailwind config
Andrés | Data viz components, table architecture

💡 Key insight: Figma (#18 Design System) is the single source of truth for all visual components, following Atomic Design (atoms, molecules, organisms, templates, pages). Claude reads Figma via Figma MCP and implements matching React components in #1 Native Shell. No components are created outside of what is defined in Figma.

12

How Cursor Built Its UI: Deep Dive

Cursor's UI is the closest analogue to Shopilot: an Electron desktop app with a split pane (native web view left + React panel right). Every pattern they solved, we inherit. Below are the concrete technical decisions and their direct Shopilot equivalents.

Opacity Color System

Instead of hardcoded hex colors, Cursor uses a single base color and derives the full scale via opacity:

--fg-01: hsl(0 0% 100% / .01);
--fg-05: hsl(0 0% 100% / .05);
--fg-10: hsl(0 0% 100% / .10);
--fg-40: hsl(0 0% 100% / .40);
--fg-100: hsl(0 0% 100% / 1);

Result: dark/light theme compatibility from one token set. A theme switch only swaps the base color; the scale itself needs no manual overrides.
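The scale above can be generated rather than typed by hand. The following sketch builds the `--fg-XX` declarations from one base color and a list of alpha steps; the `opacityScale` helper and the light-theme base value are assumptions for illustration, not something the spec defines.

```typescript
// Build "--fg-XX" declarations from one base color and a list of alpha steps.
// `base` is the "H S% L%" triple used inside hsl(); steps are whole percentages.
function opacityScale(base: string, steps: number[]): string[] {
  return steps.map((pct) => {
    const name = "--fg-" + (pct < 10 ? "0" + pct : String(pct));
    // 100% is written as "1"; everything else drops the leading zero (.05, .40).
    const alpha = pct === 100 ? "1" : (pct / 100).toFixed(2).replace(/^0/, "");
    return `${name}: hsl(${base} / ${alpha});`;
  });
}

// Dark theme: white foreground base, matching the values listed above.
const darkScale = opacityScale("0 0% 100%", [1, 5, 10, 40, 100]);

// Light theme: same steps over a black base (assumed value, not from the spec).
const lightScale = opacityScale("0 0% 0%", [1, 5, 10, 40, 100]);

console.log(darkScale.join("\n"));
```

Because only the base triple differs between themes, components that reference `--fg-10` or `--fg-40` need no theme-specific rules.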

2 Base Units: Everything Derives From Here

--g: 10px; /* grid base */
--v: 22px; /* vertical rhythm */
padding: calc(var(--g) * 1.5);
gap: var(--g);
line-height: var(--v);

No arbitrary pixel values anywhere. Every spacing value is a multiple or fraction of --g or --v. The visual rhythm emerges automatically.

WebContentsView Split Pane Architecture

Cursor renders the code editor via WebContentsView (left 70%), a native Chromium view embedded in Electron. The AI panel (right 30%) is standard React. Communication via ipcMain/ipcRenderer.

→ Shopilot identical: marketplace WebContentsView 70% + React sidebar 30%

Pure CSS Theming: Zero JS

[data-theme="dark"] {
  --bg: #141413;
  --accent: #f97316;
}
[data-theme="light"] {
  --bg: #faf9f5;
  --accent: #ea6c00;
}

Theme switch is a single call such as document.body.setAttribute('data-theme', 'light'). No React state, no re-renders, no flash. Instant.

Status Bar Anatomy: 24px Fixed Height

● main · TypeScript · UTF-8    4K tokens · v0.44.11
Height: 24px
Font: JetBrains Mono 11px
BG: --fg-03 / surface-2
Left: status + context
Right: tokens / credits / version
Separator: · (middle dot)

Cursor Pattern → Shopilot Equivalent (12 patterns)

Cursor Pattern | Shopilot Equivalent
Shadow workspaces (background indexing) | Marketplace context loaded in background on app start
Streaming ghost text (word-by-word) | Sidebar word-by-word with fadeIn .08s per token
Tab bar (files) | Tab bar (MeLi · Amazon · Shopify) with marketplace icons
Status bar 24px | Status bar 24px (agent state · credits · model version)
Apply/Reject diff blocks | Confirmation card with Confirm/Cancel + diff view
Agent loop step indicators | ReAct stream (Thought → Action → Observation)
Composer context pills | Context bar (SELLER_PROFILE · MARKETPLACE · ASIN loaded)
--fg-XX opacity token scale | --sp-fg-XX, same pattern with sp- (shopilot) prefix
--g + --v base unit system | --sp-g: 10px + --sp-v: 22px (identical values)
data-theme CSS architecture | data-theme on body, dark-only in MVP
Cmd+K command palette | Cmd+K focus chat input (identical binding)
Frameless titlebar + drag region | titleBarStyle:'hidden' + -webkit-app-region: drag
13

How Claude Code Built Its UI: Terminal → Desktop

Claude Code started as a pure terminal CLI (React + Ink renderer). Its UX decisions, born from constraints, are some of the best in the AI-native category. Shopilot adapts the same mental model for a visual desktop context.

React + Ink Renderer with Cell-Diff

Ink renders React components as terminal cells. On update, only changed cells are repainted, like a virtual DOM for the terminal. Zero flicker even with rapid token streaming. Shopilot equivalent: React with React.memo + virtualized list for chat history.

No Alternate Screen: Always Scrollable

Claude Code deliberately avoids alternate screen mode. All output stays in the normal scroll buffer so users can scroll back to review previous turns. This is a key trust decision.

→ Shopilot: chat history always scrollable, no pagination, no "clear screen"

Streaming Cursor ▊: Not a Spinner

Claude Code uses a blinking block cursor to indicate active generation. No spinner, no skeleton, no loading bar. The cursor IS the loading state. Less UI noise = more focus on content.

The price analysis shows ▊

Tool Call Accordion Pattern

Collapsed: tool name + duration badge + status icon
Expanded: full JSON input + formatted output
Duration shown after completion, not during execution

→ Shopilot: identical accordion for all 36 tools

Agent Loop Visual: 3 Phases

Gather Context → Take Action → Verify Results → Loop or Done

Each phase is visually distinct in the chat. Context gathering = blue, Actions = orange, Verification = green. The user always knows what phase the agent is in without needing to read the text.

Context Compaction + Memory System (deep spec)

Context Compaction Banner

History compacted · 4K tokens preserved (auto-dismiss 4s)

Appears discreetly when the context window hits 80%. Blue background, auto-dismiss after 4s. Never interrupts workflow. Shopilot equivalent: same banner + credit usage update in status bar.

CLAUDE.md → Shopilot Equivalent

CLAUDE.md: persistent project instructions
Shopilot: SELLER_PROFILE injected every session
Memory hooks: auto-save important decisions
Shopilot: MARKETPLACE_CONTEXT per active tab
14

Complete Component Catalog: Atomic Design

Every component Shopilot needs, organized by Atomic Design level. Status: Build = create from scratch · Buy = use library · Done = already in spec.

ATOMS: Indivisible UI Particles

Component | Description | Status | Week | Owner
ColorChip | 12×12px swatch, rounded-sm, border 1px | Build | W1 | Sergio
Icon | lucide-react, 3 sizes: 12/16/20px | Buy | W1 | Sergio
Spinner | 16px, border-2, spin 600ms, 3 size variants | Build | W1 | Sergio
Divider | 1px, --fg-10, horizontal and vertical | Build | W1 | Sergio
StatusDot | 8px circle, 4 semantic colors, optional pulse | Build | W1 | Sergio
AvatarInitials | 24/32px circle, 2-letter initials, hashed bg color | Build | W1 | Sergio
CreditBadge | Mono font, number + "cr", color by threshold | Build | W2 | Sergio

AI-NATIVE ATOMS: Unique to AI Products

Component | Behavior | Status | Owner
StreamingCursor ▊ | Blink 1s infinite, opacity .4→1→.4, no spinner | Build | Sergio
ThinkingPulse ··· | 3 dots, opacity .4→1→.4, 1.2s staggered | Build | Sergio
ToolBadge | Tool name + state icon + duration. 4 states: queued/running/done/error | Build | Sergio
AgentStatusBar | Bottom 24px, state-driven dot animation + text | Build | Sergio
RiskBadge | A/B/C risk level, color-coded, uppercase | Build | Sergio
TTLCountdown | Remaining time mono, amber below 20%, red below 5% | Build | Andrés
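The TTLCountdown thresholds in the table (amber below 20%, red below 5%) can be expressed as a small pure function. This is only a sketch: the `ttlColor` and `ttlLabel` names, the "neutral" default color, and the mm:ss label format are assumptions, since the spec only fixes the two thresholds and the mono typeface.

```typescript
type TtlColor = "neutral" | "amber" | "red";

// Pick the countdown color from the fraction of the TTL that is still left.
function ttlColor(remainingMs: number, totalMs: number): TtlColor {
  if (totalMs <= 0) return "red"; // assumption: an invalid TTL is treated as expired
  const fraction = Math.max(0, remainingMs) / totalMs;
  if (fraction < 0.05) return "red";
  if (fraction < 0.2) return "amber";
  return "neutral";
}

// Format the remaining time as mm:ss for the mono label (format is an assumption).
function ttlLabel(remainingMs: number): string {
  const totalSeconds = Math.max(0, Math.floor(remainingMs / 1000));
  const mm = Math.floor(totalSeconds / 60);
  const ss = totalSeconds % 60;
  return (mm < 10 ? "0" : "") + mm + ":" + (ss < 10 ? "0" : "") + ss;
}

console.log(ttlLabel(125000), ttlColor(125000, 900000));
```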

MOLECULES: Composed Interactive Units

Component | Variants | Status | Week | Owner
Button | primary · secondary · ghost · destructive · icon · loading | Build | W1 | Sergio
Input / Search | text · search (with icon) · textarea · readonly | Build | W1 | Sergio
Select | single · multi · searchable (Radix Select) | Buy | W2 | Sergio
Toggle | on/off, 32px wide, smooth transition 200ms | Build | W2 | Sergio
Tooltip | hover delay 200ms, max-width 200px, Radix Tooltip | Buy | W2 | Sergio
ProgressBar | linear, 6px height, animated fill, semantic colors | Build | W2 | Andrés
TabBar | marketplace tabs, icon + label, active underline 2px orange | Build | W2 | Sergio
Dropdown | Radix DropdownMenu, keyboard nav, icons optional | Buy | W2 | Sergio
KbdShortcut | <kbd> styled, Cmd/Ctrl adaptive, monospace | Build | W3 | Sergio

ORGANISMS: Complex, Stateful UI Sections

Component | Key Behavior | Status | Week | Owner
CardStandard | glass, border, hover shadow, padding p-4/p-6 | Build | W1 | Sergio
DataTable | sortable, sticky header, mono numbers, row hover | Build | W3 | Andrés
ConfirmDialog | REVERSIBLE/IRREVERSIBLE badge + diff + impact bullets | Build | W4 | Sergio
ToolAccordion | collapsed: badge+name+duration · expanded: full JSON | Build | W4 | Sergio
ReActStream | Thought(purple) → Action(orange) → Observation(green) per turn | Build | W5 | Sergio
ProactiveCard | slide-up from bottom, max 2 simultaneous, dismiss swipe | Build | W5 | Sergio
ContextBar | stacked context window bar with legend (project 26) | Done | - | Andrés
AuditLog | timeline, mono timestamps, expandable rows, action badges | Build | W6 | Andrés
RollbackPanel | shows rollback_token, before/after diff, one-click restore | Build | W7 | Andrés
FraudAlert | red banner, hard block mode, escalation CTA | Build | W7 | Sergio
MarketplaceKPI | large mono metric + delta badge + sparkline + secondary grid | Build | W6 | Andrés
CreditEconomy | stacked bar + number + upgrade CTA at thresholds | Build | W6 | Sergio
OnboardingStep | step N/M indicator, progress bar, back/next, skip | Build | W8 | Sergio
EnrollmentCard | ASIN + marketplace + risk assessment + enroll button | Build | W9 | Andrés
ErrorRecovery | A=amber recoverable · B=red unrecoverable · C=blue info | Build | W5 | Sergio

TEMPLATES: Full Page Layouts

Template | Week
ChatView | W4
Dashboard | W6
Settings | W7
Billing | W8
Enrollment | W9


15

Data Visualization for Sellers: 8 Patterns

Sellers are power users. They read numbers professionally. Every data component must prioritize density, precision, and scanability. Golden rule: ALL numbers use JetBrains Mono. No exceptions.

1 · KPI Metric Card

$12,847 ▲ 18.3%
GMV · MercadoLibre · 30d

Large mono number + delta badge (▲green/▼red) + mini sparkline. Zero chart library dependency.

2 · Competitor Table

Seller | Price | BB%
You | $24.99 | 72%
Seller_A | $24.50 | 18%
Seller_B | $25.99 | 10%

"You" row highlighted orange. Winner rows green, loser rows red/dim. Sortable columns. Dense = good.

3 · Context Window Bar

SELLER_PROFILE · MARKETPLACE · CONVERSATION · TOOLS

Stacked horizontal bar. Each segment = one context source. Labeled legend below. CSS-only, zero JS dependencies.

4 · Buy Box % Gauge

72% · Buy Box · MeLi · 30d

CSS conic-gradient semicircle. No SVG, no chart lib. Color threshold: >60% green · 40-60% amber · <40% red.
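The color thresholds and the semicircle fill can be derived from the percentage alone. The sketch below is one possible reading: `buyBoxColor`, `gaugeDegrees`, and `gaugeBackground` are hypothetical helper names, and the exact gradient layout (starting at 9 o'clock and sweeping over the top half) is an assumption; the spec only states that the gauge is a conic-gradient semicircle.

```typescript
type GaugeColor = "green" | "amber" | "red";

// >60% green · 40-60% amber · <40% red
function buyBoxColor(pct: number): GaugeColor {
  if (pct > 60) return "green";
  if (pct >= 40) return "amber";
  return "red";
}

// Half-circle gauge: map 0-100% onto 0-180 degrees.
function gaugeDegrees(pct: number): number {
  const clamped = Math.min(100, Math.max(0, pct));
  return (clamped * 180) / 100;
}

// Assumed CSS shape: start at 9 o'clock, fill clockwise, leave the lower half transparent.
function gaugeBackground(pct: number, fill: string, track: string): string {
  const deg = gaugeDegrees(pct);
  return `conic-gradient(from 270deg, ${fill} 0deg ${deg}deg, ${track} ${deg}deg 180deg, transparent 180deg)`;
}

console.log(buyBoxColor(72), gaugeBackground(50, "var(--green)", "var(--fg-12)"));
```

Applied to the portfolio examples in this section, 72% and 68% come out green, 42% amber, and 12% red.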

5 · BSR Sparkline (30 points)

30d · #2,341

Note: a lower BSR is better, so bars trending DOWN is good. Inverted scale. Pure CSS bar sparkline, no Recharts needed.
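Since a falling BSR is good news, the trend logic has the opposite sign from revenue-style metrics. A minimal sketch of that inversion follows; the `bsrTrend` and `bsrBarHeights` names and the choice to size bars relative to the worst rank in the window are assumptions, not part of the spec.

```typescript
type Trend = "improving" | "worsening" | "flat";

// For BSR a smaller number is a better rank, so a falling series is an improvement.
function bsrTrend(ranks: number[]): Trend {
  if (ranks.length < 2) return "flat";
  const delta = ranks[ranks.length - 1] - ranks[0];
  if (delta < 0) return "improving";
  if (delta > 0) return "worsening";
  return "flat";
}

// Bar heights as a percentage of the worst (largest) rank in the window.
// Bars follow the raw rank, so an improving product shows bars stepping down.
function bsrBarHeights(ranks: number[]): number[] {
  if (ranks.length === 0) return [];
  const worst = Math.max.apply(null, ranks) as number;
  if (worst <= 0) return ranks.map(() => 0);
  return ranks.map((rank) => Math.round((rank / worst) * 100));
}

console.log(bsrTrend([3000, 2600, 2341]), bsrBarHeights([3000, 1500, 750]));
```

The "improving" result would map to the green semantic color even though the bars are getting shorter.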

6 · Portfolio Health Grid

ASIN-001 · BB 72%
ASIN-002 · BB 42%
ASIN-003 · BB 12%
ASIN-004 · BB 68%

2×N semaphore grid. Color = health. Click = drill-down. Scales to 50+ ASINs with virtualization.

7 · Audit Log Timeline

14:32:01 Price updated $24.99 → $22.49
14:31:47 Competitor scan completed

Mono timestamps. Dot-line connector. Expandable JSON on click.

8 · Credit Economy Bar

Credits 347 / 500
Resets in 18 days

Green >20% · Amber at 20% · Red at 5% · Modal upgrade at 0%. Never hide the number.
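The credit thresholds translate into a short lookup. The sketch assumes that "at 20%" and "at 5%" mean "at or below" those values; the `creditLevel` name and the `CreditLevel` labels are hypothetical. Integer comparisons are used so boundary cases do not depend on floating-point rounding.

```typescript
type CreditLevel = "green" | "amber" | "red" | "upgrade-modal";

// remaining / total expressed with integer math: pct <= N  <=>  remaining * 100 <= N * total
function creditLevel(remaining: number, total: number): CreditLevel {
  if (total <= 0 || remaining <= 0) return "upgrade-modal";
  if (remaining * 100 <= 5 * total) return "red";
  if (remaining * 100 <= 20 * total) return "amber";
  return "green";
}

console.log(creditLevel(347, 500));
```

For the example bar above (347 / 500, about 69%), the result is "green"; the number itself is always rendered regardless of level.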

Golden Rule: Every Number → JetBrains Mono

Prices, percentages, BSR rankings, token counts, credit counts, timestamps, ASIN codes, version numbers, durations: ALL use font-family: 'JetBrains Mono'. This makes numbers instantly scannable, because the eye finds them automatically when they have a distinct type treatment.

16

Electron Desktop Design Patterns

Shopilot is a desktop app, not a web app. This distinction has concrete design implications. Every decision below is specific to Electron and cannot be copied from web-only design systems.

Title Bar: Frameless + Native Buttons

/* BrowserWindow options (main process) */
titleBarStyle: 'hidden',
trafficLightPosition: { x: 12, y: 10 }

/* Renderer CSS */
.drag-region {
  -webkit-app-region: drag;
  height: 40px;
}

macOS traffic lights (●●●) appear natively. Tab bar acts as drag region. Interactive elements must have -webkit-app-region: no-drag.

Split Pane 70/30: Not Resizable in MVP

WebContentsView {
  bounds: { x: 0, y: 40,
            width: win.width * 0.7,
            height: win.height - 64 }   /* 64 = 40px title bar + 24px status bar */
}

70% left = marketplace WebContentsView. 30% right = React sidebar. Fixed split in MVP, no drag-to-resize. Estimated to simplify implementation by 3–4 weeks.
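The bounds arithmetic can live in a pure function that is recomputed on every window resize. This is a sketch under stated assumptions: `splitPaneBounds` and the `Rect` interface are hypothetical names, the 64px offset is read as the 40px drag region plus the 24px status bar, and values are floored because Electron expects integer bounds.

```typescript
// Plain rectangle with the same four fields Electron uses for view bounds.
interface Rect { x: number; y: number; width: number; height: number }

const TITLE_BAR_HEIGHT = 40;  // drag region height from the title bar spec
const STATUS_BAR_HEIGHT = 24; // bottom status bar
const LEFT_PERCENT = 70;      // fixed 70/30 split in MVP

function splitPaneBounds(winWidth: number, winHeight: number): { marketplace: Rect; sidebar: Rect } {
  const contentHeight = Math.max(0, winHeight - TITLE_BAR_HEIGHT - STATUS_BAR_HEIGHT);
  const leftWidth = Math.floor((winWidth * LEFT_PERCENT) / 100);
  return {
    marketplace: { x: 0, y: TITLE_BAR_HEIGHT, width: leftWidth, height: contentHeight },
    sidebar: { x: leftWidth, y: TITLE_BAR_HEIGHT, width: winWidth - leftWidth, height: contentHeight },
  };
}

// At the 900×600 minimum window size:
console.log(splitPaneBounds(900, 600));
```

In the main process, the `marketplace` rectangle would be passed to the view's `setBounds` from a window resize handler; the `sidebar` rectangle is informational, since the React panel is laid out by CSS.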

Tab Bar: 3 Marketplace Tabs

ML MercadoLibre
AZ Amazon
SP Shopify

Active tab: orange accent background + border. Inactive: dimmed. Tab switch = WebContentsView bounds swap. Cmd+1/2/3 keyboard shortcuts.

System Tray: 3-Level Notifications

L1 In-app banner (amber/red/info): always first
L2 OS push notification: if app in background
L3 Tray badge count: for critical unread alerts

Tray icon: 16×16 mono SVG, template image (macOS adaptive). Context menu: Open · Pause Agent · Quit.
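The three notification levels amount to a routing rule per alert. A minimal sketch, assuming a hypothetical `AlertContext` shape and channel names; the conditions themselves come straight from the L1/L2/L3 list above.

```typescript
type Severity = "info" | "warning" | "critical";
type Channel = "in-app-banner" | "os-notification" | "tray-badge";

// Hypothetical input shape for one alert.
interface AlertContext {
  severity: Severity;
  appInBackground: boolean;
  unread: boolean;
}

// L1 always fires first; L2 only when the app is in the background;
// L3 only for critical alerts that are still unread.
function channelsFor(alert: AlertContext): Channel[] {
  const channels: Channel[] = ["in-app-banner"];
  if (alert.appInBackground) channels.push("os-notification");
  if (alert.severity === "critical" && alert.unread) channels.push("tray-badge");
  return channels;
}

console.log(channelsFor({ severity: "critical", appInBackground: true, unread: true }));
```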

Status Bar Bottom 24px: Anatomy

● MercadoLibre · Idle    347 cr · claude-sonnet-4-6 · v0.1.0

Left side:

  • Animated dot matching agent state (idle/thinking/acting)
  • Active marketplace name
  • Current agent state label

Right side:

  • Credit count (color-coded by threshold)
  • Active model version
  • App version

Keyboard Map: Core Shortcuts

Cmd+K Focus chat input
Cmd+1 MercadoLibre tab
Cmd+2 Amazon tab
Cmd+3 Shopify tab
Esc Cancel in-progress action
Cmd+, Open settings
Cmd+Enter Confirm action (in dialog)
Cmd+Z Trigger rollback (if token available)

Minimum window size: 900×600px. Below this, show a friendly "resize window" overlay. Never let the UI break at any size at or above the minimum.

17

AI-Native Interaction Patterns

These patterns don't exist in traditional design systems. They emerge from the AI agent paradigm: streaming, tool execution, multi-step reasoning, confirmation dialogs for real-world actions, and error recovery specific to LLM behavior.

Agent State Machine

idle → user_typing → submitting → thinking ··· → streaming ▊ → done
submitting → error | streaming ▊ → error

Status bar dot color and animation directly reflect state. No extra loading UI needed: the dot IS the state indicator.
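One reading of the state list above is a transition table that the UI checks before changing the status dot. The sketch is an interpretation, not the spec's definition: the happy path and the error edges out of `submitting` and `streaming` are taken from the state list, while the edges that return to `idle` or `user_typing` are assumptions added so the loop can restart.

```typescript
type AgentState =
  | "idle" | "user_typing" | "submitting" | "thinking" | "streaming" | "done" | "error";

// Allowed transitions (see the assumptions stated in the lead-in).
const TRANSITIONS: Record<AgentState, AgentState[]> = {
  idle: ["user_typing"],
  user_typing: ["submitting", "idle"],
  submitting: ["thinking", "error"],
  thinking: ["streaming"],
  streaming: ["done", "error"],
  done: ["idle", "user_typing"],   // assumed restart edges
  error: ["idle", "user_typing"],  // assumed restart edges
};

function canTransition(from: AgentState, to: AgentState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns the new state, or throws if the move is not in the table.
function transition(from: AgentState, to: AgentState): AgentState {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal agent transition: ${from} -> ${to}`);
  }
  return to;
}

let state: AgentState = "idle";
for (const next of ["user_typing", "submitting", "thinking", "streaming", "done"] as AgentState[]) {
  state = transition(state, next);
}
console.log(state);
```

Keeping the table in one place lets the status bar dot, the chat input, and the Esc handler all derive their behavior from the same state value.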

Word-by-Word Streaming

@keyframes fadeInWord {
  from { opacity: 0; }
  to { opacity: 1; }
}
.word {
  animation: fadeInWord .08s ease;
}

Each token is appended as a <span class="word">. No skeleton screens, no progressive disclosure, just the text appearing naturally. Mimics human typing.

Thinking Pulse ···: Never Show Elapsed Time

@keyframes thinkPulse {
  0%, 100% { opacity: .4; }
  50% { opacity: 1; }
}
.dot-1 { animation: thinkPulse 1.2s .0s infinite; }
.dot-2 { animation: thinkPulse 1.2s .4s infinite; }
.dot-3 { animation: thinkPulse 1.2s .8s infinite; }

Staggered dots communicate "processing" without anxiety-inducing elapsed time. Never show a counter like "Thinking... 12s". That creates frustration. The wave implies ongoing progress.

Tool Stage Transitions

queued: gray dot, tool name dimmed
running: blue spinning indicator, name white
success: green checkmark + duration badge
error: red X + error type badge + retry link

State transitions animated: scale 0.9→1 + opacity 0→1 in 150ms. Never jump, always transition.

Confirmation Card Animation / Animación del Card de Confirmación

.confirm-card {
  animation: slideUp .25s cubic-bezier(.16,1,.3,1);
}
.backdrop {
  animation: fadeIn .15s ease;
}
.diff-row { animation: fadeIn .08s ease; } /* staggered per row */

Card slides up from below. Backdrop fades in simultaneously. Diff rows appear sequentially (staggered 50ms). Confirms that this is a considered action requiring full attention. El card sube desde abajo. El backdrop aparece simultáneamente. Las filas del diff aparecen secuencialmente (escalonado 50ms). Confirma que es una acción considerada que requiere atención completa.
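The per-row stagger is not expressed in the CSS itself, so it has to be applied per element. A minimal sketch, assuming the delay is set as an inline `animation-delay` on each `.diff-row`; `diffRowDelay` is a hypothetical helper name.

```typescript
// Delay between consecutive diff rows, taken from the spec text (50ms).
const STAGGER_MS = 50;

// Value for the inline animation-delay of the diff row at a zero-based index.
function diffRowDelay(index: number): string {
  const safeIndex = Math.max(0, Math.floor(index));
  return `${safeIndex * STAGGER_MS}ms`;
}

// Example: delays for a three-row diff.
const exampleDelays = [0, 1, 2].map(diffRowDelay);
```

In JSX this would typically be used as `style={{ animationDelay: diffRowDelay(i) }}` on each row.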

Error Hierarchy — A · B · C / Jerarquía de Errores — A · B · C

A · Recoverable Error / Error Recuperable

Amber background. "Try again" or alternative action offered. Examples: API timeout, rate limit, price validation failed. Agent can retry autonomously or with user nudge. Fondo amber. Se ofrece "Reintentar" o acción alternativa. Ejemplos: timeout de API, rate limit, validación de precio fallida. El agente puede reintentar autónomamente o con empuje del usuario.

B · Unrecoverable Error / Error Irrecuperable

Red background. Hard stop. Human escalation required. Examples: fraud detected, marketplace account suspended, rollback failed. Modal, not dismissable without action. Fondo rojo. Parada total. Se requiere escalación humana. Ejemplos: fraude detectado, cuenta marketplace suspendida, rollback fallido. Modal, no se puede descartar sin acción.

C · Informational Block / Bloqueo Informativo

Blue background. Non-critical. Context about a limitation. Examples: "This marketplace is in read-only mode", "Feature available in Pro plan". Dismissable, no urgency. Fondo azul. No crítico. Contexto sobre una limitación. Ejemplos: "Este marketplace está en modo solo lectura", "Función disponible en plan Pro". Descartable, sin urgencia.
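The A/B/C hierarchy maps naturally onto a lookup from error code to presentation. The sketch below is illustrative only: the error codes are hypothetical names derived from the examples in the text, and treating class A as dismissable is an assumption, since the spec does not say.

```typescript
type ErrorClass = "A" | "B" | "C";

interface ErrorPresentation {
  errorClass: ErrorClass;
  background: "amber" | "red" | "blue";
  dismissable: boolean;
  retryOffered: boolean;
}

// Hypothetical error codes, one per example named in the spec.
type AgentErrorCode =
  | "api_timeout" | "rate_limit" | "price_validation_failed"
  | "fraud_detected" | "account_suspended" | "rollback_failed"
  | "read_only_marketplace" | "plan_upgrade_required";

const ERROR_CLASS: Record<AgentErrorCode, ErrorClass> = {
  api_timeout: "A",
  rate_limit: "A",
  price_validation_failed: "A",
  fraud_detected: "B",
  account_suspended: "B",
  rollback_failed: "B",
  read_only_marketplace: "C",
  plan_upgrade_required: "C",
};

const PRESENTATION: Record<ErrorClass, ErrorPresentation> = {
  // Dismissability of class A is assumed; the spec only mentions retry.
  A: { errorClass: "A", background: "amber", dismissable: true, retryOffered: true },
  // Class B is a modal that cannot be dismissed without action.
  B: { errorClass: "B", background: "red", dismissable: false, retryOffered: false },
  C: { errorClass: "C", background: "blue", dismissable: true, retryOffered: false },
};

// Resolve how a given error should be shown to the seller.
function presentError(code: AgentErrorCode): ErrorPresentation {
  return PRESENTATION[ERROR_CLASS[code]];
}
```

Using `Record<AgentErrorCode, ErrorClass>` means the compiler flags any new error code that has not been assigned a class.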

Proactive Suggestion Cards / Cards de Sugerencia Proactiva

  • Appear from bottom via slide-up animation / Aparecen desde abajo vía animación slide-up
  • Maximum 2 simultaneously — never more / Máximo 2 simultáneamente — nunca más
  • Dismiss: click X or swipe right / Descartar: click X o deslizar a la derecha
  • Auto-dismiss after 30s if no interaction / Auto-descartar después de 30s sin interacción
  • One primary action button (orange) / Un botón de acción primaria (naranja)
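The cap and auto-dismiss rules above can be kept in a small pure reducer, independent of the animation layer. This is a sketch under stated assumptions: when two cards are already visible the new card is dropped rather than evicting an older one (the spec does not say which), and all names are hypothetical.

```typescript
interface SuggestionCard {
  id: string;
  shownAtMs: number; // timestamp when the card became visible
}

const MAX_VISIBLE = 2;         // "Maximum 2 simultaneously"
const AUTO_DISMISS_MS = 30000; // "Auto-dismiss after 30s"

// Remove cards that have been visible for 30s or longer.
function pruneExpired(visible: readonly SuggestionCard[], nowMs: number): SuggestionCard[] {
  return visible.filter((card) => nowMs - card.shownAtMs < AUTO_DISMISS_MS);
}

// Try to show a card. Returns the new visible list; the card is dropped
// when the cap is already reached (assumed policy).
function showCard(visible: readonly SuggestionCard[], id: string, nowMs: number): SuggestionCard[] {
  const active = pruneExpired(visible, nowMs);
  if (active.length >= MAX_VISIBLE) {
    return active;
  }
  return [...active, { id, shownAtMs: nowMs }];
}

// Explicit dismissal (click X or swipe right).
function dismissCard(visible: readonly SuggestionCard[], id: string): SuggestionCard[] {
  return visible.filter((card) => card.id !== id);
}
```

Passing `nowMs` in explicitly keeps the logic deterministic, which makes the 30-second rule straightforward to unit test without timers.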

Credit Warning System / Sistema de Alerta de Créditos

>20% — normal, status bar shows count / normal, status bar muestra conteo
≤20% — amber warning banner (dismissable) / banner warning amber (descartable)
≤5% — red warning, agent slows down / warning rojo, agente desacelera
0% — upgrade modal, agent paused / modal upgrade, agente pausado
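The thresholds above reduce to a single function of remaining and total credits. A minimal sketch, assuming the 20% and 5% marks are inclusive upper bounds and that a non-positive total is treated as paused; `creditLevel` is a hypothetical name. Integer comparisons are used instead of a computed percentage to avoid floating-point edge cases at the boundaries.

```typescript
type CreditLevel = "normal" | "amber" | "red" | "paused";

// Map remaining credits to the warning level from the table above.
function creditLevel(remaining: number, total: number): CreditLevel {
  if (total <= 0 || remaining <= 0) return "paused"; // 0%: upgrade modal, agent paused
  if (remaining * 20 <= total) return "red";         // at or below 5%
  if (remaining * 5 <= total) return "amber";        // at or below 20%
  return "normal";                                   // above 20%
}
```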
18

Design → Code Pipeline Pipeline Diseño → Código

How design decisions become production code. The pipeline has 3 phases matching the product lifecycle: Design-in-Code (now), Token-driven (Phase 1), and Figma-backed (Phase 2+). Cómo las decisiones de diseño se convierten en código de producción. El pipeline tiene 3 fases que coinciden con el ciclo de vida del producto: Design-in-Code (ahora), Token-driven (Fase 1) y Figma-backed (Fase 2+).

Now — v1 / Ahora — v1

Design-in-Code: This HTML = Source of Truth / Design-in-Code: Este HTML = Fuente de Verdad

Every design decision is documented directly in this HTML spec. Cursor AI reads it and generates matching React components. Design reviews happen via PR diffs on this file. Zero Figma dependency. Cada decisión de diseño está documentada directamente en este HTML spec. Cursor AI lo lee y genera componentes React que coinciden. Las revisiones de diseño ocurren vía diffs de PR en este archivo. Cero dependencia de Figma.

design-tokens.json → Style Dictionary → CSS + Tailwind (deep spec) design-tokens.json → Style Dictionary → CSS + Tailwind (spec profundo)

W3C Design Tokens Format (excerpt) / Formato W3C Design Tokens (extracto)

{
  "color": {
    "brand": {
      "primary": { "$value": "#f97316", "$type": "color" },
      "meli": { "$value": "#f97316", "$type": "color" },
      "amazon": { "$value": "#f97316", "$type": "color" },
      "shopify": { "$value": "#5c6ac4", "$type": "color" }
    }
  },
  "spacing": {
    "g": { "$value": "10px", "$type": "dimension" },
    "v": { "$value": "22px", "$type": "dimension" }
  }
}

Token Naming Convention / Convención de Naming de Tokens

--sp-color-brand-primary
prefix · category · subcategory · modifier

--sp- prefix prevents collisions with framework tokens. Always 3-4 segments. Kebab-case throughout. El prefijo --sp- previene colisiones con tokens del framework. Siempre 3-4 segmentos. Kebab-case en todo momento.
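To make the naming rule concrete, the sketch below flattens a token tree in the shape of the excerpt above into `--sp-*` custom property names. It is a hand-rolled illustration of the convention, not Style Dictionary's own transform; `TokenLeaf`, `TokenGroup`, and `toCssVars` are hypothetical names.

```typescript
// Minimal shape for the token tree: groups nest, leaves carry $value and $type.
interface TokenLeaf { $value: string; $type: string; }
interface TokenGroup { [key: string]: TokenLeaf | TokenGroup; }

function isLeaf(node: TokenLeaf | TokenGroup): node is TokenLeaf {
  return typeof (node as TokenLeaf).$value === "string";
}

// Flatten the tree into { "--sp-color-brand-primary": "#f97316", ... }.
function toCssVars(tree: TokenGroup, prefix: string = "--sp"): Record<string, string> {
  const out: Record<string, string> = {};
  const walk = (node: TokenGroup, path: string): void => {
    for (const key of Object.keys(node)) {
      const child = node[key];
      if (child === undefined) continue;
      const name = `${path}-${key}`; // kebab-case segments joined by "-"
      if (isLeaf(child)) {
        out[name] = child.$value;
      } else {
        walk(child, name);
      }
    }
  };
  walk(tree, prefix);
  return out;
}

// Values copied from the excerpt above.
const exampleTokens: TokenGroup = {
  color: { brand: { primary: { $value: "#f97316", $type: "color" } } },
  spacing: { g: { $value: "10px", $type: "dimension" } },
};
const exampleVars = toCssVars(exampleTokens);
```

The path through the JSON becomes the variable name, so a token's position in `design-tokens.json` fully determines its `--sp-*` name.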

Figma MCP Workflow / Workflow de Figma MCP

// Claude reads Figma (#18 Design System) via Figma MCP
// Atomic Design hierarchy in Figma:
//
// Atoms:      Button, Input, Badge, Icon, Label, StatusDot, Avatar
// Molecules:  FormField, SearchBar, NavItem, ChatBubble, TabBar
// Organisms:  Sidebar, Header, CardLayout, Modal, ToolProgress
// Templates:  ChatView, ProfileView, BillingView, EnrollmentView
// Pages:      Full-screen compositions for each Shell view
//
// Workflow:
// 1. External design team creates/updates component in Figma
// 2. Claude reads component spec via Figma MCP
// 3. Claude implements matching React component in #1 Native Shell
// 4. Code review verifies fidelity to Figma spec
// Rule: NO React components outside of what Figma defines

Handoff Checklist (5 items) / Checklist de Handoff (5 items)

  • Component exists in Figma (#18) with all states and variants / Componente existe en Figma (#18) con todos los estados y variantes
  • All states shown (default, hover, active, disabled, loading, error) / Todos los estados mostrados (default, hover, active, disabled, loading, error)
  • Token values referenced (no hardcoded hex) / Valores de tokens referenciados (sin hex hardcodeados)
  • a11y: aria-label, focus ring, keyboard nav / a11y: aria-label, focus ring, navegación teclado
  • Responsive: works at 900px min width / Responsive: funciona a 900px de ancho mínimo

Design debt tracking: When a PR introduces a visual inconsistency (wrong token, missing state, hardcoded value), add Linear label design-system + comment with exact issue. Never merge and forget — debt compounds fast in AI products where UI is the trust layer. Tracking de deuda de diseño: Cuando un PR introduce una inconsistencia visual (token incorrecto, estado faltante, valor hardcodeado), añadir label Linear design-system + comentario con el issue exacto. Nunca mergear y olvidar — la deuda se acumula rápido en productos AI donde la UI es la capa de confianza.
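The "no hardcoded hex" item is one of the few checklist entries that can be checked mechanically. A rough sketch of such a check, assuming it runs over the text of a changed stylesheet or component; `findHardcodedHex` is a hypothetical helper, and the pattern will also flag ID selectors that happen to be valid hex (for example `#fade`), so results need a human look.

```typescript
// Matches 3-, 4-, 6- or 8-digit hex colors such as #fff or #f97316.
const HEX_COLOR = /#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b/g;

// Return every hardcoded hex color found in the given source text.
function findHardcodedHex(source: string): string[] {
  return source.match(HEX_COLOR) ?? [];
}

// Example: one violation, one compliant declaration.
const exampleCss = "color: #f97316; background: var(--sp-color-brand-primary);";
const exampleViolations = findHardcodedHex(exampleCss);
```

The token definition file itself would need to be excluded from such a check, since that is the one place where raw hex values belong.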

19

Governance & Scalability Escalabilidad y Gobernanza

How to grow the design system without breaking existing components or creating chaos. Rules are simple enough to remember, strict enough to matter. Cómo hacer crecer el sistema de diseño sin romper componentes existentes ni crear caos. Las reglas son lo suficientemente simples para recordarlas, lo suficientemente estrictas para importar.

Abstraction Rule — 3+ / Regla de Abstracción — 3+

3 or more uses → abstract into a component.
1–2 uses → inline style is fine. Don't create a component for a one-off. Don't create abstraction anxiety by over-componentizing trivial things.
3 o más usos → abstraer en componente.
1–2 usos → style inline está bien. No crear componente para algo de un solo uso. No crear ansiedad de abstracción con over-componentización de cosas triviales.

New Marketplace = 3 Files / Nuevo Marketplace = 3 Archivos

Adding a 4th marketplace requires only:
1 accent color token + 1 logo SVG + 1 URL pattern.
All components, layouts, and patterns work automatically. This is the test of a real design system — extensibility without modification.
Añadir un 4to marketplace requiere solo:
1 token de color acento + 1 logo SVG + 1 patrón URL.
Todos los componentes, layouts y patrones funcionan automáticamente. Este es el test de un sistema de diseño real — extensibilidad sin modificación.
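One way to read "1 accent color token + 1 logo SVG + 1 URL pattern" is as a single registry entry per marketplace. The sketch below is an assumption about how that could look: the token names follow the `--sp-` convention, but the logo paths and hostname patterns are invented for illustration and are not taken from the Shopilot codebase.

```typescript
interface MarketplaceDef {
  id: string;
  accentToken: string; // CSS custom property holding the accent color
  logoPath: string;    // hypothetical asset path
  hostPattern: RegExp; // hypothetical hostname pattern
}

const MARKETPLACES: readonly MarketplaceDef[] = [
  { id: "meli", accentToken: "--sp-color-brand-meli", logoPath: "logos/meli.svg", hostPattern: /(^|\.)mercadolibre\.[a-z.]+$/ },
  { id: "amazon", accentToken: "--sp-color-brand-amazon", logoPath: "logos/amazon.svg", hostPattern: /(^|\.)amazon\.[a-z.]+$/ },
  { id: "shopify", accentToken: "--sp-color-brand-shopify", logoPath: "logos/shopify.svg", hostPattern: /(^|\.)myshopify\.com$/ },
];

// Find the marketplace whose hostname pattern matches the given URL.
function detectMarketplace(url: string): MarketplaceDef | undefined {
  const host = new URL(url).hostname;
  return MARKETPLACES.find((m) => m.hostPattern.test(host));
}
```

Under this layout, adding a fourth marketplace means appending one entry to the array; components read `accentToken` and `logoPath` from the matched entry and need no changes.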

Token Versioning / Versionado de Tokens

Each design-tokens.json release is versioned. Non-breaking changes = patch. New tokens = minor. Token renames or value breaks = major. Always deprecate before removing — give 1 sprint lead time. Cada release de design-tokens.json está versionado. Cambios sin breaking = patch. Nuevos tokens = minor. Renombres o cambios de valor = major. Siempre deprecar antes de eliminar — dar 1 sprint de lead time.
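The versioning rule can be checked by diffing two token releases. This is a sketch under explicit assumptions: any changed value counts as a "value break" (major), a removed name is treated like a rename (major), and a description-only edit stands in for a non-breaking change (patch). `TokenEntry` and `requiredBump` are hypothetical names.

```typescript
type Bump = "major" | "minor" | "patch" | "none";

interface TokenEntry { value: string; description: string; }
type TokenSet = Record<string, TokenEntry>;

// Decide the smallest version bump that the change from prev to next requires.
function requiredBump(prev: TokenSet, next: TokenSet): Bump {
  let bump: Bump = "none";
  for (const name of Object.keys(prev)) {
    const before = prev[name];
    const after = next[name];
    if (before === undefined) continue;
    // Removed or renamed token, or a changed value: breaking.
    if (after === undefined || after.value !== before.value) return "major";
    // Metadata-only edit: non-breaking.
    if (after.description !== before.description) bump = "patch";
  }
  // Any name present only in next is a new token.
  for (const name of Object.keys(next)) {
    if (prev[name] === undefined) return "minor";
  }
  return bump;
}
```

Run in CI against the previous release, a check like this can block a PR that changes a token value without a major version bump.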

Design Review — 2 Roles / Revisión de Diseño — 2 Roles

Author: PR description + before/after screenshot + token references listed / Descripción de PR + screenshot antes/después + tokens referenciados listados

Reviewer: checks tokens (no hardcoded values) + a11y (aria, contrast) + responsive (900px min) / verifica tokens (sin valores hardcodeados) + a11y (aria, contraste) + responsive (900px mín)

Anti-Patterns to Avoid / Anti-Patrones a Evitar

Token Soup / Sopa de Tokens: >200 tokens without clear naming structure. Fix: audit tokens quarterly, merge duplicates, enforce naming convention. / >200 tokens sin estructura de naming clara. Fix: auditar tokens trimestralmente, fusionar duplicados, aplicar convención de naming.
Component Explosion / Explosión de Componentes: Creating a new component for every slight variation. Fix: use props for variants, not new components. / Crear un nuevo componente para cada variación ligera. Fix: usar props para variantes, no componentes nuevos.
Shadow DOM Hell: Web components with Shadow DOM for simple UI. Fix: use React + Tailwind exclusively in sidebar, no web components. / Web components con Shadow DOM para UI simple. Fix: usar React + Tailwind exclusivamente en sidebar, sin web components.
Inventing Components / Inventar Componentes: Creating React components that don’t exist in Figma. Fix: every component must trace back to a Figma definition in #18 Design System. / Crear componentes React que no existen en Figma. Fix: cada componente debe trazarse a una definición en Figma del #18 Design System.

Maturity Levels L1 → L4 / Niveles de Madurez L1 → L4

Level / Nivel | Includes / Incluye | When Built / Cuándo Se Construye
L1 | Tokens + all Atoms + CSS architecture / Tokens + todos los Atoms + arquitectura CSS | Phase 1 Week 1–2 / Fase 1 Semanas 1–2
L2 | All Molecules + CardStandard + DataTable / Todos los Molecules + CardStandard + DataTable | Phase 1 Week 2–4 / Fase 1 Semanas 2–4
L3 | All Organisms + AI-native atoms + data viz / Todos los Organisms + AI-native atoms + data viz | Phase 1 Week 4–10 / Fase 1 Semanas 4–10
L4 | All Templates + Figma ↔ Code consistency gates + governance docs / Todos los Templates + gates de consistencia Figma ↔ Código + docs de gobernanza | Phase 2+ / Fase 2+

🔮 Open source future: Publish @shopilot/design-tokens as an npm package when Shopilot has 3+ white-label clients. Token layer is the most transferable part — it's how Shopify Polaris and Atlassian Design System create value beyond their own product. 🔮 Futuro open source: Publicar @shopilot/design-tokens como paquete npm cuando Shopilot tenga 3+ clientes white-label. La capa de tokens es la parte más transferible — así es como Shopify Polaris y Atlassian Design System generan valor más allá de su propio producto.

20

Design System Roadmap — What to Build When Roadmap del Design System — Qué Construir Cuándo

Delimited to the actual MVP scope: Electron desktop 70/30 split, React sidebar, 36 tools. The 80/20 rule applies: tokens + button + card + badge + data table + confirmation dialog = 80% of the UI. Delimitado al scope real del MVP: Electron desktop split 70/30, sidebar React, 36 herramientas. La regla 80/20 aplica: tokens + button + card + badge + data table + confirmation dialog = 80% de la UI.

Phase 1 · MVP Weeks 1–3 — Core Foundation / Semanas 1–3 — Fundación Core

Week 1

  • CSS tokens (--sp-*)
  • All 7 Atoms
  • Button (6 variants)
  • Input / Textarea

Week 2

  • Card + Divider
  • Toggle + Select
  • Tab bar (3 tabs)
  • Status bar 24px

Week 3

  • Tooltip + Dropdown
  • Progress bar
  • AI-native Atoms (6)
  • KbdShortcut

Phase 1 · Coach Weeks 4–6 — AI Interaction Layer / Semanas 4–6 — Capa de Interacción AI

Week 4

  • ToolAccordion
  • ConfirmDialog
  • ChatView template
  • Figma MCP integration

Week 5

  • ReActStream
  • ProactiveCard
  • ErrorRecovery A/B/C
  • Credit warning

Week 6

  • MarketplaceKPI
  • CreditEconomy
  • AuditLog
  • Dashboard template

Phase 1 · Data Weeks 7–10 — Data & Operations Layer / Semanas 7–10 — Capa de Datos y Operaciones

Week 7–8

  • DataTable (full)
  • RollbackPanel
  • FraudAlert
  • Settings template
  • Billing template

Week 9–10

  • EnrollmentCard
  • OnboardingStep
  • Enrollment template
  • Style Dictionary setup

Phase 2 Polish — Figma Refinement + Token Pipeline + a11y / Polish — Refinamiento Figma + Pipeline de Tokens + a11y

Figma component library refinement (full Atomic Design hierarchy). Token export pipeline (Tokens Studio → Style Dictionary → Tailwind). Design system documentation site. Accessibility audit with axe-core. Visual consistency gates in PR review. Refinamiento de librería de componentes Figma (jerarquía completa Atomic Design). Pipeline de export de tokens (Tokens Studio → Style Dictionary → Tailwind). Sitio de documentación del design system. Auditoría de accesibilidad con axe-core. Gates de consistencia visual en review de PRs.

Never Build — Use Libraries / Nunca Construir — Usar Librerías

Charts / Charts → recharts
Date pickers / Date pickers → react-day-picker
Modals (accessible) / Modales (accesibles) → radix-ui
Virtualized lists / Listas virtualizadas → @tanstack/virtual
Drag and drop / Drag and drop → @dnd-kit/core
Complex forms / Formularios complejos → react-hook-form

80/20 — These 6 Cover 80% of UI / 80/20 — Estos 6 Cubren el 80% de la UI

CSS Tokens: all colors, spacing, type / todos los colores, espaciado, tipo
Button: 6 variants / variantes
Card: standard + glass / estándar + glass
Badge / StatusDot: semantic colors / colores semánticos
DataTable: sortable, mono numbers / ordenable, números mono
ConfirmDialog: REVERSIBLE / IRREVERSIBLE

Velocity Target & Final Status Table / Objetivo de Velocidad & Tabla de Estado Final

Target: 1 component/day (Sergio, Weeks 1–6). At this velocity, the full organism library is complete before the first external user demo. Over-engineering a component takes 3 days minimum. Keep it simple until complexity is needed. Objetivo: 1 componente/día (Sergio, Semanas 1–6). A esta velocidad, la librería completa de organisms está lista antes del primer demo a usuario externo. Un componente over-engineered tarda un mínimo de 3 días. Mantener simple hasta que la complejidad sea necesaria.

Component / Componente | Week / Semana | Owner / Responsable | Visual State / Estado Visual
CSS Tokens + Atoms | W1 | Mateo + Sergio | pending
Button + Input + Card | W1–2 | Sergio | pending
TabBar + StatusBar + AI Atoms | W2–3 | Sergio | pending
ToolAccordion + ConfirmDialog | W4 | Sergio | pending
ReActStream + ProactiveCard | W5 | Sergio | pending
KPI + DataTable + AuditLog | W6–7 | Andrés | pending
Enrollment + Onboarding | W9–10 | Sergio + Andrés | pending

15. Brand Intelligence Lab — 17 Brand Books + Shopilot Recommendation Brand Intelligence Lab — 17 Brand Books + Recomendación Shopilot

Deep-dive brand books for the 6 reference products + 10 YC-backed startups with similar contexts. Colors, typography, buttons, spacing, motion, voice — everything. Ends with the Shopilot Recommended Brand Book. Brand books a profundidad de los 6 productos de referencia + 10 startups respaldadas por YC con contextos similares. Colores, tipografía, botones, espaciado, motion, voz — todo. Termina con el Brand Book Recomendado de Shopilot.

v1.0 · 2026-03
AN

Anthropic / Claude.ai

AI Safety Company · San Francisco · 2021 / Empresa de AI Safety · San Francisco · 2021

AI Product

Brand Philosophy / Filosofía de Marca

"AI for human flourishing"

The Anthropic visual language is built around the concept of "clay" — unfired earth, warm, unfinished, human. The brand consciously rejects the cold blue-shifted AI aesthetic (think IBM, Microsoft Azure, early OpenAI). Instead: warmth, earth, copper, organic. The name "Claude" deliberately chosen for its French warmth and humanist connotations. Every color decision reflects: trustworthy AI that feels human, not robotic. El lenguaje visual de Anthropic se construye alrededor del concepto de "arcilla" — tierra sin cocer, cálida, inacabada, humana. La marca rechaza conscientemente la estética AI fría con tono azulado (como IBM, Microsoft Azure, OpenAI inicial). En cambio: calidez, tierra, cobre, orgánico. El nombre "Claude" elegido deliberadamente por su calidez francesa y connotaciones humanistas. Cada decisión de color refleja: AI confiable que se siente humana, no robótica.

Color System / Sistema de Color

#faf9f5

Background Light

RGB 250/249/245 · toasted cream

#141413

Background Dark

RGB 20/20/19 · warm undertone

#CC785C

Brand Copper

Logo · selection · icon

#d97757

UI Orange

CTAs · interactive elements

#6a9bcc

Muted Blue

Secondary · info states

#788c5d

Muted Green

Success · positive states

rgba(204,120,92,.15)

Selection BG

Text selection highlight

#1a1915

Surface Dark

Cards on dark bg

Contrast (WCAG 2.x): #141413 on #faf9f5 ≈ 17.5:1 AAA · #CC785C on #faf9f5 ≈ 3.1:1 (below AA for body text; meets the 3:1 minimum for large text and UI components) · #d97757 on #141413 ≈ 5.9:1 AA / Contraste (WCAG 2.x): #141413 sobre #faf9f5 ≈ 17.5:1 AAA · #CC785C sobre #faf9f5 ≈ 3.1:1 (por debajo de AA para texto de cuerpo; cumple el mínimo 3:1 para texto grande y componentes UI) · #d97757 sobre #141413 ≈ 5.9:1 AA

Rule: Never use pure black (#000) or pure white (#fff). The warmth delta of ~5 RGB units in each neutral makes everything feel premium vs. commodity. Regla: Nunca usar negro puro (#000) ni blanco puro (#fff). El delta de calidez de ~5 unidades RGB en cada neutro hace que todo se sienta premium vs. commodity.
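Contrast figures like the ones above are easy to get wrong by eye, so it helps to recompute them. The sketch below follows the WCAG 2.x relative-luminance and contrast-ratio definitions; the function names are hypothetical, and it accepts only six-digit `#rrggbb` input.

```typescript
// Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition).
function linearChannel(c8: number): number {
  const c = c8 / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a #rrggbb color.
function relativeLuminance(hex: string): number {
  if (!/^#[0-9a-fA-F]{6}$/.test(hex)) {
    throw new Error(`Expected #rrggbb, got ${hex}`);
  }
  const r = linearChannel(parseInt(hex.slice(1, 3), 16));
  const g = linearChannel(parseInt(hex.slice(3, 5), 16));
  const b = linearChannel(parseInt(hex.slice(5, 7), 16));
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio between two colors, from 1 (identical) to 21 (black on white).
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  const lighter = Math.max(a, b);
  const darker = Math.min(a, b);
  return (lighter + 0.05) / (darker + 0.05);
}

// WCAG AA needs 4.5:1 for body text and 3:1 for large text and UI components.
function meetsAA(ratio: number, largeText: boolean = false): boolean {
  return ratio >= (largeText ? 3 : 4.5);
}
```

Calling `contrastRatio("#CC785C", "#faf9f5")` and feeding the result to `meetsAA` is a quick way to decide whether an accent color is safe for body text or only for large text and icons.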

Typography System / Sistema Tipográfico

Role | Font | Weight | Usage
Display / Headlines | Styrene A / Styrene B | 400–700 | Hero titles, section heads / Títulos hero, encabezados
Editorial / Long-form | Tiempos Text | 400 italic | Blog, docs, long reads / Blog, docs, lectura larga
Product / UI Text | Styrene A | 400–500 | App UI, labels, body / UI de app, etiquetas, cuerpo
Code / Data | JetBrains Mono | 400 | Code blocks, inline code / Bloques de código, código inline
Accent / Quote | Galaxie Copernicus | 300 italic | Pull quotes, feature text / Pull quotes, texto destacado

Type scale / Escala tipográfica: display-xxl: clamp(3rem, 5vw, 5rem) · display-lg: clamp(2rem, 3.5vw, 3.5rem) · display-xs: clamp(1.125rem, 1.5vw, 1.25rem) · body: 1rem/1.6

Button System / Sistema de Botones

● Primary: bg #d97757 · text white · radius 8px · padding 10px 20px · font-weight 600

● Secondary: border 1.5px #CC785C/50 · text #CC785C · bg transparent · same radius/padding

Hover: filter: brightness(1.1) — never use a fixed darker hex, keep theming dynamic / nunca usar un hex oscuro fijo, mantener el theming dinámico

Spacing · Shadows · Motion · Voice (deep spec) Espaciado · Sombras · Motion · Voz (spec profundo)

Spacing / Espaciado

Site margin: clamp(2rem, 5rem)

Nav height: 68px (4.25rem)

Section gap: 96px–160px

Chat max-w: 768px (3xl)

Message max: 75ch

Shadows / Sombras

Default: none

Flyout: 0 8px 32px rgba(0,0,0,.12)

Modal: 0 24px 64px rgba(0,0,0,.18)

Focus ring: 0 0 0 3px rgba(204,120,92,.3)

Motion

Menu open: 400ms

Dropdown: 200ms

Tooltip: 150ms

Easing: cubic-bezier(.4,0,.2,1)

Streaming: 0ms delay, instant

Brand Voice / Voz de Marca

Tone adjectives / Adjetivos de tono

Thoughtful · Warm · Honest · Direct · Curious · Humble

Anti-tone / Anti-tono

Never: Hype-y · Corporate · Cold · Overpromising · Robotic

Writing style / Estilo de escritura

Conversational but precise. Short sentences. Active voice. Explains "why" not just "what". Conversacional pero preciso. Frases cortas. Voz activa. Explica el "por qué" no solo el "qué".

Shopilot inherits / Shopilot hereda

Candidate inspiration: warm copper accent · dark backgrounds · trustworthy AI voice / Inspiración candidata: acento cobre cálido · fondos oscuros · voz AI confiable

CU

Cursor IDE

AI-Native Code Editor · Anysphere · 2022 / Editor de Código AI-Native · Anysphere · 2022

AI-Native IDE

Brand Philosophy / Filosofía de Marca

"The AI-first code editor built for pair programming with AI"

Cursor's brand philosophy is hyper-functional. There is no decorative layer — every visual decision serves the task of writing code. The orange accent (#f54e00) is used only for the critical hot path: the most important action on screen. The warm off-white/off-black background signals "professional tool" vs. "consumer app." The UI is intentionally dense — developers are trained to read dense information quickly. La filosofía de marca de Cursor es híper-funcional. No hay capa decorativa — cada decisión visual sirve a la tarea de escribir código. El acento naranja (#f54e00) se usa solo para el hot path crítico: la acción más importante en pantalla. El fondo off-white/off-black cálido señala "herramienta profesional" vs. "app consumer". La UI es intencionalmente densa — los developers están entrenados para leer información densa rápidamente.

Color System / Sistema de Color

#f7f7f4

--color-theme-bg

Warm off-white

#26251e

--color-theme-fg

Warm off-black

#f54e00

--color-theme-accent

Hot orange · CTAs only

--fg-01 … --fg-100

Opacity Scale

Every 5% step from bg color

Base units: --g: calc(10rem/16) ≈ 10px (grid) · --v: 1.375rem ≈ 22px (vertical rhythm)

Duration: --duration: .14s · --duration-slow: .25s

Easing: --ease-out-spring: cubic-bezier(.25,1,.5,1)

Shadows: Ultra-minimal 0 0 1rem #00000005 — shadows only on flyouts, never on cards

Border radii: 2 · 4 · 8 · 12 · 16px — smallest for inputs, largest for panels

Typography / Tipografía

Role | Font | Size | Notes
UI Product (sm) | System + custom | 11px (.6875rem) | --text-product-sm · labels, status / etiquetas, estado
UI Product (base) | System + custom | 12px (.75rem) | --text-product-base · default text / texto por defecto
UI Product (lg) | System + custom | 13px (.8125rem) | --text-product-lg · section titles / títulos de sección
Code / Data | JetBrains Mono | 12–13px | Code, terminal output, numbers / Código, salida terminal, números

Note: Cursor uses data-os=linux to switch to system font stack. Respects user's OS font preference — a developer-first accessibility decision. Nota: Cursor usa data-os=linux para cambiar al stack de fuente del sistema. Respeta la preferencia de fuente del OS del usuario — una decisión de accesibilidad developer-first.

Button System / Sistema de Botones

● Primary: bg #f54e00 · radius 6px · padding 8px 16px · font-weight 600 · no border

● Secondary: bg rgba(255,255,255,.07) · border rgba(255,255,255,.12) · radius 4px · font-weight 400

Accent text buttons: color #f54e00 · bg transparent · hover underline only / Botones de texto acento: color #f54e00 · bg transparent · hover solo subrayado

Rule: orange CTA used ONCE per screen. Second most important action is always ghost. Regla: CTA naranja usado UNA VEZ por pantalla. La segunda acción más importante siempre es ghost.

What Shopilot Inherits / Qué Hereda Shopilot

Split pane 70/30 · WebContentsView architecture · --g/--v base units · opacity token scale · status bar 24px · ultra-minimal shadows · one orange CTA rule / Split pane 70/30 · Arquitectura WebContentsView · Unidades base --g/--v · Escala de tokens de opacidad · Status bar 24px · Sombras ultra-mínimas · Regla de un CTA naranja

HS

HubSpot / Canvas Design System

CRM & Marketing Platform · Cambridge MA · 2006 · Public ($HUBS) / Plataforma CRM y Marketing · Cambridge MA · 2006 · Pública ($HUBS)

Enterprise SaaS

Brand Philosophy / Filosofía de Marca

"Sprocket-right: interfaces must work for the user, not impress other designers"

HubSpot's Canvas system represents 20 years of B2B SaaS learning. Their core insight: beautiful design at enterprise scale means designing for efficiency and clarity, not aesthetics. Every component is tested against "does this help the user complete their task faster?" The orange brand color (#ff7a00) was chosen for energy, approachability, and differentiation from blue-dominant CRM competitors (Salesforce). Canvas explicitly codifies the philosophy that function precedes form. El sistema Canvas de HubSpot representa 20 años de aprendizaje en SaaS B2B. Su insight principal: diseño hermoso a escala enterprise significa diseñar para eficiencia y claridad, no estética. Cada componente se prueba contra "¿esto ayuda al usuario a completar su tarea más rápido?". El color naranja de marca (#ff7a00) fue elegido por energía, cercanía y diferenciación de los competidores CRM dominados por azul (Salesforce). Canvas codifica explícitamente la filosofía de que la función precede a la forma.

Color System / Sistema de Color

#ffffff

Base White

Primary background

#2D3E50

Midnight Blue

Primary text · headers

#ff7a00

Calypso Orange

Brand · CTAs

#00BDA5

Teal

Success · secondary CTA

#F5C26B

Flax

Warning · alerts

#EAF0F6

Mist Gray

Panel backgrounds

#516F90

Regent Gray

Secondary text

#F2545B

Alizarin

Error · destructive

Typography + Buttons / Tipografía + Botones

Fonts / Fuentes

Display: HubSpot Serif (custom, Typekit)

UI: HubSpot Sans (custom, Typekit)

Code: Lucida Console / Courier New (fallback)

Scale: 12 · 14 · 16 · 20 · 24 · 32 · 40 · 48px

Radius: --cl-radius ~6px standard

Icons: SVG fill:currentColor · 2rem default · .cl-icon class

Buttons / Botones

What Shopilot Inherits / Qué Hereda Shopilot

Merchant-first philosophy · Data table density · Function over aesthetics principle · Multiple semantic colors for different alert types · Sprocket-right thinking / Filosofía merchant-first · Densidad de tablas de datos · Principio función sobre estética · Múltiples colores semánticos para tipos de alerta · Pensamiento Sprocket-right

LI

Linear

Project Management Tool · San Francisco · 2019 / Herramienta de Gestión de Proyectos · San Francisco · 2019

B2B Productivity

Brand Philosophy / Filosofía de Marca

"Speed is a feature — every interaction must feel instantaneous"

Linear's brand is built on the premise that design debt in productivity tools costs people hours every week. Their aesthetic is extreme minimalism — not because it looks good, but because every unnecessary element steals attention. The indigo brand color (#5e6ad2) was chosen for calm authority: it communicates "serious tool for serious work" without being cold or aggressive. Background Woodsmoke (#1a1a1e) is one of the darkest among the reference brands — near-black, but slightly purple-shifted for warmth. La marca de Linear se construye sobre la premisa de que la deuda de diseño en herramientas de productividad le cuesta a la gente horas cada semana. Su estética es minimalismo extremo — no porque se vea bien, sino porque cada elemento innecesario roba atención. El color índigo de marca (#5e6ad2) fue elegido por autoridad tranquila: comunica "herramienta seria para trabajo serio" sin ser frío ni agresivo. El fondo Woodsmoke (#1a1a1e) es uno de los más oscuros de las marcas de referencia — casi negro, pero ligeramente desplazado hacia el púrpura para dar calidez.

Color System / Sistema de Color

#1a1a1e

Woodsmoke

Primary bg · dark

#111116

Sidebar BG

Navigation panel

#5e6ad2

Indigo Brand

Logo · selected · CTAs

#8b8fa8

Oslo Gray

Secondary text

#25252a

Surface

Card backgrounds

#2e3035

Hover Surface

Row hover state

#4cb782

Done Green

Completed state

#eb5757

Cancelled Red

Error · blocked state

Design rules / Reglas de diseño: No gradients ever · No decorative shadows · Use opacity over new colors · Border: 1px rgba(255,255,255,.06) only / Sin gradients nunca · Sin sombras decorativas · Usar opacidad en lugar de nuevos colores · Borde: solo 1px rgba(255,255,255,.06)

Keyboard-first: every action reachable without mouse. Speed is communicated through interaction, not animation. / cada acción alcanzable sin mouse. La velocidad se comunica a través de la interacción, no de la animación.

Typography + Buttons / Tipografía + Botones

Display: Inter Display · weights 300 (light) + 700 (bold)

UI: Inter · weights 400/500

Code: JetBrains Mono · 12–13px

Scale: 11 · 12 · 13 · 14 · 16 · 20 · 28 · 40px

Line height: 1.4 UI · 1.6 body

What Shopilot Inherits / Qué Hereda Shopilot

No gradients / no decorative shadows · Opacity token approach · Keyboard-first mindset · Dark bg with slight warm purple shift · Extreme information density without visual noise. Sin gradients / sin sombras decorativas · Enfoque de tokens de opacidad · Mentalidad keyboard-first · Fondo oscuro con ligero tono púrpura cálido · Densidad de información extrema sin ruido visual

VC

Vercel / Geist Design System

Frontend Cloud Platform · San Francisco · 2015 / Plataforma Cloud Frontend · San Francisco · 2015

Dev Tools

Brand Philosophy / Filosofía de Marca

"Black canvas: dark mode is not a theme, it's the identity"

Vercel's brand is the most radical of the six. Pure black (#000000) as the primary background — not dark navy, not warm dark, pure black. This is intentional: developers live in dark mode, and Vercel wants to be the platform that feels like the best developer tool they've ever used. Maximum contrast, maximum focus. The Geist typeface (custom, now open source) was designed specifically for developer interfaces: geometric sans for UI, geometric mono for code. No accent color — pure black/white/gray hierarchy. La marca de Vercel es la más radical de las seis. Negro puro (#000000) como fondo primario — no navy oscuro, no oscuro cálido, negro puro. Esto es intencional: los developers viven en dark mode, y Vercel quiere ser la plataforma que se siente como la mejor herramienta de developer que han usado. Contraste máximo, foco máximo. La tipografía Geist (custom, ahora open source) fue diseñada específicamente para interfaces de developer: geométrica sans para UI, geométrica mono para código. Sin color de acento — jerarquía pura negro/blanco/gris.

Color System — Pure Grayscale + Functional / Sistema de Color — Escala de Grises Pura + Funcional

#000

#111

#333

#444

#666

#888

#eaeaea

#fafafa

#0070F3

Blue · Links · Info

#50E3C2

Cyan · Success

#FF0080

Pink · Error/Warning

Typography / Tipografía

Display/UI: Geist Sans (open source, Google Fonts)

Code/Data: Geist Mono (open source, Google Fonts)

Scale: 12 · 14 · 16 · 20 · 24 · 32 · 48 · 64px

Weight: 400 body · 500 medium · 600 semibold · 700 bold

Radius: 6px standard · 8px cards · 12px modal

Buttons / Botones

What Shopilot Inherits / Qué Hereda Shopilot

Dark-first approach · Pure functional color (no decoration) · High contrast focus ring · Developer-dense information hierarchy · Geist Mono (open source alternative to JetBrains Mono) / Enfoque dark-first · Color puramente funcional (sin decoración) · Focus ring alto contraste · Jerarquía de información densa para developers · Geist Mono (alternativa open source a JetBrains Mono)

SH

Shopify / Polaris Design System

Commerce Platform · Ottawa · 2006 · Public ($SHOP) / Plataforma de Comercio · Ottawa · 2006 · Pública ($SHOP)

Commerce SaaS

Brand Philosophy / Filosofía de Marca

"Merchant-first: every decision evaluated from the merchant's perspective"

Polaris is the most mature design system in this study — 7+ years of iteration, thousands of components, and a philosophy that has been consistently proven: clarity beats elegance. Shopify's merchant is not a designer or developer — they're a small business owner who needs to act fast and make money. The design system's entire vocabulary is optimized for task completion speed, not visual delight. The green brand color grew from the Shopify logo and represents growth, money, and success. Polaris es el sistema de diseño más maduro de este estudio — 7+ años de iteración, miles de componentes, y una filosofía consistentemente probada: la claridad supera a la elegancia. El comerciante de Shopify no es diseñador ni developer — es un dueño de pequeño negocio que necesita actuar rápido y ganar dinero. El vocabulario completo del sistema de diseño está optimizado para la velocidad de completar tareas, no para el deleite visual. El color verde de marca creció del logo de Shopify y representa crecimiento, dinero y éxito.

Color System / Sistema de Color

#FAFAFA

Background

Light mode primary

#202223

Ink

Primary text

#008060

Interactive Green

CTAs · brand

#95BF47

Logo Green

Brand logo only

#5C5F62

Subdued

Secondary text

#D82C0D

Critical

Error · destructive

#FFC453

Warning

Alert states

#AEE9D1

Success Light

Success bg tint

Typography

All: Inter (UI) · system-ui fallback

Scale: 12 · 14 · 16 · 20 · 26 · 32px

Radius: 4px inputs · 8px cards · 12px modals

Data Viz Rules:

Totals in bold, placed in row 1 · Focus: 1 insight per chart

Multiple data formats (always table + chart)

Buttons

What Shopilot Inherits

Seller-first decision framework · Data viz rules (totals first, 1 insight per chart) · Semantic color discipline · Clarity > elegance principle · A11y requirements for data tables
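The "totals first" data viz rule above can be sketched as a small helper. This is a minimal illustration, not Polaris or Shopilot code: the `MetricRow` shape, the `withTotalFirst` name, and the sample channels are all hypothetical.

```typescript
// Hypothetical row shape; the field names are illustrative only.
interface MetricRow {
  label: string;
  value: number;
  isTotal?: boolean; // the renderer is assumed to draw flagged rows in bold
}

// Returns a new array with a computed total placed in row 1.
function withTotalFirst(rows: MetricRow[], totalLabel = "Total"): MetricRow[] {
  const total = rows.reduce((sum, row) => sum + row.value, 0);
  return [{ label: totalLabel, value: total, isTotal: true }, ...rows];
}

// Made-up sample data for the sketch.
const salesByChannel: MetricRow[] = [
  { label: "Amazon US", value: 1200 },
  { label: "Amazon MX", value: 300 },
];

const tableRows = withTotalFirst(salesByChannel);
```

Keeping the total as an ordinary row with a flag lets the same array feed both the table and the chart, which fits the "table + chart" rule.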

YC Startups · 10 Similar Brands
BX

Brex

Corporate Fintech · San Francisco · 2017 · YC W17 · $12.3B valuation

Fintech · Handles Real Money

Brand Philosophy

"Make money management effortless for ambitious companies"

Brex is the closest contextual analogue to Shopilot in terms of trust architecture. Both handle real money on behalf of businesses, and both require the UI to communicate precision and authority. Brex's design evolved from a startup-style orange era to a mature, premium dark theme. Current palette: near-black backgrounds (#0E0E0E), a warm coral/salmon accent for CTA emphasis, and Söhne as the premium commercial typeface. The warm coral (not pure orange) signals "approachable financial authority": slightly warmer than corporate, slightly cooler than consumer fintech.

Color System

#0E0E0E

Background Dark

Near-black · product UI

#FFFDF9

Background Light

Warm off-white

#F27B6B

Coral Accent

CTAs · brand emphasis

#FF5200

Hot Orange

High-urgency CTAs

#1A1A1A

Surface

Cards, panels

#2D2D2D

Border/Stroke

Dividers, outlines

#00C278

Success Green

Positive states

#FF4444

Error Red

Errors · blocks

Typography

Display: Söhne (Klim Type Foundry) · €€€

UI: Söhne · weights 300/400/600

Data: Söhne Mono (tabular figures)

Scale: 11 · 13 · 15 · 18 · 24 · 36 · 48px

Key: Tabular figures for all financial data (tnum feature)

Buttons

Key Insights for Shopilot

Trust architecture: Trust-critical data (balances, transactions) gets the highest contrast (white on black). Secondary info gets progressively less contrast.

Tabular nums: All financial data uses font-variant-numeric: tabular-nums so that digits align vertically in tables.

Could inspire Shopilot: Near-black background · Warm coral accent · Tabular nums for prices · Söhne as inspiration (use Inter + JetBrains Mono as a freely available equivalent)
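A minimal sketch of the tabular-nums rule applied to prices. The `numericCellStyle` object and `formatPrice` helper are hypothetical names, and the USD/en-US formatting is an assumption for illustration; only `font-variant-numeric: tabular-nums` comes from the study itself.

```typescript
// Style fragment in camelCase (CSS-in-JS form). How it gets applied depends on
// the UI layer, which this document does not specify.
const numericCellStyle = {
  fontVariantNumeric: "tabular-nums", // equal-width digits so columns line up
  textAlign: "right",
} as const;

// Assumed locale and currency, for the example only.
const usd = new Intl.NumberFormat("en-US", {
  style: "currency",
  currency: "USD",
  minimumFractionDigits: 2,
});

// Always emits two decimals so that decimal points align in a right-aligned column.
function formatPrice(amount: number): string {
  return usd.format(amount);
}
```

With this sketch, `formatPrice(1234.5)` yields `"$1,234.50"`. Fixing the number of decimals matters as much as the font feature: tabular digits only align when every cell has the same number of fraction digits.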

MC

Mercury

Neobank for Startups · San Francisco · 2019 · YC S19 · $1.62B valuation

Banking · Handles Real Money

Brand Philosophy

"Banking that gets out of your way"

Mercury achieved something extremely rare: making banking software look desirable. Their dark-mode-first interface (a radical choice for financial software in 2019) communicated that they understood their customer: tech founders who live in dark terminals. The custom Mercury Sans typeface has a slight humanist influence that keeps the bank UI from feeling cold and bureaucratic. The teal/blue accent is intentionally understated: mercury (the element) is subtle, precise, and reflects its environment.

Color System

#0A0A0A

Background

Near-pure black

#FAFAF9

Light BG

Warm off-white

#4AA8FF

Mercury Blue

CTAs · links · selected

#00BFA5

Teal

Balance · positive

#141414

Surface

Cards · panels

#1E1E1E

Hover Surface

Row hover

#FF5F5F

Alert Red

Errors · negative bal.

#F5A623

Warning Amber

Low balance · pending

Typography

Display/UI: Mercury Sans (custom, humanist geometric)

Numbers: Tabular lining figures (font-variant-numeric)

Code: Fira Code / iA Writer Mono (code blocks)

Weight: 300 light · 400 regular · 500 medium · 600 semibold

Spacing: letter-spacing: -0.01em for display text

Buttons + UI Patterns

Radius: 12px (rounded, approachable) · Borders: ultra-subtle rgba · Balance displayed in large mono type at the top of every page

Key Insights for Shopilot

Dark-first banking sets the precedent that serious financial tools CAN be dark mode · Balance/KPI always displayed in large mono type (same treatment as Shopilot's GMV) · 12px radius keeps dense data approachable · Warm off-white light mode for reports and print contexts

RT

Retool

Internal Tools Builder · San Francisco · 2017 · YC S17 · $3.2B valuation

Data-Dense B2B

Brand Philosophy

"Build internal tools, 10x faster"

Retool is the master of data-dense UI. Their product is literally a table+form builder, so every design decision serves the goal of making dense grids of data scannable and actionable. Their canvas-style editor is perhaps the most data-rich interface in SaaS. The blue accent (#3B5EE7) was chosen for authority and trust: similar to financial platforms but more "engineering-y" than coral/orange. The dark background (#202124) is a slightly cool gray (its blue channel is the highest), close to VS Code's dark theme, which their developer audience knows instinctively.

Color System

#202124

BG Dark

Cool gray (VS Code-ish)

#F8F9FA

BG Light

Default canvas

#3B5EE7

Blue Brand

Selected · CTAs

#5C7CFA

Blue Light

Hover · focus

#2C2D30

Surface

Panel bg

#37383B

Border

Dividers

#2ECC71

Success

OK states

#E74C3C

Error

Error states

Data Table Design (Core Pattern)

Row height: 32px compact · 40px default · 48px comfortable (user-configurable)

Header: sticky · sortable · resizable columns · per-column filter

Numbers: Right-aligned in all numeric columns · font-variant-numeric: tabular-nums

Could inspire Shopilot: Compact table density · Column sorting + filtering · Right-aligned numbers · VS Code-familiar cool gray background
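The three row-height presets above map naturally onto a user-configurable density setting. The sketch below is illustrative only: the `Density` type, the `ROW_HEIGHT_PX` table, and `visibleRowCount` are hypothetical names, and the sticky header is assumed to be as tall as one row.

```typescript
type Density = "compact" | "default" | "comfortable";

// Row heights taken from the Retool pattern described above.
const ROW_HEIGHT_PX: Record<Density, number> = {
  compact: 32,
  default: 40,
  comfortable: 48,
};

// Number of full data rows that fit in the viewport once the sticky header
// (assumed to be one row tall) is subtracted.
function visibleRowCount(viewportHeightPx: number, density: Density): number {
  const rowHeight = ROW_HEIGHT_PX[density];
  const bodyHeight = Math.max(0, viewportHeightPx - rowHeight);
  return Math.floor(bodyHeight / rowHeight);
}
```

In a 640px-tall table area this gives 19 rows at compact, 15 at default, and 12 at comfortable, which makes the density trade-off concrete when choosing Shopilot's default.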

SB

Supabase

Open Source Firebase Alternative · Singapore · 2020 · YC S20 · $200M+ raised

Dev Tools · Open Source

Brand Philosophy

"Build in a weekend, scale to millions"

Supabase's brand is perhaps the most distinctive in this study: an aggressive, developer-native green (#3ECF8E) on near-black backgrounds. The green was chosen for its association with databases (terminal text), open source culture (GitHub green), and PostgreSQL. Their brand radiates developer confidence: "we're not trying to be enterprise, we're trying to be the best developer experience." The contrast between the near-black backgrounds and the bright emerald is high (about 8.5:1 against #1C1C1C by the WCAG formula), making every UI element immediately visible.

Color System

#1C1C1C

BG Dark

Primary background

#111111

BG Deeper

Sidebar / nav

#3ECF8E

Supabase Green

Brand · CTAs · selected

#00C973

Green Vivid

Running / active states

#262626

Surface

Cards

#3F3F3F

Border

Dividers

#F97316

Warning Amber

Attention states

#EF4444

Error Red

Errors · destructive

Typography + Key Insights for Shopilot

Display/UI: Inter (all weights) · Code: Fira Code / UI Monospace

Radius: 6px uniform. Very slightly rounded; feels professional, not playful

Could inspire Shopilot: Proof that a single strong accent color CAN be green for marketplaces (Shopify marketplace tab) · Dark background + single bright accent contrast pattern · Orange for warnings (candidate reference for Shopilot)
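Contrast figures like the one quoted in the Supabase brand philosophy above can be checked against the WCAG 2.x contrast-ratio definition. The sketch below implements that formula; the function names are hypothetical and it only accepts `#RGB` and `#RRGGBB` input.

```typescript
// Parses "#RGB" or "#RRGGBB" into 0-255 channel values.
function parseHex(hex: string): [number, number, number] {
  let h = hex.replace("#", "");
  if (h.length === 3) {
    h = h.split("").map((c) => c + c).join("");
  }
  if (!/^[0-9a-fA-F]{6}$/.test(h)) {
    throw new Error(`Unsupported color: ${hex}`);
  }
  return [
    parseInt(h.slice(0, 2), 16),
    parseInt(h.slice(2, 4), 16),
    parseInt(h.slice(4, 6), 16),
  ];
}

// Converts one 8-bit sRGB channel to linear light.
function linearize(channel: number): number {
  const s = channel / 255;
  return s <= 0.04045 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
}

// WCAG relative luminance: weighted sum of the linearized channels.
function relativeLuminance(hex: string): number {
  const [r, g, b] = parseHex(hex);
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// WCAG contrast ratio, from 1 (identical colors) to 21 (black on white).
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  const lighter = Math.max(a, b);
  const darker = Math.min(a, b);
  return (lighter + 0.05) / (darker + 0.05);
}
```

Calling `contrastRatio("#3ECF8E", "#1C1C1C")` evaluates Supabase Green on the primary background listed above. WCAG AA asks for at least 4.5:1 for normal-size text, so the same helper could screen Shopilot's accent candidates once a background is chosen.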

PH

PostHog

Open Source Product Analytics · London · 2020 · YC W20 · $225M raised

Analytics · Open Source

Brand Philosophy

"The only product analytics platform where data stays yours"

PostHog is the most boldly branded company in this study: a hedgehog mascot, a golden yellow (#F9BD2B) that seems to glow, and a developer-irreverent tone. Their design deliberately breaks "enterprise SaaS" conventions to signal: we're built by developers, for developers, and we refuse to look boring. Beneath the playfulness, however, the data visualization is meticulously precise. Their dark UI (#1D1D27, a purple-shifted dark) keeps analytics dashboards readable for 8+ hours a day. The yellow is used sparingly, for the most important elements only.

Color System

#1D1D27

BG Dark

Purple-shifted dark

#FFFEF0

BG Light

Golden cream

#F9BD2B

PostHog Yellow

Brand · emphasis

#F54E00

Hot Orange

CTAs · high-priority

#2C2C3A

Surface

Cards · panels

#3C3C50

Border

Dividers

#2AC940

Success

Positive events

#F04438

Error

Error states

Key Insights for Shopilot

Purple-shifted dark backgrounds feel "deeper" than neutral dark, which suits analytics views · Data precision underneath playful branding · Yellow used ONLY for the most important metric on screen (the same principle could apply to Shopilot's accent, still TBD) · Chart color palette: 8 distinct hues, all at 60% saturation for harmony
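One way to get "8 distinct hues, all at 60% saturation" is to space hues evenly around the color wheel. This is a sketch under assumptions: the even spacing, the 55% lightness, and the `chartPalette` name are not from PostHog and are chosen only for illustration.

```typescript
// Generates `count` HSL colors with evenly spaced hues and a fixed saturation.
// The lightness default (55%) is an assumption, not a value from the study.
function chartPalette(
  count = 8,
  saturationPct = 60,
  lightnessPct = 55
): string[] {
  if (count < 1 || Math.floor(count) !== count) {
    throw new Error("count must be a positive integer");
  }
  const step = 360 / count; // 45 degrees apart for 8 hues
  const colors: string[] = [];
  for (let i = 0; i < count; i++) {
    const hue = Math.round(i * step);
    colors.push(`hsl(${hue}, ${saturationPct}%, ${lightnessPct}%)`);
  }
  return colors;
}

const palette = chartPalette();
```

Holding saturation and lightness constant is what makes the series read as one family; only the hue changes between chart series.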

RS

Resend

Developer Email Platform · San Francisco · 2022 · YC W23 · $26M raised

Dev Infrastructure

Brand Philosophy

"Email for developers, built by developers"

Resend's brand is pure monochromatic minimalism, perhaps the most extreme in this study: pure black (#000000), neutral grays, and one orange accent reserved for the logo and primary CTA. The philosophy: email infrastructure should be completely invisible, because the developer's code is the product. Their UI is so stripped down that it looks like GitHub's settings page elevated to art. This design communicates: we're not trying to impress you with UI, we're trying to stay out of your way. It shows a strong influence from Vercel's aesthetic (same investor: Guillermo Rauch's orbit).

Color System: Pure Monochromatic

#000

BG

#0a0a0a

Surface

#171717

Card

#262626

Border

#525252

Muted

#a3a3a3

Secondary

#ededed

Primary

#fff

Headings

#FF5700

Logo Orange · CTA only

Typography

All: Geist Sans + Geist Mono (open source)

Scale: 13 · 14 · 16 · 20 · 28 · 40px

Tracking: letter-spacing: -0.02em headings

Radius: 8px standard (slightly rounded)

Buttons

Key Insights for Shopilot

Proof that monochromatic + one accent works at scale · #000 vs #171717 vs #262626: subtle layering creates depth without color · Code and logs always set in Geist Mono / JetBrains Mono, which reinforces precision

CL

Clerk

Authentication Platform · San Francisco · 2021 · YC W22 · $170M raised

Auth · Dev Tools

Brand Philosophy

"The most comprehensive User Management Platform"

Clerk's brand sits at the intersection of developer tools and security software. Purple (#6C47FF) was chosen to differentiate from both the "enterprise blue" space (Okta, Auth0) and the "startup orange" space. It communicates "modern, premium, slightly magical": auth happens in the background, and Clerk makes it elegant. Their dark UI (#131316, a very dark gray with a slight cool shift toward blue) uses glass-morphism for the prebuilt UI components, an unusual choice that works because authentication is a "gateway moment" that benefits from a premium feel.

Color System

#131316

BG Dark

Cool-shifted dark

#FAFAFA

BG Light

Dashboard light mode

#6C47FF

Clerk Purple

Brand · CTAs · focus

#9B7DFF

Purple Light

Hover · secondary

#1C1C21

Surface

Cards

#2C2C35

Border

Dividers

#12B76A

Success

Auth success

#F04438

Error

Auth failure

Key Insights for Shopilot

Glass-morphism for "gateway moments" (login, confirmation dialogs) · Purple differentiation shows you don't need orange to be distinctive · #131316 cool-shifted dark background, similar to Shopilot's own bg · Onboarding modal design: clean step indicators, one action per step

DL

Deel

Global HR & Payroll · San Francisco · 2019 · YC W19 · $12B valuation

Global Payroll · Handles Real Money

Brand Philosophy

"Hire anyone, anywhere — with compliance built in"

Deel handles international payroll for 35,000+ companies, arguably the most complex, trust-critical SaaS product in this study. Their design reflects that weight: corporate navy (#1D2130) backgrounds, conservative button styles, and clear error states for compliance failures. Nothing is flashy: a company trusting you with its global payroll needs you to look like you know what you're doing. The blue accent (#2B6EE4) is authoritative without being aggressive, similar to how a bank presents itself.

Color System

#1D2130

BG Dark Navy

Primary dark surface

#F4F6FA

BG Light

Blue-tinted white

#2B6EE4

Deel Blue

CTAs · brand

#4D8FF0

Blue Light

Hover · secondary

#252A3C

Surface

Cards

#2F3547

Border

Dividers

#00C48C

Success Teal

Paid · approved

#FF647C

Error Coral

Failed · blocked

Key Insights for Shopilot

Navy-shifted dark bg (#1D2130) creates more of a "financial authority" feel than neutral dark · Compliance status rows: clear color coding (approved=teal, pending=amber, failed=red) · Dense multi-level table hierarchy (company > employee > payment), similar to Shopilot's ASIN > marketplace > metric hierarchy
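The status color coding and the three-level hierarchy described above could be modeled roughly as follows. Everything here is a hypothetical sketch: the type names, the placeholder ASIN, the metric names, and the amber value for "pending" (borrowed from Mercury's Warning Amber, since the Deel palette above lists no amber) are assumptions.

```typescript
type ComplianceStatus = "approved" | "pending" | "failed";

// approved and failed reuse hex values from the Deel palette above.
const STATUS_COLOR: Record<ComplianceStatus, string> = {
  approved: "#00C48C", // Success Teal
  pending: "#F5A623", // assumed amber, taken from the Mercury palette
  failed: "#FF647C", // Error Coral
};

// Hypothetical Shopilot shape: ASIN > marketplace > metric.
type MetricsByName = Record<string, number>;
type Catalog = Record<string, Record<string, MetricsByName>>;

interface TreeRow {
  depth: 0 | 1 | 2; // drives indentation in a dense multi-level table
  label: string;
  value?: number; // only leaf (metric) rows carry a value
}

// Flattens the nested catalog into rows a table component can render in order.
function toTreeRows(catalog: Catalog): TreeRow[] {
  const rows: TreeRow[] = [];
  for (const asin of Object.keys(catalog)) {
    rows.push({ depth: 0, label: asin });
    const marketplaces = catalog[asin];
    for (const marketplace of Object.keys(marketplaces)) {
      rows.push({ depth: 1, label: marketplace });
      const metrics = marketplaces[marketplace];
      for (const metric of Object.keys(metrics)) {
        rows.push({ depth: 2, label: metric, value: metrics[metric] });
      }
    }
  }
  return rows;
}

// Placeholder data; "B000000000" is not a real product identifier.
const sampleRows = toTreeRows({
  B000000000: { US: { units: 10 }, MX: { units: 4 } },
});
```

Flattening to depth-tagged rows keeps the table renderer simple: indentation and expand/collapse can both key off `depth` without the component knowing the domain hierarchy.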

RP

Replit

Browser-based IDE · San Francisco · 2016 · YC W18 · $1.16B valuation

AI-Native Dev Tool

Brand Philosophy

"Code, create, and learn together"

Replit's brand bridges developer-serious and beginner-accessible. Their orange (#F56C2A) is warmer and more playful than Cursor's (#f54e00), intentionally so, since Replit serves both students and professionals. The dark background (#0D1117) is identical to GitHub's dark mode, which leverages developers' existing mental models. Their recent pivot to "Replit AI" accelerated their design maturity: more glass effects, more gradient accents, more AI-native patterns. There is a strong parallel to Shopilot: both are Electron-like experiences where the IDE/marketplace is the primary canvas and AI assistance lives in the sidebar.

Color System

#0D1117

BG Dark

GitHub-identical dark

#F6F8FA

BG Light

GitHub-identical light

#F56C2A

Replit Orange

Brand · CTAs

#FF7B54

Orange Light

Hover state

#161B22

Surface

Cards · panels

#21262D

Border

Dividers

#3FB950

Success

Build success

#F85149

Error

Build error

Key Insights for Shopilot

Split IDE + AI sidebar = exact Shopilot architecture · GitHub-familiar dark (#0D1117) leverages existing developer trust · Orange on a very dark bg creates a high-contrast CTA that developers actually click · AI sidebar streaming pattern identical to Shopilot's coaching sidebar

LU

Luma

Event Platform · San Francisco · 2020 · YC W21 · $150M raised

Community Platform

Brand Philosophy

"Beautiful event pages that convert"

Luma is the most aesthetically ambitious brand in this study. Where other products in this list use minimalism as a constraint, Luma uses it as a canvas. Their gradient-based identity (iridescent teal-purple-magenta) feels luxurious without being cluttered. The dark background (#09090B, among the darkest in this study and almost absolute black) makes the gradients pop like neon lights in a dark room. Luma is included here because it shows what happens when you invest in aesthetic excess as a differentiator: events need to feel exciting, and Luma's brand creates that emotional response. This is relevant to Shopilot's onboarding and marketing pages.

Color System

#09090B

BG Absolute

Near-perfect black

#FAFAFA

BG Light

Clean off-white

gradient

Brand Iridescent

Teal→Purple→Pink

#A855F7

Primary Purple

CTAs on dark bg

#141416

Surface

Cards

#1C1C1F

Surface 2

Nested cards

#4FACFE

Teal Blue

Info · links

#EC4899

Pink Accent

Featured · special

Key Insights for Shopilot

Gradient accents for marketing pages only (NOT product UI): this is the lesson · #09090B near-black with glass cards on top creates remarkable depth with zero shadows · Premium "entrance" moments deserve gradient treatment (Shopilot: first login, marketplace activation) · Inter Display with tight letter-spacing (-0.04em) = expensive look at zero cost

🚧

Brand Identity: NOT DEFINED YET

This section is a decision framework: a structured guide to the brand choices that must be made before any design system can be built. Nothing here is decided. The references above are inspiration material only.

Brand Decision Log: Status

# | Decision | Options | Status
01 | Brand philosophy / tagline | 3 candidates below | PENDING
02 | Primary color | 4 palette candidates below | PENDING
03 | Typography stack | 3 pairings below | PENDING
04 | Logo direction | Wordmark / Icon+Text / Abstract mark | PENDING
05 | Dark vs Light vs Both | Lean: dark-first · Risk: alienates some | PENDING
06 | Brand voice / personality | Expert coach / Trusted advisor / Efficient tool | PENDING

Decision 01 · Brand Philosophy

CHOOSE ONE

Based on the 16 brands studied, three directions emerged as viable for Shopilot. Each implies a different visual language, color family, and interaction tone.

A · "Warm Precision"

Warm neutral backgrounds, orange/amber accent, trust through clarity. References: Linear + HubSpot. Best for: sellers who want a tool that feels like a trusted advisor, not a cold dashboard.

B · "Data Intelligence"

Pure dark, electric blue accent, Bloomberg-inspired density. References: Datadog + Bloomberg. Best for: power sellers who see the product as a professional data terminal, prioritizing information density over warmth.

C · "Growth Engine"

Dark with green/teal accent, optimistic tone. References: Shopify + Notion. Best for: growth-focused sellers who associate green with profit and want the tool to feel empowering and action-oriented.

Study recommendation: Direction A ("Warm Precision") differentiates most from Helium 10 (purple/2018), Jungle Scout (green/consumer), and Repricer (corporate blue). It positions Shopilot as the only warm, AI-native seller tool. However, this is a recommendation, not a decision.

Decision 02 · Primary Color Palette

CHOOSE ONE

These 4 candidates were derived from the competitive analysis. Each avoids direct collision with existing tools in the market.

Orange — #F97316

Energy + action. Competitive differentiation from purple (Helium 10), blue (Repricer), and green (Jungle Scout). HubSpot owns "CRM orange", so there is a risk of perceived overlap.

Indigo — #6366F1

Intelligence + trust. Used by Linear. Risk: perceived as too similar to Helium 10's purple. Benefit: associates with AI/tech precision.

Sky Blue — #0EA5E9

Clarity + openness. Clean differentiation. Risk: overly generic in SaaS. Benefit: stays distinguishable under the most common (red-green) forms of color blindness.

Violet — #8B5CF6

Premium + AI. High association with AI products (Claude, Perplexity). Risk: Helium 10 has purple brand equity. Benefit: strong AI-native signal to tech-savvy sellers.

What the study recommends: Orange (#F97316) for maximum warm contrast. But this requires a final call from the team. Specifically: does Shopilot want to feel more like a financial tool (blue/indigo) or more like an action-oriented coach (orange)?

Decision 03 · Typography StackDecisión 03 · Stack Tipográfico

CHOOSE ONE
Option | Display / UI | Numbers / Code | Reference
A | Inter | JetBrains Mono | Linear, Vercel: neutral, modern, safe
B | Geist / DM Sans | JetBrains Mono | Vercel, Framer: slightly more personality
C | IBM Plex Sans | IBM Plex Mono | IBM, Datadog: technical authority, B2B trust

All 3 options are free, widely available, and render well in Electron. The mono font for numbers is non-negotiable across all options — see Section 14 design rationale for why.Las 3 opciones son gratuitas, ampliamente disponibles y renderizan bien en Electron. La fuente mono para números es innegociable en todas las opciones — ver la sección 14 para el fundamento del diseño.

Decision 04 · Logo DirectionDecisión 04 · Dirección de Logo

CHOOSE ONE
[ wordmark ]

Wordmark Only

Just the "Shopilot" name in custom lettering. Simple, flexible. Risk: hard to use at small sizes (tray icon, favicon).Solo el nombre "Shopilot" en lettering personalizado. Simple, flexible. Riesgo: difícil a tamaños pequeños.

S
shopilot

Icon + Wordmark

Symbol that works standalone (tray, favicon, app icon) + name for contexts with space. Most flexible system.Símbolo que funciona solo (tray, favicon, ícono de app) + nombre para contextos con espacio. Sistema más flexible.

[ abstract ]

Abstract Mark

Unique geometric shape with no letterform. High memorability ceiling. Risk: requires brand awareness to work — too early for a v1 product.Forma geométrica única sin letterform. Alto techo de memorabilidad. Riesgo: requiere conocimiento de marca — demasiado pronto para v1.

Recommended for v1:Recomendado para v1: Option B (Icon + Wordmark). Allows a small icon in the macOS tray, a medium icon in the dock, and full wordmark in the sidebar. But the icon design itself is a separate creative decision — do not ship a placeholder.Opción B (Ícono + Wordmark). Permite un ícono pequeño en el tray de macOS, ícono mediano en el dock, y wordmark completo en el sidebar. Pero el diseño del ícono en sí es una decisión creativa separada — no hacer ship con un placeholder.

Decision 05 · Dark vs Light ModeDecisión 05 · Modo Oscuro vs Claro

CHOOSE ONE

Dark-first (recommended by study)Dark-first (recomendado por el estudio)

Cursor, Linear, Arc, Datadog, Claude: all dark-first. Reduces eye strain in long sessions. Numbers pop on dark backgrounds. The power-tool reference brands studied use dark mode as the primary experience; HubSpot and Stripe are the light-first exceptions. Competitive differentiation from Helium 10 (light default).

Risk of dark-onlyRiesgo de solo oscuro

Some sellers work in bright environments (warehouses, offices). If Shopilot is dark-only, it may feel hard to read in those contexts. A light mode in Phase 2 is strongly advisable. V1: dark only to reduce scope.Algunos sellers trabajan en ambientes brillantes (almacenes, oficinas). Si Shopilot es solo oscuro, puede ser difícil de leer en esos contextos. Un modo claro en Fase 2 es muy recomendable. V1: solo oscuro para reducir el alcance.

→ How to use this section→ Cómo usar esta sección

  1. Review the 16 reference brand books above and understand what each brand does and why.
  2. Make a decision on each of the 6 items in the tracker at the top of this section. Pablo + Mateo + Sergio should be in the room.
  3. Document the chosen direction back into this spec: replace "PENDING" with the decided value and the rationale.
  4. Only then build design tokens (§14 · Stack): the CSS custom properties, the Tailwind config, the Style Dictionary pipeline. Building tokens before the brand decisions are made is wasted work.
  5. Commission a designer for the logo once the color and philosophy direction are locked. Do not use AI-generated or placeholder marks in any public-facing context.
§14 · SYNTHESIS

Study Synthesis — Patterns Found Across All 16 Brands Síntesis del Estudio — Patrones Encontrados en las 16 Marcas

After analyzing 16 world-class products (Anthropic, Cursor, Linear, Arc, Figma, Stripe, Vercel, HubSpot, Shopify, Datadog, Bloomberg, Notion, Intercom, Brex, Mercury, Luma), 7 universal patterns emerged that every top-tier product shares — regardless of industry, color, or audience. These are conclusions, not recommendations for Shopilot. Tras analizar 16 productos de clase mundial, emergieron 7 patrones universales que comparten todos los productos de primer nivel — independientemente de industria, color o audiencia. Estas son conclusiones del estudio, no recomendaciones para Shopilot.

01

One strong primary color = brand ownership Un color primario fuerte = propiedad de categoría

Every studied brand owns exactly ONE color. Not two, not a gradient system as their identity — one color that is unmistakably theirs. This color appears on buttons, on the favicon, on the loading state, on the cursor. It becomes the brand. Cada marca estudiada posee exactamente UN color. No dos, no un sistema de gradientes como identidad — un color que es inconfundiblemente suyo. Aparece en botones, favicon, estado de carga y cursor. Se convierte en la marca.

Anthropic #CC785C copper

Linear #5e6ad2 indigo

HubSpot #FF7A59 orange

Shopify #96BF48 green

What the study shows:Lo que el estudio muestra: Color category ownership is first-come-first-served. Purple → Figma/Anthropic. Green → Shopify/Notion. Blue → almost every generic SaaS. Orange → HubSpot. The strongest move for a new brand is to claim a color that no dominant competitor owns in its specific category.La propiedad de color por categoría es "el primero en llegar se sirve primero". Morado → Figma/Anthropic. Verde → Shopify/Notion. Azul → casi todo SaaS genérico. Naranja → HubSpot. El movimiento más fuerte para una nueva marca es reclamar un color que ningún competidor dominante posea en su categoría específica.

02

Power tools are dark-first — light mode is an afterthought Las herramientas de poder son dark-first — el modo claro es secundario

Among the products reviewed, Cursor, Linear, Arc, Datadog, Bloomberg, Claude, Vercel, Brex, and Mercury ship dark as primary, as do Retool and PostHog, which are not on the 16-brand list. Light mode exists but is not the designed-for experience. The pattern holds across every product category where users are professionals staring at screens for 6+ hours.

Product | Primary mode | Background
Cursor | Dark | #1B1B1F (near-black, warm)
Linear | Dark | #0F0F11 (pure dark)
Claude / Anthropic | Dark | #1A1A2E (violet-shifted dark)
Arc Browser | Dark | #1C1C1E (macOS standard dark)
Datadog | Dark | #14131A (purple-shifted)
Vercel | Dark | #000000 (pure black)
HubSpot / Stripe | Light | #FFFFFF (pure white)

Study finding:Hallazgo del estudio: The dark backgrounds that work best are NOT pure black (#000). They are near-blacks with a hue shift — warm (#1B1B1F), cool (#0F0F11), violet (#14131A), or macOS system (#1C1C1E). Pure black creates harshness; hue-shifted dark creates depth. Also: the darker the background, the more the accent color pops — which is why dark-first products can use a single, lower-saturation accent and still feel branded.Los fondos oscuros que mejor funcionan NO son negro puro (#000). Son near-blacks con un cambio de tono — cálido (#1B1B1F), frío (#0F0F11), violeta (#14131A) o sistema macOS (#1C1C1E). El negro puro crea dureza; el oscuro con tono crea profundidad.
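
The claim that darker backgrounds make the accent pop can be checked numerically. The sketch below computes the WCAG 2.x contrast ratio between the orange candidate (#F97316, not yet a decided brand color) and the backgrounds listed in the table above. The function names are illustrative and not part of any Shopilot codebase.

```typescript
/** Linearize one sRGB channel (0..255) per the WCAG 2.x definition. */
function linearize(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

/** Relative luminance of a "#RRGGBB" color, from 0 (black) to 1 (white). */
function relativeLuminance(hex: string): number {
  if (!/^#[0-9a-fA-F]{6}$/.test(hex)) {
    throw new Error(`Expected #RRGGBB, got "${hex}"`);
  }
  const value = parseInt(hex.slice(1), 16);
  const r = (value >> 16) & 0xff;
  const g = (value >> 8) & 0xff;
  const b = value & 0xff;
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

/** WCAG contrast ratio between two colors, from 1 (identical) to 21. */
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// Assumption: the orange candidate stands in for the undecided brand accent.
const candidateAccent = "#F97316";

// Backgrounds as listed in the study table.
const studiedBackgrounds = [
  { product: "Cursor", hex: "#1B1B1F" },
  { product: "Linear", hex: "#0F0F11" },
  { product: "Claude / Anthropic", hex: "#1A1A2E" },
  { product: "Arc Browser", hex: "#1C1C1E" },
  { product: "Datadog", hex: "#14131A" },
  { product: "Vercel", hex: "#000000" },
];

studiedBackgrounds.forEach(({ product, hex }) => {
  const ratio = contrastRatio(candidateAccent, hex);
  console.log(`${product}: ${ratio.toFixed(2)}:1`);
});
```

WCAG AA asks for at least 4.5:1 for normal text and 3:1 for large text and UI components, so the same function could back the contrast check in the accessibility audit.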

03

Typography: 2 fonts maximum — one sans, one mono Tipografía: máximo 2 fuentes — una sans, una mono

Every studied product uses a sans-serif for UI text and a monospace font for all data, code, and numbers. No exceptions. The monospace font for numbers is not a stylistic choice — it is functional: proportional fonts create unstable number columns. Monospace makes data scannable. Cada producto estudiado usa una sans-serif para texto de UI y una mono para datos, código y números. Sin excepciones. La fuente mono para números no es elección estilística — es funcional: las fuentes proporcionales crean columnas de números inestables. La mono hace los datos escaneables.

Sans-serif findingsHallazgos sans-serif

  • Inter — Linear, Vercel, Notion, PostHog
  • Geist — Vercel (custom, based on Inter)
  • SF Pro — Arc, Cursor (system default)
  • Söhne / Graphik — Anthropic, Figma
  • IBM Plex Sans — Datadog, IBM products

Finding: Inter dominates because it's free, variable weight, and optimized for screens. The system font (SF Pro on Mac) is the "invisible" choice that native apps use for maximum rendering quality.Hallazgo: Inter domina por ser gratuita, variable y optimizada para pantallas.

Mono findingsHallazgos mono

  • JetBrains Mono — Cursor, Linear, Vercel
  • Fira Code — developer tools generally
  • SF Mono — Arc, macOS native
  • IBM Plex Mono — Datadog, Brex
  • Geist Mono — Vercel (v2)

Finding: JetBrains Mono is the modern standard for developer-adjacent tools. Its glyphs stay readable at 10–12px, which is where data tables live.

04

Motion is functional, not decorative — and it's invisible when done right El movimiento es funcional, no decorativo — es invisible cuando está bien hecho

None of the studied products use animation for visual delight. Every transition serves a purpose: orientation (this panel came from the right), state change (this button is now loading), hierarchy (this modal is above the content). The rule: if you can remove the animation and the user still understands what happened, the animation was decorative. Remove it. Ninguno de los productos estudiados usa animación para deleite visual. Cada transición sirve un propósito: orientación, cambio de estado, jerarquía. Regla: si puedes quitar la animación y el usuario aún entiende qué pasó, la animación era decorativa. Elimínala.

Animation | Duration | Purpose | Seen in
Hover bg change | 100–150ms | Acknowledge interaction | All products
Button press scale | 80ms ease-out | Physical click feedback | Linear, Arc, Luma
Modal slide-up | 200–250ms spring | Layer hierarchy | Figma, Notion, Linear
Streaming text fade | 80ms per word | Show AI is generating | Claude, Cursor
Thinking pulse ··· | 1.2s infinite | AI is processing | Claude, Cursor, Copilot
Sidebar collapse | 200ms ease-in-out | Preserve spatial orientation | Linear, Arc, Notion
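
The durations in the table above could be captured as typed motion tokens so components never hard-code timings. This is a minimal sketch under several assumptions: the token names are hypothetical, the hover easing is not specified by the study, the cubic-bezier curve only approximates a spring (CSS has no spring keyword), and "80ms per word" is read here as a per-word stagger.

```typescript
interface MotionToken {
  durationMs: number;
  easing: string;
}

// Values taken from the study table; where the table gives a range, the upper bound is used.
const motion = {
  hoverBackground: { durationMs: 150, easing: "ease" }, // easing is an assumption
  buttonPress: { durationMs: 80, easing: "ease-out" },
  // Assumption: an overshoot curve standing in for "spring".
  modalSlideUp: { durationMs: 250, easing: "cubic-bezier(0.34, 1.56, 0.64, 1)" },
  sidebarCollapse: { durationMs: 200, easing: "ease-in-out" },
  thinkingPulse: { durationMs: 1200, easing: "ease-in-out" }, // loops until the AI responds
};

// Streaming text: each word fades in 80ms after the previous one (assumed interpretation).
const STREAM_WORD_FADE_MS = 80;

/** Build a CSS transition shorthand value from a motion token. */
function toTransition(property: string, token: MotionToken): string {
  return `${property} ${token.durationMs}ms ${token.easing}`;
}

/** Delay before the word at the given index starts fading in. */
function wordFadeDelayMs(wordIndex: number): number {
  return wordIndex * STREAM_WORD_FADE_MS;
}

console.log(toTransition("transform", motion.buttonPress));
console.log(toTransition("width", motion.sidebarCollapse));
```

Keeping the values in one object means the later design-token pipeline can generate them instead of duplicating them by hand.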
05

AI products share a specific visual language for trust and transparency Los productos AI comparten un lenguaje visual específico de confianza y transparencia

The study of Anthropic, Cursor, and Claude Code revealed a distinct pattern absent in non-AI products: every AI action is visually accountable. You always see what the AI is doing, what tool it used, how long it took. There are no black boxes in the UI of the best AI products. El estudio de Anthropic, Cursor y Claude Code reveló un patrón distinto ausente en productos no-AI: cada acción de la IA es visualmente accountable. Siempre ves qué está haciendo, qué herramienta usó, cuánto tardó. No hay cajas negras en la UI de los mejores productos AI.

AI-native patterns (present in all studied AI products)Patrones AI-native (presentes en todos los AI estudiados)

  • Streaming first: never show a spinner while generating text
  • Tool transparency: show every tool call with name + duration + result
  • Reversibility signals: visually distinguish reversible from irreversible actions before confirmation
  • Context visibility: always show what the AI knows (context window, memory, recent files)
  • Interrupt capability: stop button always visible during AI generation

Anti-patterns (absent in top AI products)Anti-patrones (ausentes en top AI products)

  • Skeleton loaders for AI output: creates false expectation of content structure
  • Generic spinners while thinking: no information, builds anxiety
  • Hiding tool execution: users don't know what changed in their systems
  • One-shot confirmation dialogs: no diff, no preview, just "Are you sure?"
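
The reversibility and no-one-shot-confirmation rules can be enforced in the data model rather than left to each screen. The sketch below is one possible shape, not the product's actual API: the type names, the `update_price` tool, and the button labels are hypothetical. It refuses to build a dialog without a diff and picks the variant from the action's reversibility (the checklist later maps these to amber and red).

```typescript
interface DiffLine {
  field: string;
  before: string;
  after: string;
}

interface PendingAction {
  tool: string; // hypothetical tool name
  reversible: boolean;
  diff: DiffLine[];
}

interface ConfirmationModel {
  variant: "reversible" | "irreversible";
  title: string;
  lines: string[];
  confirmLabel: string;
}

/** Turn a pending AI action into the content of a confirmation dialog. */
function buildConfirmation(action: PendingAction): ConfirmationModel {
  // A dialog without a diff would be the "Are you sure?" anti-pattern.
  if (action.diff.length === 0) {
    throw new Error(`Action "${action.tool}" has no diff to preview`);
  }
  return {
    variant: action.reversible ? "reversible" : "irreversible",
    title: `${action.tool} wants to change ${action.diff.length} value(s)`,
    lines: action.diff.map((d) => `${d.field}: ${d.before} → ${d.after}`),
    confirmLabel: action.reversible ? "Apply (can be undone)" : "Apply permanently",
  };
}

const model = buildConfirmation({
  tool: "update_price",
  reversible: true,
  diff: [{ field: "price", before: "24.99", after: "22.49" }],
});
console.log(model.variant, model.lines);
```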
06

Information density is a product decision, not a design afterthought La densidad de información es una decisión de producto, no un afterthought de diseño

The studied products cluster into two density philosophies — and both work, but for different users. The choice of density must be made at the product level before any design work begins, because it determines spacing tokens, component heights, font sizes, and the entire information architecture. Los productos estudiados se agrupan en dos filosofías de densidad — ambas funcionan, pero para usuarios distintos. La elección de densidad debe hacerse a nivel de producto antes de cualquier trabajo de diseño, porque determina tokens de espaciado, alturas de componentes, tamaños de fuente y toda la arquitectura de información.

High density — expert toolsAlta densidad — herramientas expertas

Bloomberg, Datadog, Retool, Brex. Row height ≈ 32px. Font size: 11–12px. Assume users know what they're looking at. More information per screen = fewer clicks. Used by professionals who stare at it for hours.Bloomberg, Datadog, Retool, Brex. Altura de fila ≈ 32px. Tamaño de fuente: 11-12px. Los usuarios saben lo que están mirando. Más información por pantalla = menos clics.

Comfortable density — balanced toolsDensidad confortable — herramientas balanceadas

Linear, Notion, Intercom, Luma. Row height ≈ 44px. Font size: 13–14px. Sufficient whitespace to feel premium without hiding data. Works for both new and expert users.Linear, Notion, Intercom, Luma. Altura de fila ≈ 44px. Tamaño de fuente: 13-14px. Suficiente espacio en blanco para sentirse premium sin ocultar datos.

07

Brand = how you speak, not just how you look La marca es cómo hablas, no solo cómo te ves

The strongest brands in the study have a distinct voice in every single word of their UI — button labels, error messages, onboarding copy, empty states, confirmation dialogs. The voice is as distinctive as the color. Stripe writes error messages like a knowledgeable friend. Linear writes UI copy with extreme brevity. Anthropic writes with careful epistemic humility ("I think", "Based on what I know"). Las marcas más fuertes del estudio tienen una voz distintiva en cada palabra de su UI — etiquetas de botones, mensajes de error, copy de onboarding, estados vacíos, diálogos de confirmación. La voz es tan distintiva como el color.

Stripe

Error: "Your card was declined. This sometimes happens if the issuing bank suspects fraud. Try a different card or contact your bank."Error: "Tu tarjeta fue rechazada. A veces ocurre si el banco sospecha fraude. Intenta con otra tarjeta."

Linear

Error: "Failed to sync." ← That's it. No explanation. They trust users to understand context. Extreme brevity as brand.Error: "No se pudo sincronizar." ← Eso es todo. Sin explicación. Brevedad extrema como marca.

Anthropic / Claude

Response: "I'm not certain, but based on what I know..." — epistemic humility baked into every sentence.Respuesta: "No estoy seguro, pero basándome en lo que sé..." — humildad epistémica en cada frase.

Summary — What all world-class products shareResumen — Lo que comparten todos los productos de clase mundial

Dimension | Universal pattern | Applies to Shopilot?
Color | 1 primary accent, 2 functional (success/error), neutral scale | Yes: must decide
Background | Near-black with hue shift (not #000 or #111) | Yes: must decide hue
Typography | 1 sans for UI + 1 mono for all numbers/data | Yes: must choose pair
Motion | 80–250ms for transitions, purposeful only, spring easing | Yes: adopt directly
AI states | Streaming text, thinking pulse, tool transparency | Yes: core requirement
Density | Choose high or comfortable, don't mix | Yes: must decide
Voice | Every word of UI reflects brand personality | Yes: must define
Logo | Works at 16px (favicon/tray) AND at 200px | Yes: must commission
§14 · NEEDS

What Shopilot Needs — Design Requirements Analysis Lo que Shopilot Necesita — Análisis de Requerimientos de Diseño

Based on the study synthesis and Shopilot's product definition (AI-native Electron desktop app for e-commerce sellers, 70/30 split, 36 tools, marketplace integration), here is every design element the product needs — independent of brand decisions. These are requirements, not solutions. Basado en la síntesis del estudio y la definición del producto Shopilot (app Electron desktop AI-native para sellers de e-commerce, split 70/30, 36 herramientas, integración de marketplace), aquí están todos los elementos de diseño que el producto necesita — independientemente de las decisiones de marca. Estos son requerimientos, no soluciones.

MASTER CHECKLIST

The 15 things Shopilot must complete to have a world-class designLas 15 cosas que Shopilot debe completar para tener un diseño de clase mundial

Single source of truth. Everything in one place. The detailed breakdown is in the categories below — this is the executive view.Fuente única de verdad. Todo en un lugar. El desglose detallado está en las categorías debajo — esta es la vista ejecutiva.

Phase 1: Brand Identity (before writing a single line of UI code)

# | Task | Output | Owner | Status
01 | Run brand workshop: choose Brand Philosophy (what emotion does Shopilot own?) | 1-sentence brand position | Pablo | PENDING
02 | Decide primary brand color: pick from candidates (see §Brand Decision Framework) | 1 hex value, named, documented | Pablo + team | PENDING
03 | Choose typography pair: UI sans + data mono (see §24 References for options) | 2 font names, weight scale defined | Pablo + Sergio | PENDING
04 | Build the color system: dark bg scale (4 tones) + text scale (4 levels) + semantic colors | design-tokens.json, color section | Sergio | BLOCKED by 02
05 | Commission logo: wordmark + icon mark, works at 16px and 512px | SVG files: logo.svg, icon.svg, favicon.svg | Pablo (hire) | BLOCKED by 01+02

Phase 2: UI Foundation (tokens → CSS vars → Tailwind config, weeks 1–2)

# | Task | Output | Owner | Status
06 | Complete design-tokens.json: spacing (--g / --v system), radii, shadows, duration | tokens.json in W3C DTCG format | Sergio + Mateo | BLOCKED by 04
07 | Run Style Dictionary pipeline: tokens.json → CSS :root vars + tailwind.config.js | tokens.css, tailwind.config.js | Mateo | BLOCKED by 06
08 | Build Electron window shell: frameless + drag region + macOS traffic lights + 70/30 split | Running Electron with correct window chrome | Sergio | PENDING
09 | Implement base atoms: Button (6 variants), Badge, Input, Spinner, Tooltip, Divider | 6 React components using tokens | Sergio | BLOCKED by 07

Phase 3: Core Components (weeks 2–6)

# | Task | Output | Owner | Status
10 | Build Coach screen: streaming text cursor ▊ + thinking pulse ··· + tool accordion (4 states) + chat input | Functional coach view with AI state machine | Sergio | BLOCKED by 09
11 | Build Confirmation Dialog: reversible (amber) vs irreversible (red) variants + diff display | ConfirmationDialog.tsx, 2 variants | Sergio | BLOCKED by 09
12 | Build KPI card + data table (sortable) + delta badges: the 80% of the Dashboard screen | Dashboard screen with real data | Sergio + Andrés | BLOCKED by 09
13 | Build status bar (24px): agent state dot left + credits + model name right | StatusBar.tsx, always visible | Sergio | BLOCKED by 08
14 | Build context bar: active ASIN + marketplace dot + context window progress bar | ContextBar.tsx | Sergio | BLOCKED by 09
15 | Accessibility audit: WCAG AA contrast check on all components, keyboard nav, focus rings | 0 WCAG AA violations | Sergio + Andrés | BLOCKED by 09-14

Critical path:Ruta crítica: 01 (brand workshop) unblocks everything. Nothing else can start until the team aligns on what emotion Shopilot owns. That's the only decision that can't be delegated or automated.01 (brand workshop) desbloquea todo. Nada más puede empezar hasta que el equipo se alinee en qué emoción posee Shopilot. Es la única decisión que no puede ser delegada ni automatizada.
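
The blocked-by relations in the checklist form a small dependency graph, and it can help to compute which tasks are actually startable at any point. The sketch below encodes the Status column exactly as written in the three phase tables; the `readyTasks` helper is illustrative. Note that, as written, the tables do not mark 02, 03, or 08 as blocked by 01, so treating the workshop as a hard gate for them is a team convention rather than something this graph encodes.

```typescript
interface Task {
  id: string;
  blockedBy: string[];
}

// Blockers copied from the Status column of the master checklist.
const checklist: Task[] = [
  { id: "01", blockedBy: [] },
  { id: "02", blockedBy: [] },
  { id: "03", blockedBy: [] },
  { id: "04", blockedBy: ["02"] },
  { id: "05", blockedBy: ["01", "02"] },
  { id: "06", blockedBy: ["04"] },
  { id: "07", blockedBy: ["06"] },
  { id: "08", blockedBy: [] },
  { id: "09", blockedBy: ["07"] },
  { id: "10", blockedBy: ["09"] },
  { id: "11", blockedBy: ["09"] },
  { id: "12", blockedBy: ["09"] },
  { id: "13", blockedBy: ["08"] },
  { id: "14", blockedBy: ["09"] },
  { id: "15", blockedBy: ["09", "10", "11", "12", "13", "14"] },
];

/** Tasks that are not done yet and whose blockers are all done. */
function readyTasks(done: Set<string>): string[] {
  return checklist
    .filter((task) => !done.has(task.id))
    .filter((task) => task.blockedBy.every((dep) => done.has(dep)))
    .map((task) => task.id);
}

// Nothing finished yet: only tasks without recorded blockers can start.
console.log(readyTasks(new Set<string>()));

// After the color decision (02) lands, the color system (04) opens up.
console.log(readyTasks(new Set<string>(["02"])));
```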

Category 1 — Brand Identity Elements (detail)Categoría 1 — Elementos de Identidad de Marca (detalle)

ALL MISSING
Element | Why needed | Used where | Status
Logo mark (icon) | Works at 16px: macOS dock, tray, favicon | Electron dock icon, tray, browser tab | MISSING
Wordmark (logotype) | Full name, readable at 120px+ | App sidebar header, landing page, screenshots | MISSING
Primary brand color | Buttons, links, active states, focus rings | Everywhere interactive: 200+ UI elements | PENDING DECISION
Background color scale | Base, surface, card, elevated: 4 dark tones | Every screen, every component | PENDING DECISION
Foreground color scale | Primary text, secondary, muted, disabled: 4 levels | All text, labels, placeholders | DERIVES FROM BG
Functional colors | Success (green), Warning (amber), Error (red), Info (blue) | Alerts, badges, status indicators, audit log | STANDARD, PICK
UI typography (sans) | All text except numbers | Labels, paragraphs, headings, button text | PENDING DECISION
Data typography (mono) | All numbers, prices, percentages, code | KPI cards, tables, status bar, audit log | PENDING DECISION

Category 2 — UI Components Required by the ProductCategoría 2 — Componentes UI Requeridos por el Producto

These are derived from Shopilot's 36 tools and 4 core screens (Coach view, Dashboard, Settings, Billing). Not a design choice — a product requirement.Se derivan de las 36 herramientas de Shopilot y 4 pantallas principales. No es elección de diseño — es un requerimiento del producto.

Foundation (week 1)

  • Design tokens (CSS vars)
  • Button (6 variants)
  • Input / Textarea
  • Badge / Tag
  • Icon system (Lucide)
  • Tooltip
  • Spinner / Loading
  • Divider

Coach screen (week 2-3)

  • Chat message (user/AI)
  • Streaming text cursor ▊
  • Thinking pulse ···
  • Tool accordion (4 states)
  • Confirmation dialog
  • Proactive suggestion card
  • Context bar (ASIN + tokens)
  • Chat input + send button

Data screens (week 4-6)

  • KPI metric card
  • Data table (sortable)
  • Buy Box indicator
  • Price delta bar
  • BSR sparkline
  • Audit log timeline
  • Credit economy bar
  • Fraud alert banner

Category 3 — Electron Desktop-Specific RequirementsCategoría 3 — Requerimientos Específicos de Desktop Electron

These have no equivalent in web apps. Required because Shopilot ships as a native macOS/Windows app, not a browser tab.No tienen equivalente en apps web. Requeridos porque Shopilot es app nativa macOS/Windows, no una pestaña de browser.

  • Title bar: frameless window with drag region + macOS traffic lights
  • Tab bar: marketplace switcher (Amazon / MeLi / Shopify) with colored dots
  • Status bar: 24px bottom bar, agent state left, credits + model right
  • Tray icon: 16x16 mono SVG + badge count for alerts
  • 70/30 split: marketplace WebView (left) + React sidebar (right), with a visual seam between them
  • Update modal: version info + changelog + progress + restart button
  • Notification system: 3 levels: in-app banner → OS push → tray badge
  • App icon: 1024×1024px for App Store + 512px for macOS dock
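
The 70/30 split and the 24px status bar come down to rectangle arithmetic that has to be recomputed on every window resize. The sketch below is a pure function with no Electron import, so it can be unit-tested. It assumes the status bar spans the full window width and that the custom title bar is 38px tall; that height and all names are hypothetical. Rectangles in this `{ x, y, width, height }` shape match what Electron's view `setBounds` methods accept.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface ShellLayout {
  webview: Rect; // marketplace WebView, left 70%
  sidebar: Rect; // React sidebar, right 30%
  statusBar: Rect; // 24px bar pinned to the bottom
}

interface ShellChrome {
  titleBarHeight: number;
  statusBarHeight: number;
}

// Assumption: 38px title bar; the 24px status bar comes from the spec.
const defaultChrome: ShellChrome = { titleBarHeight: 38, statusBarHeight: 24 };

/** Compute pane rectangles for a window of the given content size. */
function computeShellLayout(
  width: number,
  height: number,
  chrome: ShellChrome = defaultChrome
): ShellLayout {
  const contentHeight = Math.max(0, height - chrome.titleBarHeight - chrome.statusBarHeight);
  const webviewWidth = Math.round(width * 0.7);
  // The sidebar takes the remainder so rounding never leaves a gap or overlap.
  const sidebarWidth = width - webviewWidth;
  return {
    webview: { x: 0, y: chrome.titleBarHeight, width: webviewWidth, height: contentHeight },
    sidebar: { x: webviewWidth, y: chrome.titleBarHeight, width: sidebarWidth, height: contentHeight },
    statusBar: { x: 0, y: height - chrome.statusBarHeight, width, height: chrome.statusBarHeight },
  };
}

console.log(computeShellLayout(1440, 900));
```

In the Electron main process this would typically run inside the window's resize handler, with the resulting rectangles applied to the marketplace view and the sidebar view.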
§14 · WORKFLOW

Workflow 0 → Complete Brand — The Efficient Path Workflow 0 → Marca Completa — El Camino Eficiente

The most efficient process to go from "no brand" to a production-ready design system that rivals Anthropic, Cursor, or Linear. This is the process — not based on opinion, but on how the reference brands actually built their design systems. El proceso más eficiente para ir de "sin marca" a un design system listo para producción que rivalice con Anthropic, Cursor o Linear. Este es el proceso — no basado en opinión, sino en cómo las marcas de referencia construyeron sus design systems.

The 5-Phase ProcessEl Proceso de 5 Fases

1

Brand WorkshopBrand Workshop

1–2 days · Pablo + Mateo + Sergio1-2 días · Pablo + Mateo + Sergio

Make the 6 brand decisions from the Decision Framework above. No design tools needed — just a whiteboard or Notion doc. Output: a 1-page brand brief with every decision locked.Tomar las 6 decisiones de marca del Framework de Decisiones anterior. No se necesitan herramientas de diseño — solo una pizarra o doc de Notion. Output: un brand brief de 1 página con cada decisión bloqueada.

Decisions to lock in this phase:Decisiones a bloquear en esta fase:

  • Brand philosophy (A / B / C)
  • Primary color (which candidate)
  • Typography pair (which option)
  • Logo direction (wordmark / icon+text)
  • Dark mode first: yes/no
  • Brand voice archetype
2

Visual Identity in FigmaIdentidad Visual en Figma

3–5 days · Designer (contract) + Pablo review3-5 días · Diseñador (contrato) + revisión Pablo

This is where Figma enters — but only for visual identity exploration, not for UI design. The goal is to validate color, logo, and typography before writing a single line of code. Figma is used here because visual decision-making is faster with a canvas tool than in code.Aquí es donde entra Figma — pero solo para exploración de identidad visual, no para diseño de UI. El objetivo es validar color, logo y tipografía antes de escribir una sola línea de código. Figma se usa aquí porque la toma de decisiones visuales es más rápida con una herramienta canvas.

What goes into Figma in Phase 2:Qué va a Figma en la Fase 2:

  • Logo mark explorations (6–10 directions)
  • Color palette validation (light + dark test)
  • Typography specimens (all weights + sizes)
  • 3 brand application mockups (app icon, sidebar header, marketing screenshot)

What does NOT go into Figma in Phase 2:Qué NO va a Figma en la Fase 2:

  • Full UI screens: premature without tokens
  • Component library: built in code, not Figma
  • User flows: too early

Tools for Phase 2: Figma (free tier is enough) · fontpair.co for typography pairing · Coolors.co or Realtime Colors for palette generation · Adobe Color for accessibility check · Contrast.app for WCAG validation

3

Design Tokens → CodeDesign Tokens → Código

2 days · Sergio + Mateo2 días · Sergio + Mateo

Once brand decisions are locked from Phase 2, translate them into code immediately. This is where Figma connects to Claude Code: take the approved color values and typography from Figma, encode them as design tokens, and generate the CSS + Tailwind config. The working estimate is that Claude Code shortens this from 2 days to about 4 hours.

Figma → Claude Code integration flow:Flujo de integración Figma → Claude Code:

  1. Export approved brand values from Figma as JSON (Figma Variables → JSON via plugin "Variables Import Export")Exportar valores de marca aprobados desde Figma como JSON (Figma Variables → JSON via plugin)
  2. Paste JSON into Claude Code: "Convert these brand values to a W3C DTCG tokens.json file"Pegar JSON en Claude Code: "Convierte estos valores de marca a un archivo tokens.json DTCG W3C"
  3. Claude Code generates: tokens.json + style-dictionary.config.mjs + globals.css + tailwind.config.tsClaude Code genera: tokens.json + style-dictionary.config.mjs + globals.css + tailwind.config.ts
  4. Run Style Dictionary → CSS custom properties are live in the appEjecutar Style Dictionary → propiedades CSS custom están vivas en la app
  5. Validate: open Electron app, confirm colors match Figma specValidar: abrir app Electron, confirmar que los colores coinciden con el spec de Figma
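Step 2 of the flow above can be sketched as follows. This is a minimal illustration, assuming the Figma export is a flat map of slash-separated variable paths to string values; real plugin exports may be shaped differently, and `FlatVariables`, `inferType`, and `toDtcg` are hypothetical names. Dimensions are kept as strings such as "16px", which Style Dictionary accepts; newer DTCG drafts may describe a structured form instead.

```typescript
// Hypothetical input shape: flat map of Figma variable paths to values.
type FlatVariables = Record<string, string>;

// A DTCG token leaf carries `$value` and, optionally, `$type`.
interface DtcgToken {
  $value: string;
  $type?: "color" | "dimension";
}

interface DtcgGroup {
  [name: string]: DtcgToken | DtcgGroup;
}

// Guess the token type from the value; unrecognized values stay untyped.
function inferType(value: string): DtcgToken["$type"] {
  if (/^#[0-9a-fA-F]{3,8}$/.test(value)) return "color";
  if (/^-?\d+(\.\d+)?(px|rem)$/.test(value)) return "dimension";
  return undefined;
}

// Turn "color/brand/primary" style paths into a nested DTCG tree.
// Not handled: a path that is both a token and a group (e.g. "a" and "a/b").
function toDtcg(flat: FlatVariables): DtcgGroup {
  const root: DtcgGroup = {};
  for (const [path, value] of Object.entries(flat)) {
    const parts = path.split("/");
    const leaf = parts.pop() as string;
    let group = root;
    for (const part of parts) {
      group = (group[part] ??= {}) as DtcgGroup;
    }
    const type = inferType(value);
    group[leaf] = type ? { $value: value, $type: type } : { $value: value };
  }
  return root;
}

const tokens = toDtcg({
  "color/brand/primary": "#F97316",
  "spacing/4": "16px",
  "radius/lg": "8px",
});
console.log(JSON.stringify(tokens, null, 2));
```

The printed JSON is the kind of tokens.json that step 4 would hand to Style Dictionary.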
4

Component Library with Claude CodeLibrería de Componentes con Claude Code

3–6 weeks · Sergio (primary) + Claude Code3-6 semanas · Sergio (principal) + Claude Code

This is the main build phase. All components are defined in Figma (#18 Design System) following Atomic Design (atoms, molecules, organisms, templates, pages). Claude reads the Figma via Figma MCP and implements matching React components in #1 Native Shell. No components are created outside of what is defined in the Figma.Esta es la fase de construcción principal. Todos los componentes están definidos en Figma (#18 Design System) siguiendo Atomic Design (átomos, moléculas, organismos, plantillas, páginas). Claude lee el Figma via Figma MCP e implementa componentes React en #1 Native Shell. No se crean componentes fuera de lo definido en el Figma.

How Claude Code works in this phase:Cómo trabaja Claude Code en esta fase:

  • Spec → component:Spec → componente: Give Claude Code a description from this spec (e.g., "build ToolAccordion with 4 states: queued/running/success/error, using design tokens from globals.css") → it generates the full TypeScript componentDarle a Claude Code una descripción de este spec → genera el componente TypeScript completo
  • Figma → React:Figma → React: Claude reads the Figma component via Figma MCP and generates all variants automatically with matching props and statesClaude lee el componente en Figma via Figma MCP y genera todas las variantes automáticamente con props y estados que coinciden
  • Accessibility audit:Auditoría de accesibilidad: "Review this component for WCAG AA compliance and fix any issues" — Claude Code runs the audit inline"Revisa este componente para cumplimiento WCAG AA y arregla los problemas"
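As an illustration of the "Spec → component" prompt above, the four ToolAccordion states could be modeled as a union type with one view entry per state. This is a sketch only: the labels, the CSS custom property names, and the `fetch_orders` tool name are placeholders, not values taken from globals.css or the tool catalog.

```typescript
// The four states named in the spec for ToolAccordion.
type ToolState = "queued" | "running" | "success" | "error";

// Hypothetical presentation data for each state.
interface ToolStateView {
  label: string;
  colorToken: string; // assumed CSS custom property name
  busy: boolean; // drives aria-busy while the tool is running
}

// Record<ToolState, ...> makes the compiler reject a missing state.
const TOOL_STATE_VIEW: Record<ToolState, ToolStateView> = {
  queued: { label: "Queued", colorToken: "--color-text-muted", busy: false },
  running: { label: "Running", colorToken: "--color-brand-primary", busy: true },
  success: { label: "Done", colorToken: "--color-success", busy: false },
  error: { label: "Failed", colorToken: "--color-danger", busy: false },
};

// Props a React header element could spread; kept framework-free here.
function toolHeaderProps(toolName: string, state: ToolState) {
  const view = TOOL_STATE_VIEW[state];
  return {
    text: `${toolName}: ${view.label}`,
    style: { color: `var(${view.colorToken})` },
    "aria-busy": view.busy,
  };
}

console.log(toolHeaderProps("fetch_orders", "running"));
```

Keeping the state table separate from the rendering code means a fifth state added in Figma shows up as a compile error until its view entry exists.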

Velocity benchmark: A senior engineer without AI ships about 1 component per week (design + code + test + docs). With Claude Code the target is 1 component per working day, so 25 core components take 5 weeks instead of 25. This is the 5x leverage.
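The arithmetic behind the benchmark can be checked with a short sketch. It assumes a 5-day working week, so 1 component per day equals 5 per week; the function name is illustrative.

```typescript
// Assumption: 5 working days per week, so 1 component/day = 5 components/week.
const COMPONENTS_PER_WEEK_WITHOUT_AI = 1;
const COMPONENTS_PER_WEEK_WITH_CLAUDE = 5;

// Weeks needed to build a library at a given weekly throughput.
function weeksToBuild(componentCount: number, componentsPerWeek: number): number {
  return componentCount / componentsPerWeek;
}

const weeksWithoutAi = weeksToBuild(25, COMPONENTS_PER_WEEK_WITHOUT_AI);
const weeksWithClaude = weeksToBuild(25, COMPONENTS_PER_WEEK_WITH_CLAUDE);
const leverage = weeksWithoutAi / weeksWithClaude;

console.log({ weeksWithoutAi, weeksWithClaude, leverage });
```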

5

First Real Screen → Test with SellersPrimera Pantalla Real → Test con Sellers

1 week · Full team1 semana · Equipo completo

Assemble the Coach View (the 70/30 split screen) using the built components and tokens. Show it to 3 real sellers. At this point the brand is real — not a Figma mockup, not a code spec, but a running Electron application with real brand tokens, real components, and real data. Collect feedback. Iterate.Ensamblar el Coach View (pantalla 70/30) usando los componentes y tokens construidos. Mostrárselo a 3 sellers reales. En este punto la marca es real — no un mockup de Figma, no un code spec, sino una aplicación Electron corriendo con tokens de marca reales, componentes reales y datos reales.

Figma vs Code — When to Use EachFigma vs Código — Cuándo Usar Cada Uno

This is the most common source of wasted effort in early-stage product design. The answer depends on what you're deciding, not on preference.Esta es la fuente más común de esfuerzo desperdiciado en diseño de producto en etapas tempranas. La respuesta depende de qué estás decidiendo, no de preferencia.

Task | Use Figma? | Why
Logo exploration | Yes (required) | Bezier curves, vector editing, proportions: impossible to do well in code
Color palette validation | Yes (fast) | Seeing colors in context (on dark bg, next to text) is faster in Figma than spinning up code
Typography testing | Yes (fast) | Font pairing decisions are visual, not technical. Figma + Google Fonts is 10x faster than code for this
User flow diagrams | Optional | Can also use FigJam, Miro, or paper. The flow is the output, not the tool
Individual component design | Occasionally | Only for complex components (confirmation dialog, onboarding flow). Simple components: just build in code with Claude Code
Component library | Yes (source of truth) | Figma (#18 Design System) is the single source of truth following Atomic Design. Claude reads via Figma MCP and implements matching React components. No components created outside Figma
Design tokens | No (live in tokens.json) | Figma Variables exist but are secondary. The tokens.json → CSS pipeline is the real system
Full screen prototypes | No (build in Electron) | A running Electron app with real data is a better prototype than any Figma mockup. With Claude Code, the delta in effort is small

Time to World-Class Brand — Realistic EstimateTiempo para Marca de Clase Mundial — Estimado Realista

Phase 1Fase 1

2d

Brand workshopBrand workshop

Phase 2Fase 2

5d

Visual identityIdentidad visual

Phase 3Fase 3

2d

Tokens → codeTokens → código

Phase 4Fase 4

6w

Component libraryLibrería componentes

Phase 5Fase 5

1w

First real screenPrimera pantalla real

Total: ~8-9 weeks from zero to a brand that rivals Linear or Cursor (9 working days for Phases 1-3, up to 6 weeks for Phase 4, 1 week for Phase 5). The bottlenecks are Phase 2 (finding a designer) and Phase 4 (component build). Everything else is decisions + Claude Code automation.

§14 · REFERENCES

References — Figma, OS Design Systems & Desktop Apps Referencias — Figma, Design Systems de SO y Apps Desktop

The authoritative sources every world-class desktop app is built on: Apple's Human Interface Guidelines, Microsoft Fluent Design, how the best companies use Figma, what Figma Community files to download today, and visual references of the exact apps Shopilot should emulate as a macOS Electron product. Las fuentes autoritativas sobre las que se construye toda app desktop de clase mundial: Apple Human Interface Guidelines, Microsoft Fluent Design, cómo las mejores empresas usan Figma, qué archivos de Figma Community descargar hoy, y referencias visuales de las apps exactas que Shopilot debe emular como producto Electron macOS.

Apple Human Interface Guidelines (HIG)

developer.apple.com/design/human-interface-guidelines · The bible for macOS app designLa biblia del diseño de apps macOS

Every app that feels "native" on macOS — Arc, Cursor, Notion, Linear — follows Apple's HIG. Not as rules, but as a foundation. Understanding HIG tells you why certain things feel right on Mac and wrong on Windows, and what Shopilot must do to feel like a first-class macOS citizen. Cada app que se siente "nativa" en macOS — Arc, Cursor, Notion, Linear — sigue el HIG de Apple. No como reglas, sino como base. Entender el HIG explica por qué ciertas cosas se sienten bien en Mac y mal en Windows.

6 Core HIG Principles — and what they mean for Shopilot6 Principios HIG — y qué significan para Shopilot

1 · Aesthetic Integrity

The app's visual appearance and behavior must be consistent with its purpose. A data tool (Shopilot) should look precise and professional — not playful. Applies to: spacing consistency, typography alignment, color restraint.La apariencia visual y comportamiento deben ser consistentes con el propósito. Una herramienta de datos (Shopilot) debe verse precisa y profesional. Aplica a: consistencia de espaciado, alineación tipográfica, restricción de color.

2 · Consistency

Use standard macOS controls and terminology where possible. Users already know what a sidebar, toolbar, and panel are on Mac. Don't reinvent them — use them. Shopilot's window chrome (title bar, traffic lights, resize handle) must behave as users expect.Usar controles y terminología estándar de macOS donde sea posible. Los usuarios ya saben qué es un sidebar, toolbar y panel en Mac. El chrome de ventana de Shopilot debe comportarse como esperan.

3 · Direct Manipulation

Users should feel they're directly controlling the content on screen. For Shopilot: clicking an ASIN row should immediately feel responsive. Dragging, hovering, and focusing must have immediate visual feedback (≤100ms).Los usuarios deben sentir que controlan directamente el contenido en pantalla. Para Shopilot: hacer clic en una fila ASIN debe sentirse inmediatamente responsivo. Hover y foco deben tener respuesta visual inmediata (≤100ms).

4 · Feedback

Every action must acknowledge the user. Shopilot specifics: button press = visual depress + sound (optional). Loading = progress indicator, not frozen UI. AI thinking = animated cursor ▊ or pulse ···. Error = banner with next action, not silent failure.Cada acción debe reconocer al usuario. Botón = depresión visual. Carga = indicador de progreso. IA pensando = cursor animado. Error = banner con siguiente acción.

5 · User Control

Users — not the app — initiate actions. The AI coach can suggest, but must not act without confirmation on irreversible actions. HIG says: "people should always be in control." This is the origin of Shopilot's reversibility system.Los usuarios — no la app — inician acciones. El coach AI puede sugerir, pero no debe actuar sin confirmación en acciones irreversibles. Esta es la base del sistema de reversibilidad de Shopilot.
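The confirmation rule in this principle can be sketched as a small gate function. This is illustrative only: the document does not specify the reversibility system's data model, so `ProposedAction`, `GateResult`, and `gateAction` are hypothetical names and shapes.

```typescript
// Hypothetical shape of an action the AI coach wants to run.
interface ProposedAction {
  id: string;
  summary: string;
  reversible: boolean;
}

// Either run the action now or ask the user first.
type GateResult =
  | { kind: "execute"; actionId: string }
  | { kind: "needs-confirmation"; actionId: string; prompt: string };

// Irreversible actions run only after the user has confirmed that exact action.
function gateAction(
  action: ProposedAction,
  confirmedIds: ReadonlySet<string>,
): GateResult {
  if (action.reversible || confirmedIds.has(action.id)) {
    return { kind: "execute", actionId: action.id };
  }
  return {
    kind: "needs-confirmation",
    actionId: action.id,
    prompt: `Confirm: ${action.summary}? This cannot be undone.`,
  };
}

const deleteListing: ProposedAction = {
  id: "a1",
  summary: "Delete listing",
  reversible: false,
};
console.log(gateAction(deleteListing, new Set()));
console.log(gateAction(deleteListing, new Set(["a1"])));
```

Keying confirmation on the action id, not on a global "confirmed" flag, prevents one approval from unlocking a different irreversible action.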

6 · Metaphors

Use familiar real-world concepts. Shopilot uses the "coach" metaphor — a trusted advisor who sees the same screen you do and gives guidance. This is why the sidebar is positioned like a coach standing next to you: right side, always visible, never blocking the main view.Usar conceptos reales familiares. Shopilot usa la metáfora del "coach" — un asesor de confianza que ve la misma pantalla. Por eso el sidebar está a la derecha, siempre visible, sin bloquear la vista principal.

macOS Patterns that Shopilot must implement correctlyPatrones macOS que Shopilot debe implementar correctamente

Pattern | HIG spec | Shopilot implementation
Traffic lights | Red/Yellow/Green at 12px diameter, 8px gap, 20px from left | Hidden title bar via titleBarStyle: 'hiddenInset', which preserves the native buttons
Sidebar | Min width 220px, vibrancy background, grouped sections with headers | Shopilot right sidebar 320px; deviates intentionally (coach, not nav)
Toolbar | Height 52px, icon + label, unified with title bar on macOS 11+ | Tab bar (marketplace switcher) sits at top of left pane, height 40px
Menu bar | Every Mac app has native menu bar: File, Edit, View, Window, Help | Electron: Menu.setApplicationMenu(); must exist, even if minimal
Keyboard shortcuts | Cmd+W close, Cmd+Q quit, Cmd+, preferences: always expected | Must register all standard Mac shortcuts + Shopilot custom (Cmd+K = chat)
System colors | Use NSColor system colors that adapt to dark/light automatically | In Electron: prefers-color-scheme media query plus nativeTheme, or manual token switch
Focus ring | Blue ring 3px at system accent color; do NOT remove, required for a11y | Override with brand accent color ring, same shape; never remove entirely
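The traffic-light, menu-bar, and shortcut rows above could translate into main-process configuration along these lines. This is a sketch under assumptions: the interfaces are local stand-ins for the subset of Electron's option types used here, so the block type-checks without the electron package, and the "Coach" menu and its labels are hypothetical.

```typescript
// Local stand-ins for a subset of Electron's BrowserWindow and menu options.
interface WindowOptions {
  titleBarStyle?: "default" | "hidden" | "hiddenInset";
}

interface MenuItemTemplate {
  label?: string;
  role?: string;
  accelerator?: string;
  submenu?: MenuItemTemplate[];
  click?: () => void;
}

// On macOS, hide the title bar but keep the native traffic lights.
function windowOptionsFor(platform: string): WindowOptions {
  return platform === "darwin" ? { titleBarStyle: "hiddenInset" } : {};
}

// Standard menus come from Electron's built-in roles; only the chat shortcut is custom.
function buildMenuTemplate(openChat: () => void): MenuItemTemplate[] {
  return [
    { role: "appMenu" }, // macOS app menu, includes Quit (Cmd+Q)
    { role: "fileMenu" }, // includes Close Window (Cmd+W)
    { role: "editMenu" },
    { role: "viewMenu" },
    { role: "windowMenu" },
    {
      label: "Coach", // hypothetical custom menu
      submenu: [
        { label: "Open Chat", accelerator: "CmdOrCtrl+K", click: openChat },
      ],
    },
    { role: "help", submenu: [{ label: "Shopilot Help" }] },
  ];
}

console.log(windowOptionsFor("darwin"));
console.log(buildMenuTemplate(() => {}).length);
```

In the real main process, the template would go through `Menu.buildFromTemplate` and then `Menu.setApplicationMenu`, and the window options would be spread into the `BrowserWindow` constructor. A Preferences item (Cmd+,) is not covered by this sketch.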

Reference: What Arc Browser takes from HIGReferencia: Lo que Arc Browser toma del HIG

Arc uses native macOS vibrancy for its sidebar, native traffic lights at the exact HIG position, native context menus via NSMenu, native keyboard shortcut conventions, and the native font stack (SF Pro) for all system-level text. Where Arc deviates from HIG is intentional and branded: the tab bar is vertical instead of horizontal, the command bar replaces the URL bar, the sidebar IS the app chrome. Deviation from HIG is a product decision — but you must know the rules before you break them.Arc usa vibrancy nativa de macOS para su sidebar, traffic lights en la posición exacta del HIG, menús contextuales nativos, convenciones de teclado nativas, y SF Pro para todo el texto del sistema. Donde Arc se desvía del HIG es intencional y de marca: la barra de tabs es vertical, la barra de comandos reemplaza la URL. La desviación del HIG es una decisión de producto — pero debes conocer las reglas antes de romperlas.

Microsoft Fluent Design System 2

fluent2.microsoft.design · Windows 11 design languageLenguaje de diseño Windows 11

Shopilot targets macOS first, but Windows build comes in Sprint 11-12. Fluent Design 2 is the official design system for Windows 11 apps. Understanding it now prevents a costly redesign later — and it informs several patterns (Acrylic material, Mica background) that translate beautifully to dark Electron apps on both platforms. Shopilot apunta a macOS primero, pero el build de Windows viene en Sprint 11-12. Fluent Design 2 es el design system oficial para apps Windows 11. Entenderlo ahora previene un rediseño costoso después.

5 Fluent Design Principles5 Principios de Fluent Design

Light

Light as a design element — Reveal highlight: a subtle glow appears under the cursor on interactive elements. Creates depth without shadows. In Electron: CSS radial-gradient on mousemove.La luz como elemento de diseño — Reveal highlight: brillo sutil bajo el cursor en elementos interactivos. En Electron: CSS radial-gradient en mousemove.

Depth

Layers at different Z-levels with Acrylic (frosted glass) and Mica (wallpaper-blended background) materials. For Shopilot: the glass-card pattern directly adopts this — backdrop-filter: blur() is Electron's Acrylic.Capas en diferentes niveles Z con materiales Acrílico (cristal esmerilado) y Mica. Para Shopilot: el patrón glass-card adopta esto — backdrop-filter: blur() es el Acrílico de Electron.

Motion

Connected animations — elements travel between states instead of disappearing and reappearing. Fluent easing: cubic-bezier(0.1, 0.9, 0.2, 1). Used by VS Code, Microsoft Edge, Teams.Animaciones conectadas — los elementos viajan entre estados en lugar de desaparecer y reaparecer. Easing Fluent: cubic-bezier(0.1, 0.9, 0.2, 1).
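To see what this easing curve does, the sketch below evaluates a CSS-style cubic-bezier at a given progress value. It is a generic evaluator using bisection, not code from Fluent or Electron, and the function names are illustrative.

```typescript
// Build an easing function for CSS cubic-bezier(x1, y1, x2, y2).
function cubicBezier(x1: number, y1: number, x2: number, y2: number) {
  // One coordinate of a cubic Bezier whose endpoints are fixed at 0 and 1.
  const coord = (t: number, p1: number, p2: number): number =>
    3 * (1 - t) * (1 - t) * t * p1 + 3 * (1 - t) * t * t * p2 + t * t * t;

  return (progress: number): number => {
    if (progress <= 0) return 0;
    if (progress >= 1) return 1;
    // x(t) is monotonic when x1 and x2 are in [0, 1], so bisect for t.
    let lo = 0;
    let hi = 1;
    for (let i = 0; i < 40; i++) {
      const mid = (lo + hi) / 2;
      if (coord(mid, x1, x2) < progress) lo = mid;
      else hi = mid;
    }
    return coord((lo + hi) / 2, y1, y2);
  };
}

// The Fluent curve quoted above.
const fluentEase = cubicBezier(0.1, 0.9, 0.2, 1);

for (const p of [0, 0.25, 0.5, 0.75, 1]) {
  console.log(p, fluentEase(p).toFixed(3));
}
```

The curve covers most of its distance in the first half of the duration and then settles slowly.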

Material

Acrylic: backdrop-filter: blur(30px) saturate(180%) — used for sidebars, flyouts, menus. Mica: wallpaper color extracted and used as tint in app chrome. Both create sense of app being part of the OS.Acrílico: backdrop-filter: blur(30px) saturate(180%) — para sidebars, flyouts, menús. Mica: color del fondo del escritorio extraído como tinte en el chrome de la app.

Scale

Design for multiple device types. In Shopilot's context: design for a minimum 900×600px window and scale gracefully to 2560×1440 (QHD). Touch targets are a minimum of 44×44px even on desktop (for touch-screen Windows laptops).

Fluent Typography — Segoe UI VariableTipografía Fluent — Segoe UI Variable

Windows 11 uses Segoe UI Variable, a variable font that covers all weights and optical sizes. In an Electron app, a font-family of system-ui resolves to the Windows system UI font from the Segoe UI family, so no font needs to be bundled for a native look. A bundled Inter still renders as Inter on Windows; it does not map to Segoe UI Variable.

Fluent Type Ramp (Windows 11):Escala tipográfica Fluent (Windows 11):

  • Caption · 12px · Regular
  • Body · 14px · Regular
  • Body Strong · 14px · Semibold
  • Subtitle · 20px · Semibold
  • Title · 28px · Semibold
  • Title Large · 40px · Semibold
  • Display · 68px · Semibold

Key difference vs Apple HIG:Diferencia clave vs Apple HIG:

Apple HIG sets macOS body text at 13pt (17pt is the iOS body size). Fluent uses 14px body, so on Windows everything feels slightly larger. If you design for macOS at 13px body text, Windows will look right at 14px. Build a --body-size token to switch per platform.
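A per-platform --body-size token could be produced as in the sketch below. The platform strings match Node's `process.platform` values; the 14px Linux value is an assumption, since the text only covers macOS and Windows, and the function names are illustrative.

```typescript
// Platforms as reported by Node's process.platform.
type SupportedPlatform = "darwin" | "win32" | "linux";

// 13px on macOS and 14px on Windows come from the text; linux is an assumption.
const BODY_SIZE_PX: Record<SupportedPlatform, number> = {
  darwin: 13,
  win32: 14,
  linux: 14,
};

// CSS declaration to inject into :root at startup.
function bodySizeDeclaration(platform: SupportedPlatform): string {
  return `--body-size: ${BODY_SIZE_PX[platform]}px;`;
}

console.log(bodySizeDeclaration("darwin"));
console.log(bodySizeDeclaration("win32"));
```

In Electron the preload script could pass the platform to the renderer, which would then set the declaration on the document root.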

How the Best Companies Use FigmaCómo Usan Figma las Mejores Empresas

figma.com · The industry standard for design — and how to use it efficientlyEl estándar de la industria para diseño — y cómo usarlo eficientemente

Figma is not a drawing tool — it's a design system management platform. Companies like Vercel, Linear, Airbnb, and Shopify use Figma as their source of truth for visual decisions, but NOT for everything. Understanding what they put in Figma vs what they build directly in code is what separates efficient teams from slow ones. Figma no es una herramienta de dibujo — es una plataforma de gestión de design systems. Empresas como Vercel, Linear, Airbnb y Shopify usan Figma como fuente de verdad para decisiones visuales, pero NO para todo.

The 5 ways top companies use FigmaLas 5 formas en que las mejores empresas usan Figma

01

Figma Variables = Design Tokens (the right way)Figma Variables = Design Tokens (la forma correcta)

Since Figma introduced Variables in 2023, they have taken over from Styles for colors, spacing, and radii, with typography support added later. Variables in Figma map closely to CSS custom properties. The best companies (Vercel, Shopify, Atlassian) define their entire token system in Figma Variables, then export to JSON using the "Variables Import/Export" plugin (free). This JSON becomes the tokens.json that feeds Style Dictionary.

Figma Variable group → CSS output:Grupo de Variables Figma → output CSS:

color/brand/primary → --color-brand-primary: #F97316

spacing/4 → --spacing-4: 16px

radius/lg → --radius-lg: 8px
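The naming rule in the three mappings above (slashes become hyphens, prefixed with two dashes) can be sketched as follows. This is an illustration of the convention only; in the actual pipeline Style Dictionary generates the names, and both helper names are hypothetical.

```typescript
// "color/brand/primary" becomes "--color-brand-primary".
function toCssVarName(variablePath: string): string {
  const segments = variablePath
    .split("/")
    .map((segment) =>
      segment
        .trim()
        .toLowerCase()
        .replace(/[^a-z0-9]+/g, "-") // spaces and punctuation become hyphens
        .replace(/^-+|-+$/g, ""),
    )
    .filter((segment) => segment.length > 0);
  return `--${segments.join("-")}`;
}

// Render a :root block from a flat map of variable paths to values.
function toCssRootBlock(variables: Record<string, string>): string {
  const lines = Object.entries(variables).map(
    ([path, value]) => `  ${toCssVarName(path)}: ${value};`,
  );
  return `:root {\n${lines.join("\n")}\n}`;
}

console.log(
  toCssRootBlock({
    "color/brand/primary": "#F97316",
    "spacing/4": "16px",
    "radius/lg": "8px",
  }),
);
```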

02

Auto Layout = Responsive Components that match CSS FlexboxAuto Layout = Componentes Responsivos que coinciden con CSS Flexbox

Figma's Auto Layout closely mirrors CSS Flexbox. When a designer builds a button with Auto Layout (direction, gap, padding, alignment), each setting translates directly to a Tailwind utility class. This is how Linear, Vercel, and Shopify minimize friction between design and code: the designer thinks in flex terms, the developer writes flex terms.

In Figma Auto Layout:En Figma Auto Layout:

Direction: Horizontal

Gap: 8px

Padding: 10px 16px

Align: Center

In Tailwind CSS:En Tailwind CSS:

flex

gap-2

px-4 py-2.5

items-center
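The mapping above follows Tailwind's default spacing scale, where one unit equals 4px. A small sketch of that conversion, with a hypothetical `AutoLayout` shape standing in for whatever Figma actually exports:

```typescript
// Hypothetical subset of Figma Auto Layout settings, in pixels.
interface AutoLayout {
  direction: "horizontal" | "vertical";
  gap: number;
  paddingY: number;
  paddingX: number;
  align: "start" | "center" | "end";
}

// Tailwind's default spacing scale: 1 unit = 4px.
const toSpacingUnit = (px: number): string => String(px / 4);

function toTailwindClasses(layout: AutoLayout): string {
  return [
    layout.direction === "horizontal" ? "flex" : "flex flex-col",
    `gap-${toSpacingUnit(layout.gap)}`,
    `px-${toSpacingUnit(layout.paddingX)}`,
    `py-${toSpacingUnit(layout.paddingY)}`,
    `items-${layout.align}`,
  ].join(" ");
}

console.log(
  toTailwindClasses({
    direction: "horizontal",
    gap: 8,
    paddingY: 10,
    paddingX: 16,
    align: "center",
  }),
);
```

Pixel values that do not land on the scale would need an arbitrary value such as `gap-[5px]`, which this sketch does not handle.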

03

Component Properties = Variant SystemComponent Properties = Sistema de Variantes

Top companies define every component with Properties (variant=primary/secondary/ghost, size=sm/md/lg, state=default/hover/disabled/loading). This creates a single source of truth for all component states. In Figma, you see all variants in one frame. In code, this maps to props. The designer and developer speak the same language.Las mejores empresas definen cada componente con Properties (variante=primary/secondary/ghost, tamaño=sm/md/lg, estado=default/hover/disabled/loading). Esto crea una fuente de verdad para todos los estados. El diseñador y el desarrollador hablan el mismo idioma.

Button component properties:Propiedades del componente Button:

variant: primary | secondary | ghost | danger | outline | link

size: sm | md | lg

state: default | hover | focus | disabled | loading

icon: none | left | right | only
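On the code side, these properties could become a typed props interface, as sketched below. The class names are placeholders. One design choice is assumed here: hover and focus are left to CSS pseudo-classes instead of being props, so only `disabled` and `loading` appear as booleans.

```typescript
type ButtonVariant = "primary" | "secondary" | "ghost" | "danger" | "outline" | "link";
type ButtonSize = "sm" | "md" | "lg";
type ButtonIcon = "none" | "left" | "right" | "only";

interface ButtonProps {
  variant?: ButtonVariant;
  size?: ButtonSize;
  icon?: ButtonIcon;
  disabled?: boolean;
  loading?: boolean;
}

// Hypothetical class naming; a real component would map to Tailwind utilities.
function buttonClassName({
  variant = "primary",
  size = "md",
  icon = "none",
  disabled = false,
  loading = false,
}: ButtonProps = {}): string {
  const classes = ["btn", `btn--${variant}`, `btn--${size}`];
  if (icon !== "none") classes.push(`btn--icon-${icon}`);
  if (loading) classes.push("is-loading");
  // A loading button is also non-interactive.
  if (disabled || loading) classes.push("is-disabled");
  return classes.join(" ");
}

console.log(buttonClassName({ variant: "ghost", size: "sm", loading: true }));
```

Because the unions mirror the Figma property values one to one, a variant renamed in Figma surfaces as a type error in every call site.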

04

Dev Mode = the handoff from designer to Claude CodeDev Mode = el handoff del diseñador a Claude Code

Figma Dev Mode (free for 1 viewer) lets developers inspect every design decision: exact pixel values, spacing, CSS properties, and exported assets. The workflow for Shopilot: designer finalizes a complex component in Figma → developer opens Dev Mode → copies the exact values into a prompt for Claude Code: "Build this component using these exact specs from Figma Dev Mode: [paste]." Claude Code generates the TypeScript in seconds.Figma Dev Mode permite a los desarrolladores inspeccionar cada decisión de diseño: valores exactos en píxeles, espaciado, propiedades CSS, y assets exportados. El flujo para Shopilot: diseñador finaliza componente → desarrollador abre Dev Mode → pega valores exactos en prompt para Claude Code.

The Claude Code + Figma prompt template:Template de prompt Claude Code + Figma:

"Build a React TypeScript component for [ComponentName]. Read the Figma component via Figma MCP for exact specs (dimensions, colors, spacing, states, variants). Use design tokens from globals.css. Include all states defined in the Figma component.""Construye un componente React TypeScript para [NombreComponente]. Lee el componente en Figma via Figma MCP para las specs exactas (dimensiones, colores, espaciado, estados, variantes). Usa los design tokens de globals.css. Incluye todos los estados definidos en el componente de Figma."

05

Figma as the single source of truth for all visual componentsFigma como fuente única de verdad para todos los componentes visuales

The Figma file (#18 Design System, core-product-design-system) follows Atomic Design (atoms, molecules, organisms, templates, pages) and is the single source of truth. Claude reads Figma via Figma MCP and implements matching React components in #1 Native Shell. No React components are created outside of what is defined in the Figma. The external design team maintains Figma; the engineering team consumes it.El archivo Figma (#18 Design System, core-product-design-system) sigue Atomic Design (átomos, moléculas, organismos, plantillas, páginas) y es la fuente única de verdad. Claude lee Figma via Figma MCP e implementa componentes React en #1 Native Shell. No se crean componentes React fuera de lo definido en el Figma. El equipo externo de diseño mantiene Figma; el equipo de ingeniería lo consume.

Figma Community Files — Download These TodayArchivos de Figma Community — Descargar Hoy

These are official or highly-used public Figma files from the reference companies. Duplicating them to your Figma account is free. Study how they structure components, Variables, and design systems — this is how the best companies work.Estos son archivos públicos de Figma oficiales o muy utilizados de las empresas de referencia. Duplicarlos a tu cuenta de Figma es gratuito. Estudia cómo estructuran componentes, Variables y design systems.

File | Publisher | What to study | Search in Community
Apple Design Resources | Apple (official) | macOS UI components, SF Symbols, HIG spacing | "Apple Design Resources macOS"
Microsoft Fluent 2 | Microsoft (official) | Fluent component library, Acrylic, tokens system | "Microsoft Fluent 2 Web"
Vercel Design System | Vercel (official) | Dark-first tokens, Geist font usage, Storybook link | "Vercel Design"
Shadcn/ui Figma Kit | Community (official-ish) | How shadcn components map to Figma: the bridge | "shadcn ui"
Tailwind CSS UI Kit | Community | Tailwind spacing / color scales in Figma Variables | "Tailwind CSS UI Kit"
Linear App Design | Community recreation | Dark sidebar, speed-first interactions, kbd badges | "Linear design system"
Electron UI Patterns | Community | Title bar, tray, window chrome patterns for Electron | "Electron desktop UI"
Figma Variables Starter | Figma (official) | How to structure Variables for a design system | "Variables starter kit Figma"

How to use these files:Cómo usar estos archivos: Don't copy components. Study structure. Look at: how they name Variables (tokens), how they organize component pages, how they document states, what their spacing system looks like. These are the patterns to replicate in Shopilot's Figma file when the brand is decided.No copiar componentes. Estudiar la estructura. Ver: cómo nombran Variables (tokens), cómo organizan páginas de componentes, cómo documentan estados, cómo se ve su sistema de espaciado. Estos son los patrones a replicar en el archivo Figma de Shopilot cuando la marca esté decidida.

Desktop App Visual References — What to EmulateReferencias Visuales de Apps Desktop — Qué Emular

These are the specific macOS desktop apps that Shopilot should study in detail as running software: not in Figma, but as installed apps. Cursor, Linear, Notion, and VS Code are Electron apps; Arc is a native Swift app built on Chromium. Each has a specific pattern Shopilot must adopt or consciously decide to deviate from.

Cursor — cursor.sh

MOST RELEVANT — study firstMÁS RELEVANTE — estudiar primero

The closest structural reference to Shopilot. Both are: Electron, AI-native, dark-first, split-pane (editor left + chat right). Download and install. Study: how the title bar works, how the chat panel opens/closes, how the AI response streams, how tool calls (terminal runs) are displayed, how the status bar at the bottom shows AI state. This is the gold standard for Shopilot's interaction model.La referencia estructural más cercana a Shopilot. Ambos son: Electron, AI-native, dark-first, split-pane. Descargar e instalar. Estudiar: cómo funciona la title bar, cómo abre/cierra el panel de chat, cómo hace streaming la respuesta AI, cómo se muestran las tool calls, cómo muestra el estado AI en el status bar. Este es el estándar de oro para el modelo de interacción de Shopilot.

Adopt from Cursor:Adoptar de Cursor:

  • Status bar 24px bottom
  • Streaming word-by-word
  • Tool call accordion
  • Thinking indicator

Adapt for Shopilot:Adaptar para Shopilot:

  • Split: code→marketplace
  • Tabs: files→marketplaces
  • Context: project→ASIN

Don't copy:No copiar:

  • Code editor UI
  • File tree sidebar
  • Diff view

Arc Browser — arc.net

The reference for rethinking desktop chrome. Arc proves that you can break HIG conventions (vertical tabs instead of horizontal, sidebar IS the app, no visible URL bar) and still feel native and premium. Study specifically: how Arc handles the title bar with traffic lights + drag region + custom controls in the same 40px zone. This is exactly what Shopilot's top bar needs to solve.La referencia para repensar el chrome de desktop. Arc prueba que puedes romper las convenciones HIG (tabs verticales, sidebar ES la app) y aún sentirte nativo y premium. Estudiar específicamente: cómo Arc maneja la title bar con traffic lights + drag region + controles custom en la misma zona de 40px. Esto es exactamente lo que necesita resolver el top bar de Shopilot.

Key lesson:Lección clave: Arc's sidebar gradient background (multi-color per space) is possible in Electron via CSS linear-gradient on the sidebar container. The space color customization is what makes Arc feel personal — a pattern Shopilot could adopt for marketplace color coding (Amazon=orange, MeLi=yellow, Shopify=green).El gradiente del sidebar de Arc es posible en Electron via CSS. La personalización de color por espacio hace que Arc se sienta personal — un patrón que Shopilot podría adoptar para codificación de colores por marketplace.

Linear — linear.app

The reference for performance as a design value. Linear aims to keep every interaction under 100ms. Study: the keyboard shortcut system (every action has a shortcut visible in the UI), the command palette (Cmd+K), the sidebar collapse behavior, and most importantly how Linear handles empty states (no data = inspirational, not depressing). Also study the data tables: Linear's issue list is the closest reference to Shopilot's ASIN product list.

Notion — notion.so

The reference for Electron done right at scale (30M+ users). Study: how Notion handles window resizing (the sidebar collapses progressively), how they manage a complex sidebar with nested items without it feeling cluttered, and their hover-reveal interactions (properties appear on hover, not always). Also: Notion's dark mode implementation is one of the cleanest in any Electron app — study how they handle the transition between surface layers.La referencia para Electron bien hecho a escala (30M+ usuarios). Estudiar: cómo maneja el redimensionado de ventana (el sidebar colapsa progresivamente), el sidebar con items anidados sin sentirse abarrotado, interacciones hover-reveal, y la implementación del modo oscuro — una de las más limpias en cualquier app Electron.

VS Code — code.visualstudio.com

THE Electron referenceLA referencia Electron

VS Code is the most used Electron app in the world with 30M+ daily active users. It is the definitive reference for what is possible technically and visually in Electron. Study: the status bar (bottom, 22px, close to Shopilot's 24px), the split pane system, the extension panel (same concept as Shopilot's sidebar), the command palette, and the theming system. VS Code themes are CSS token swaps, identical to what Shopilot's design token system will do. The VS Code GitHub repo is public, so the theming architecture is directly applicable.

Action: Install and study these 5 apps this weekAcción: Instalar y estudiar estas 5 apps esta semana

Cursor

cursor.sh

Arc

arc.net

Linear

linear.app

Notion

notion.so

VS Code

code.visualstudio.com

For each: spend 30 min using it normally, then 30 min inspecting specific patterns (title bar, sidebars, status bar, hover states, loading states, dark mode). Document what you want to adopt, adapt, or avoid. This is the most efficient design research you can do before the brand workshop.Para cada una: 30 min usándola normalmente, luego 30 min inspeccionando patrones específicos (title bar, sidebars, status bar, hover states, loading states, dark mode). Documentar qué adoptar, adaptar o evitar. Esta es la investigación de diseño más eficiente que se puede hacer antes del brand workshop.

Essential Figma Plugins for this WorkflowPlugins Esenciales de Figma para este Workflow

Plugin | What it does | Phase | Cost
Variables Import/Export | Exports Figma Variables to JSON → feeds tokens.json | Phase 2→3 bridge | Free
Tokens Studio | Full design token management in Figma (W3C DTCG format) | Phase 2→3 bridge | $20/mo
Contrast | WCAG AA/AAA contrast checker on any color pair in canvas | Phase 2 · color decisions | Free
Able | Accessibility checker: contrast, focus order, WCAG annotations | Phase 4 · component review | Free
Iconify | All Lucide icons available in Figma; same library as the code | Phase 2+ ongoing | Free
Figma to Code | Exports Figma frames as HTML/Tailwind/React snippets | Phase 4 · component start | Free
Color Blind | Simulates 8 types of color blindness on any frame | Phase 2 · color decisions | Free
§14 · FULL-STACK

Full-Stack Design Integration

The missing 30%: exact technology stacks, how everything wires together, Claude API integration patterns with real code, what's still undocumented, and 2026 AI-native design methodology. Actionable, not theoretical.

01 · The 6-Layer Stack: How Everything Connects

LAYER 6 · Quality Gates

Figma ↔ Code consistency review · axe-core a11y · Playwright e2e · PR blocked if component deviates from Figma

LAYER 5 · Claude AI Integration

Anthropic SDK v0.30+ · Messages streaming API · Tool use (36 tools) · Prompt caching · Multi-LLM router

LAYER 4 · Electron App Shell

Electron 33+ · WebContentsView (70%) · React 19 sidebar (30%) · IPC contextBridge · Auto-updater

LAYER 3 · React Component Library

shadcn/ui (Radix primitives) · Figma Atomic Design (#18) · Figma MCP · Tailwind 4 · Framer Motion 11

LAYER 2 · Design Token Pipeline

tokens.json (W3C DTCG) → Style Dictionary 4 → CSS custom properties → tailwind.config.ts → CSS vars

LAYER 1 · Design Spec (This File)

shopilot_v6.html · Single source of truth · Pablo approves · Sergio implements · Mateo owns tokens

Complete Package Manifest

Package | Version | Purpose | Layer | Owner
@anthropic-ai/sdk | ^0.30 | Claude API: streaming, tools, caching | 5 | Andrés
electron | ^33 | Desktop shell, WebContentsView, IPC | 4 | Mateo
react + react-dom | ^19 | UI renderer, concurrent features | 3 | Sergio
tailwindcss | ^4 | Utility CSS, token consumption | 3 | Sergio
@radix-ui/react-* | latest | Accessible primitives (via shadcn) | 3 | Sergio
shadcn/ui | CLI 2.x | Component generator on Radix + Tailwind | 3 | Sergio
framer-motion | ^11 | Animations: word-stream, slide-up, spring | 3 | Sergio
lucide-react | ^0.43 | Icon library, 1.5px stroke, currentColor | 3 | Sergio
recharts | ^2 | Charts only (BSR sparkline, KPI gauge) | 3 | Andrés
style-dictionary | ^4 | Token transform: JSON → CSS → Tailwind | 2 | Mateo
@axe-core/react | ^4 | Accessibility audit (WCAG AA) | 6 | Sergio
zod | ^3 | Tool input/output validation schema | 5 | Andrés
zustand | ^5 | Agent state machine store | 3-5 | Sergio

02 · Design Token Pipeline: tokens.json → Production CSS

tokens.json

W3C DTCG format · source of truth

style-dictionary build

design-tokens.css + tailwind-tokens.ts

auto-generated, never edit manually

▶ tokens.json: Full Example (W3C DTCG format)
{
  "$schema": "https://design-tokens.org/schema.json",
  "sp": {
    "color": {
      "bg": {
        "base": { "$value": "#0A0A0F", "$type": "color", "$description": "App background — near-black warm" },
        "01":   { "$value": "#0F0F18", "$type": "color" },
        "02":   { "$value": "#14141F", "$type": "color" },
        "03":   { "$value": "#1A1A28", "$type": "color" }
      },
      "orange": {
        "50":  { "$value": "rgba(249,115,22,0.08)", "$type": "color" },
        "500": { "$value": "#F97316", "$type": "color", "$description": "CANDIDATE — replace with decided brand color" },
        "600": { "$value": "#EA6005", "$type": "color" }
      },
      "fg": {
        "100": { "$value": "#F4F4F6", "$type": "color", "$description": "Primary text" },
        "80":  { "$value": "#D4D4E4", "$type": "color" },
        "60":  { "$value": "#A4A4B8", "$type": "color" },
        "40":  { "$value": "#7A7A90", "$type": "color" }
      },
      "success": { "$value": "#22C55E", "$type": "color" },
      "warning": { "$value": "#F59E0B", "$type": "color" },
      "error":   { "$value": "#EF4444", "$type": "color" },
      "info":    { "$value": "#3B82F6", "$type": "color" }
    },
    "space": {
      "g": { "$value": "10px", "$type": "dimension", "$description": "base grid unit" },
      "v": { "$value": "22px", "$type": "dimension", "$description": "vertical rhythm" },
      "4":  { "$value": "4px",  "$type": "dimension" },
      "8":  { "$value": "8px",  "$type": "dimension" },
      "12": { "$value": "12px", "$type": "dimension" },
      "16": { "$value": "16px", "$type": "dimension" },
      "24": { "$value": "24px", "$type": "dimension" },
      "32": { "$value": "32px", "$type": "dimension" }
    },
    "radius": {
      "sm":   { "$value": "4px",    "$type": "dimension" },
      "md":   { "$value": "6px",    "$type": "dimension" },
      "lg":   { "$value": "8px",    "$type": "dimension" },
      "xl":   { "$value": "12px",   "$type": "dimension" },
      "2xl":  { "$value": "16px",   "$type": "dimension" },
      "full": { "$value": "9999px", "$type": "dimension" }
    },
    "duration": {
      "instant": { "$value": "80ms",  "$type": "duration" },
      "fast":    { "$value": "150ms", "$type": "duration" },
      "normal":  { "$value": "200ms", "$type": "duration" },
      "slow":    { "$value": "350ms", "$type": "duration" },
      "scenic":  { "$value": "500ms", "$type": "duration" }
    }
  }
}
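To make the naming convention concrete, here is a minimal sketch of how a nested DTCG token tree maps to the `--sp-color-orange-500` style of CSS custom property name. It is illustrative only and is not how Style Dictionary is implemented; `flattenTokens`, `TokenLeaf`, `TokenTree`, and `sampleTokens` are hypothetical names introduced for this example.

```typescript
// Illustrative only: Style Dictionary performs this mapping (and much more) internally.
type TokenLeaf = { $value: string; $type?: string; $description?: string };
type TokenTree = { [key: string]: TokenTree | TokenLeaf | string };

// A node is a token when it carries a "$value" key.
function isTokenLeaf(node: unknown): node is TokenLeaf {
  return typeof node === 'object' && node !== null && '$value' in node;
}

// Walk the tree and join the path segments with "-" to build each variable name.
function flattenTokens(tree: TokenTree, path: string[] = []): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, node] of Object.entries(tree)) {
    if (key.startsWith('$')) continue; // skip metadata such as "$schema"
    if (isTokenLeaf(node)) {
      out[`--${[...path, key].join('-')}`] = node.$value;
    } else if (typeof node === 'object' && node !== null) {
      Object.assign(out, flattenTokens(node, [...path, key]));
    }
  }
  return out;
}

// A two-token excerpt of the tokens.json shown above.
const sampleTokens: TokenTree = {
  $schema: 'https://design-tokens.org/schema.json',
  sp: {
    color: { orange: { '500': { $value: '#F97316', $type: 'color' } } },
    space: { g: { $value: '10px', $type: 'dimension' } },
  },
};

const cssVars = flattenTokens(sampleTokens);
```

The resulting keys are the names that the Tailwind config later references through `var(...)`, which is why the token path, not the value, is the contract between layers 2 and 3.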
▶ style-dictionary.config.mjs: Build Config
// style-dictionary.config.mjs
import StyleDictionary from 'style-dictionary';

export default {
  source: ['tokens.json'],
  platforms: {
    // → CSS custom properties (--sp-color-orange-500)
    css: {
      transformGroup: 'css',
      files: [{
        destination: 'src/styles/design-tokens.css',
        format: 'css/variables',
        options: { selector: ':root', outputReferences: true }
      }]
    },
    // → Tailwind config (for extend.colors, extend.spacing)
    tailwind: {
      transformGroup: 'js',
      files: [{
        destination: 'src/styles/tailwind-tokens.ts',
        format: 'javascript/esm'
      }]
    }
  }
}

// Run: npx style-dictionary build --config style-dictionary.config.mjs
// Output:
//   src/styles/design-tokens.css   ← import in main.tsx
//   src/styles/tailwind-tokens.ts  ← import in tailwind.config.ts
▶ tailwind.config.ts: Token Consumption
// tailwind.config.ts
// Note: Tailwind 4 does not auto-detect this file; load it from the CSS entry with @config.
import type { Config } from 'tailwindcss'

const config: Config = {
  content: ['./src/**/*.{ts,tsx}'],
  theme: {
    extend: {
      colors: {
        // Reference CSS custom properties so Tailwind + Style Dictionary stay in sync
        'sp-bg-base': 'var(--sp-color-bg-base)',
        'sp-orange':  'var(--sp-color-orange-500)',
        'sp-fg-100':  'var(--sp-color-fg-100)',
        'sp-success': 'var(--sp-color-success)',
        'sp-warning': 'var(--sp-color-warning)',
        'sp-error':   'var(--sp-color-error)',
      },
      spacing: {
        'sp-g': 'var(--sp-space-g)',   // 10px
        'sp-v': 'var(--sp-space-v)',   // 22px
      },
      borderRadius: {
        'sp-sm': 'var(--sp-radius-sm)',
        'sp-lg': 'var(--sp-radius-lg)',
        'sp-xl': 'var(--sp-radius-xl)',
      },
      fontFamily: {
        'display': ['Inter Display', 'Inter', 'sans-serif'],
        'mono':    ['JetBrains Mono', 'Fira Code', 'monospace'],
      },
      transitionDuration: {
        'sp-fast':   'var(--sp-duration-fast)',
        'sp-normal': 'var(--sp-duration-normal)',
        'sp-slow':   'var(--sp-duration-slow)',
      }
    }
  },
  plugins: []
}
export default config

03 · shadcn/ui Integration with Shopilot Tokens

shadcn/ui is NOT a component library; it is a code generator. Components are copied into your repo and are 100% customizable. Use it for accessibility-correct primitives, then override with Shopilot tokens.

▶ Setup Commands + globals.css Override
# 1. Init shadcn (say YES to CSS variables, pick Neutral base)
npx shadcn@latest init

# When prompted:
# ✓ Style: Default
# ✓ Base color: Neutral  (we override below)
# ✓ CSS variables: YES   (critical — this is how tokens flow in)
# ✓ src directory: YES

# 2. Add the components Shopilot needs (never add all at once)
npx shadcn@latest add button
npx shadcn@latest add dialog
npx shadcn@latest add dropdown-menu
npx shadcn@latest add tooltip
npx shadcn@latest add select
npx shadcn@latest add scroll-area
npx shadcn@latest add collapsible     # ← ToolAccordion base
npx shadcn@latest add badge
npx shadcn@latest add separator
npx shadcn@latest add progress        # ← ContextWindowBar

# 3. Override src/app/globals.css with Shopilot tokens:
@import 'design-tokens.css';          /* Style Dictionary output */

@layer base {
  :root {
    /* Map shadcn vars → Shopilot tokens */
    --background:       240 20% 5%;   /* #0A0A0F */
    --foreground:       240 10% 96%;  /* #F4F4F6 */
    --card:             240 22% 10%;  /* #14141F */
    --card-foreground:  240 23% 86%;  /* #D4D4E4 */
    --popover:          240 22% 10%;
    --popover-foreground: 240 10% 96%;
    --primary:          25 95% 53%;   /* CANDIDATE: #F97316 orange, replace once brand color decided */
    --primary-foreground: 0 0% 100%;
    --secondary:        240 20% 20%;  /* #28283C */
    --secondary-foreground: 240 23% 86%;
    --muted:            240 20% 20%;
    --muted-foreground: 240 9% 52%;   /* #7A7A90 */
    --accent:           25 95% 53%;   /* orange accent */
    --accent-foreground: 0 0% 100%;
    --destructive:      0 84% 60%;    /* #EF4444 */
    --border:           240 6% 20%;   /* rgba(255,255,255,.06) approx */
    --input:            240 6% 16%;
    --ring:             25 95% 53%;   /* orange focus ring */
    --radius:           0.5rem;       /* 8px = --sp-radius-lg */
  }
}

# Result: shadcn components automatically use Shopilot colors.
# Edit src/components/ui/button.tsx to change size tokens to sp-* vars.

Which shadcn components to use vs build custom

Component | Source | Why
Button (6 variants) | shadcn base → customize | Radix provides correct focus/disabled states; we override styles
Dialog / Confirmation Card | shadcn Dialog → customize | Radix handles focus trap + aria-modal correctly; style from scratch
Tooltip | shadcn Tooltip → light override | Positioning engine is complex; only needs color/font token override
Select / Dropdown | shadcn → heavy customize | Radix handles keyboard nav; we rebuild visual completely
Tool Accordion | BUILD CUSTOM | Streaming state machine, badge states, JSON viewer; too specific
ReAct Stream | BUILD CUSTOM | Word-by-word animation, thinking pulse; unique to Shopilot
KPI Card | BUILD CUSTOM | JetBrains Mono + delta badge + sparkline; fully custom
Context Window Bar | shadcn Progress → customize | Stacked segments on top of Progress primitive
Data Table | shadcn Table + TanStack Table | TanStack handles sort/filter; shadcn provides base HTML table
Proactive Suggestion Card | BUILD CUSTOM | Animated slide-up, dismiss swipe, max-2-simultaneous logic
Date Picker | react-day-picker (NEVER BUILD) | Calendar UI is complex; use library, override tokens only
Charts (sparkline, gauge) | recharts (NEVER BUILD) | Math-heavy; only override colors and font

04 · Claude API Streaming Integration: Real Implementation

The complete chain from user input → Claude API → word-by-word UI animation → tool execution display. Every piece has a specific design pattern.

Agent State Machine

idle
user_typing
submitting
thinking ···
streaming ▊
tool_running
awaiting_confirm

done
|
error
|
credit_exhausted

thinking ···

CSS: opacity 0.4→1→0.4, 1.2s infinite · NO elapsed time shown · Status bar: animated dot

streaming ▊

Each word: fadeIn 80ms ease-out · Cursor: blinking 0.6s · NO skeleton, NO spinner

awaiting_confirm

Confirmation card slide-up 250ms spring · Input disabled · Backdrop dims 20%
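The states above can be encoded as a typed transition table so that the UI never jumps to a state the diagram does not allow. This is a minimal sketch: the state names come from the spec, but the edges in `allowedTransitions` are an assumption (the spec lists states, not every transition), and `AgentPhase`, `canTransition`, and `nextPhase` are hypothetical names.

```typescript
type AgentPhase =
  | 'idle' | 'user_typing' | 'submitting' | 'thinking' | 'streaming'
  | 'tool_running' | 'awaiting_confirm' | 'done' | 'error' | 'credit_exhausted';

// Assumed edges: the spec names the states but does not enumerate transitions.
const allowedTransitions: Record<AgentPhase, readonly AgentPhase[]> = {
  idle: ['user_typing'],
  user_typing: ['idle', 'submitting'],
  submitting: ['thinking', 'error', 'credit_exhausted'],
  thinking: ['streaming', 'tool_running', 'error'],
  streaming: ['tool_running', 'done', 'error'],
  tool_running: ['streaming', 'awaiting_confirm', 'done', 'error'],
  awaiting_confirm: ['tool_running', 'idle'],
  done: ['idle', 'user_typing'],
  error: ['idle', 'user_typing'],
  credit_exhausted: ['idle'],
};

// True when the diagram (as assumed above) permits moving from `from` to `to`.
function canTransition(from: AgentPhase, to: AgentPhase): boolean {
  return allowedTransitions[from].includes(to);
}

// Returns the requested phase, or keeps the current one when the edge is not allowed.
function nextPhase(current: AgentPhase, requested: AgentPhase): AgentPhase {
  return canTransition(current, requested) ? requested : current;
}
```

Keeping the table in one place means the store, the status bar, and the input field can all derive their behavior (for example, disabling input in `awaiting_confirm`) from the same source.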

▶ useStream.ts: Complete React Hook Implementation
// src/hooks/useStream.ts
import { useState, useCallback, useRef } from 'react';
import Anthropic from '@anthropic-ai/sdk';
import { shopilotTools } from '@/tools/definitions';
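import { useAgentStore } from '@/stores/agentStore';
// Assumed location: the hook uses SHOPILOT_SYSTEM_PROMPT, which this spec does not define.
import { SHOPILOT_SYSTEM_PROMPT } from '@/prompts/system';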

type AgentState =
  | 'idle' | 'thinking' | 'streaming'
  | 'tool_running' | 'awaiting_confirm' | 'done' | 'error';

interface StreamMessage {
  role: 'user' | 'assistant';
  content: string;
}

export function useStream() {
  const [agentState, setAgentState] = useState<AgentState>('idle');
  const [words, setWords] = useState<string[]>([]);
  const [currentToolCall, setCurrentToolCall] = useState<string | null>(null);
  const abortRef = useRef<AbortController | null>(null);
  const { addTool, updateTool } = useAgentStore();

  const stream = useCallback(async (messages: StreamMessage[]) => {
    abortRef.current = new AbortController();
    setWords([]);
    setAgentState('thinking');

    // NOTE: In Electron, Anthropic SDK runs in main process.
    // Renderer sends via IPC → main runs SDK → streams back via IPC.
    // This hook shows the renderer-side pattern.

    try {
      const client = new Anthropic(); // API key from env via contextBridge

      // Named `response` so it does not shadow the enclosing `stream` callback.
      const response = client.messages.stream(
        {
          model: 'claude-opus-4-6',
          max_tokens: 8192,
          // Prompt caching (reduces cost 60-80% on repeated context): mark the
          // static system prompt with cache_control instead of a beta flag.
          // Assumes an SDK release where prompt caching is generally available.
          system: [
            { type: 'text', text: SHOPILOT_SYSTEM_PROMPT, cache_control: { type: 'ephemeral' } },
          ],
          messages,
          tools: shopilotTools,
        },
        // Pass the AbortController's signal so abort() actually cancels the request.
        { signal: abortRef.current?.signal },
      );

      for await (const event of response) {
        switch (event.type) {

          case 'content_block_start':
            if (event.content_block.type === 'text') {
              setAgentState('streaming');
            }
            if (event.content_block.type === 'tool_use') {
              setAgentState('tool_running');
              const toolId = event.content_block.id;
              const toolName = event.content_block.name;
              setCurrentToolCall(toolName);
              addTool({ id: toolId, name: toolName, state: 'running', startMs: Date.now() });
            }
            break;

          case 'content_block_delta':
            if (event.delta.type === 'text_delta') {
              // Word-by-word: split on spaces, animate each word
              const newWords = event.delta.text.split(/(?<=\s)/);
              setWords(prev => [...prev, ...newWords]);
            }
            break;

          case 'content_block_stop':
            setCurrentToolCall(null);
            break;

          case 'message_stop':
            setAgentState('done');
            break;
        }
      }
    } catch (err) {
      // A user-initiated abort is not an error; abort() already reset the state.
      if (!abortRef.current?.signal.aborted) {
        setAgentState('error');
      }
    }
    }
  }, [addTool]);

  const abort = useCallback(() => {
    abortRef.current?.abort();
    setAgentState('idle');
    setWords([]);
  }, []);

  return { agentState, words, currentToolCall, stream, abort };
}
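One detail of the hook is worth a sketch: text deltas from a streaming API can end in the middle of a word, so splitting each delta on whitespace yields word fragments rather than words. The buffer below holds back an unfinished trailing word until the next delta completes it. `WordBuffer` is a hypothetical helper, not part of the Anthropic SDK or of the hook above, and whether fragments matter visually at an 80ms fade is a judgment call.

```typescript
// Buffers streamed text so that only whole words (with trailing whitespace) are emitted.
class WordBuffer {
  private pending = '';

  // Accepts one text delta and returns the words completed by it.
  push(delta: string): string[] {
    this.pending += delta;
    // Split where whitespace ends and a new word starts, keeping whitespace attached.
    const parts = this.pending.split(/(?<=\s)(?=\S)/);
    const last = parts[parts.length - 1] ?? '';
    if (/\s$/.test(last)) {
      // Everything buffered ends on a word boundary, so release it all.
      this.pending = '';
      return parts.filter(part => part.length > 0);
    }
    // The last part may be an unfinished word; hold it back for the next delta.
    this.pending = last;
    return parts.slice(0, -1);
  }

  // Call once at message end to release whatever is still buffered.
  flush(): string[] {
    const rest = this.pending;
    this.pending = '';
    return rest.length > 0 ? [rest] : [];
  }
}

const wordBuffer = new WordBuffer();
const firstBatch = wordBuffer.push('Hel');      // nothing complete yet
const secondBatch = wordBuffer.push('lo wor');  // "Hello " is complete
const thirdBatch = wordBuffer.push('ld');       // still waiting for a boundary
const finalBatch = wordBuffer.flush();          // releases the last word
```

In the hook, `push` would replace the inline `split` inside the `text_delta` branch and `flush` would run on `message_stop`.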
▶ StreamingText.tsx: Word-by-Word Animation Component
// src/components/StreamingText.tsx
import { motion, AnimatePresence } from 'framer-motion';

interface StreamingTextProps {
  words: string[];
  isStreaming: boolean;
}

// Design rule: each word fades in at 80ms.
// Cursor blinks at 0.6s cycle when streaming.
// No skeleton, no placeholder, no loading bar.
export function StreamingText({ words, isStreaming }: StreamingTextProps) {
  return (
    <div className="text-sp-fg-100 text-sm leading-relaxed">
      {words.map((word, i) => (
        <motion.span
          key={i}
          initial={{ opacity: 0 }}
          animate={{ opacity: 1 }}
          transition={{ duration: 0.08, ease: 'easeOut' }}  // 80ms per word
        >
          {word}
        </motion.span>
      ))}
      {/* Blinking cursor — only while streaming */}
      <AnimatePresence>
        {isStreaming && (
          <motion.span
            initial={{ opacity: 1 }}
            animate={{ opacity: [1, 0, 1] }}
            transition={{ duration: 0.6, repeat: Infinity, ease: 'linear' }}
            className="inline-block ml-0.5 font-mono text-sp-orange"
            style={{ fontFamily: 'JetBrains Mono' }}
          >
            ▊
          </motion.span>
        )}
      </AnimatePresence>
    </div>
  );
}

// ThinkingPulse — shown when agent is thinking (no tokens yet)
export function ThinkingPulse() {
  return (
    <motion.span
      animate={{ opacity: [0.4, 1, 0.4] }}
      transition={{ duration: 1.2, repeat: Infinity, ease: 'easeInOut' }}
      className="text-sp-fg-40 font-mono text-sm"
    >
      ···
    </motion.span>
  );
}

★ Prompt Caching: 60-80% Cost Reduction

Mark the static parts of the context (system prompt + marketplace context + seller profile) with cache_control: {type: 'ephemeral'}. TTL: 5 minutes. Every subsequent request in a session that arrives within the TTL reuses the cached tokens. At 1,000 sellers × 50 requests/day = $4,800/mo → $960/mo with caching.
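The cost figures above can be unpacked with plain arithmetic. The sketch below uses only the numbers stated in the paragraph; the 30-day month is an assumption, and the per-request costs are derived values, not Anthropic prices.

```typescript
// Figures taken from the paragraph above; everything else is derived arithmetic.
const sellers = 1_000;
const requestsPerSellerPerDay = 50;
const daysPerMonth = 30; // assumption: the spec does not state the month length
const monthlyCostUncachedUsd = 4_800;
const monthlyCostCachedUsd = 960;

// Total request volume implied by the scenario.
const requestsPerMonth = sellers * requestsPerSellerPerDay * daysPerMonth;

// Average cost per request implied by each monthly total.
const costPerRequestUncachedUsd = monthlyCostUncachedUsd / requestsPerMonth;
const costPerRequestCachedUsd = monthlyCostCachedUsd / requestsPerMonth;

// Fraction of spend removed by caching in this scenario.
const savingsFraction = 1 - monthlyCostCachedUsd / monthlyCostUncachedUsd;
```

Under these assumptions the scenario works out to 1.5 million requests per month and an 80% saving, which is the top of the stated 60-80% range; sessions with fewer cache hits inside the 5-minute TTL would land lower.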

05 · Tool Call UI: Visual Patterns for 36 Tools

This was the biggest gap identified in the audit: the spec described the tool accordion but never showed the complete visual spec or component code. Fixed here.

Live Tool Accordion States

get_competitor_prices running · ASIN B08XYZABC
analyze_buy_box ✓ 847ms

Input

{ "asin": "B08XYZABC",
  "marketplace": "amazon_mx" }

Output

{ "buybox_winner": "us",
  "our_share": 0.78,
  "competitors": 3 }
update_product_price ⚠ IRREVERSIBLE
ASIN: B08XYZABC
Current price: $24.99
New price: $22.49
Projected impact: +18% Buy Box probability
sync_inventory ✗ API timeout · retry?
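▶ ToolAccordion.tsx: Complete Component
// src/components/ToolAccordion.tsx
import { motion } from 'framer-motion';
import { Check, X, AlertTriangle, Loader2 } from 'lucide-react';
// Assumed sibling files: ConfirmationCard and JsonViewer are used below but not defined in this spec.
import { ConfirmationCard } from './ConfirmationCard';
import { JsonViewer } from './JsonViewer';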

type ToolState = 'queued' | 'running' | 'success' | 'error' | 'awaiting_confirm';
type RiskLevel = 'read_only' | 'reversible' | 'irreversible';

interface ToolAccordionProps {
  id: string;
  name: string;
  state: ToolState;
  riskLevel: RiskLevel;
  durationMs?: number;
  input?: Record<string, unknown>;
  output?: Record<string, unknown>;
  errorMessage?: string;
  onConfirm?: () => void;
  onCancel?: () => void;
}

const stateConfig = {
  queued:           { icon: null, color: '#7A7A90', bg: 'rgba(122,122,144,0.06)', border: 'rgba(122,122,144,0.2)' },
  running:          { icon: 'spin', color: '#3B82F6', bg: 'rgba(59,130,246,0.05)', border: 'rgba(59,130,246,0.2)' },
  success:          { icon: 'check', color: '#22C55E', bg: 'rgba(34,197,94,0.05)', border: 'rgba(34,197,94,0.2)' },
  error:            { icon: 'x', color: '#EF4444', bg: 'rgba(239,68,68,0.05)', border: 'rgba(239,68,68,0.2)' },
  awaiting_confirm: { icon: 'warn', color: '#F59E0B', bg: 'rgba(245,158,11,0.05)', border: 'rgba(245,158,11,0.25)' },
};

export function ToolAccordion({ id, name, state, riskLevel, durationMs, input, output, errorMessage, onConfirm, onCancel }: ToolAccordionProps) {
  const cfg = stateConfig[state];
  const isDestructive = riskLevel === 'irreversible';

  return (
    <motion.div
      layout
      initial={{ opacity: 0, y: 4 }}
      animate={{ opacity: 1, y: 0 }}
      transition={{ duration: 0.2, ease: [0.16, 1, 0.3, 1] }}
      style={{ background: cfg.bg, border: `1px solid ${cfg.border}`, borderRadius: 10 }}
    >
      <details>
        <summary style={{ display: 'flex', alignItems: 'center', gap: 10, padding: '10px 16px', cursor: 'pointer', listStyle: 'none' }}>
          {/* State icon */}
          {state === 'running' && <Loader2 size={14} color={cfg.color} className="animate-spin" />}
          {state === 'success' && <Check size={14} color={cfg.color} strokeWidth={2.5} />}
          {state === 'error'   && <X size={14} color={cfg.color} strokeWidth={2.5} />}
          {state === 'awaiting_confirm' && <AlertTriangle size={14} color={cfg.color} />}

          <span style={{ fontSize: 12, fontWeight: 500, color: '#D4D4E4', flex: 1 }}>{name}</span>

          {/* Right badges */}
          {isDestructive && (
            <span style={{ fontSize: 9, fontWeight: 700, color: '#EF4444', textTransform: 'uppercase', letterSpacing: '0.1em' }}>
              IRREVERSIBLE
            </span>
          )}
          {state === 'success' && durationMs !== undefined && (
            <span style={{ fontSize: 10, fontFamily: 'JetBrains Mono', color: cfg.color }}>
              ✓ {durationMs}ms
            </span>
          )}
          {state === 'error' && (
            <span style={{ fontSize: 10, fontFamily: 'JetBrains Mono', color: cfg.color }}>
              ✗ Error
            </span>
          )}
        </summary>

        {/* Expanded content */}
        <div style={{ padding: '0 16px 12px', borderTop: '1px solid rgba(255,255,255,0.05)' }}>
          {/* Confirmation card for irreversible actions */}
          {state === 'awaiting_confirm' && (
            <ConfirmationCard input={input} riskLevel={riskLevel} onConfirm={onConfirm} onCancel={onCancel} />
          )}
          {/* JSON viewer for success/error */}
          {(state === 'success' || state === 'error') && (
            <JsonViewer input={input} output={output} error={errorMessage} />
          )}
        </div>
      </details>
    </motion.div>
  );
}
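The accordion is driven by a list of tool records, which the streaming hook fills through `addTool` and `updateTool`. The store itself is not shown in this spec, so the following is only a sketch of what immutable update helpers could look like; `ToolRecord`, `addToolRecord`, `updateToolRecord`, and `completeTool` are hypothetical names, and ignoring duplicate ids is an assumed policy.

```typescript
type ToolRunState = 'queued' | 'running' | 'success' | 'error' | 'awaiting_confirm';

interface ToolRecord {
  id: string;
  name: string;
  state: ToolRunState;
  startMs: number;
  durationMs?: number;
  errorMessage?: string;
}

// Appends a new tool call; a record with an existing id is ignored (assumed policy).
function addToolRecord(tools: readonly ToolRecord[], tool: ToolRecord): ToolRecord[] {
  return tools.some(t => t.id === tool.id) ? [...tools] : [...tools, tool];
}

// Applies a partial update to one tool call without mutating the input array.
function updateToolRecord(
  tools: readonly ToolRecord[],
  id: string,
  patch: Partial<Omit<ToolRecord, 'id'>>,
): ToolRecord[] {
  return tools.map((t): ToolRecord => (t.id === id ? { ...t, ...patch } : t));
}

// Marks a tool as finished and derives durationMs from its startMs.
function completeTool(
  tools: readonly ToolRecord[],
  id: string,
  nowMs: number,
  errorMessage?: string,
): ToolRecord[] {
  return tools.map((t): ToolRecord => {
    if (t.id !== id) return t;
    const durationMs = nowMs - t.startMs;
    return errorMessage === undefined
      ? { ...t, state: 'success', durationMs }
      : { ...t, state: 'error', durationMs, errorMessage };
  });
}

const runningTools = addToolRecord([], { id: 't1', name: 'analyze_buy_box', state: 'running', startMs: 1_000 });
const finishedTools = completeTool(runningTools, 't1', 1_847);
```

Because every helper returns a new array, the functions drop directly into a zustand `set` callback and keep React re-renders predictable.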

06 · Previously Undocumented Patterns: Now Complete

Empty States — 8 Variants

No ASINs Yet

First-run. CTA: "Add your first product"

No Search Results

Show query, suggest correction

All Caught Up

No pending actions. Positive reinforcement.

Sync Pending

Data loading from marketplace. Progress bar.

Not Connected

OAuth not done. CTA: "Connect marketplace"

No History

Audit log empty. "Actions will appear here"

Credits Zero

Agent paused. Upgrade CTA dominant.

No Reports

Pro feature gate. "Available in Pro plan"

Empty State Rules:

  • ① Icon: 32px, colored by context (orange=action, blue=info, green=success, red=error)
  • ② Title: max 4 words, sentence case, no period
  • ③ Description: 1 line, explains why + what to do next
  • ④ CTA: only if there's a direct action. Never show CTA on "All Caught Up"
  • ⑤ Never show empty state while loading — show progress instead
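Rules ② to ④ are mechanical enough to check in a unit test or a Storybook story. The following is a rough sketch under that assumption; `EmptyStateSpec` and `validateEmptyState` are hypothetical names, and the check covers only the rules that can be read off the text itself.

```typescript
interface EmptyStateSpec {
  title: string;
  description: string;
  cta?: string;
}

// Returns a list of rule violations; an empty list means the spec passes.
function validateEmptyState(spec: EmptyStateSpec): string[] {
  const problems: string[] = [];
  const title = spec.title.trim();

  // Rule 2: title has at most 4 words and no trailing period.
  if (title.split(/\s+/).length > 4) problems.push('title exceeds 4 words');
  if (title.endsWith('.')) problems.push('title ends with a period');

  // Rule 3: description is a single line.
  if (spec.description.includes('\n')) problems.push('description spans more than one line');

  // Rule 4: "All Caught Up" never carries a CTA.
  if (title === 'All Caught Up' && spec.cta !== undefined) {
    problems.push('"All Caught Up" must not show a CTA');
  }
  return problems;
}

const firstRunProblems = validateEmptyState({
  title: 'No ASINs Yet',
  description: 'Add a product to start tracking it.',
  cta: 'Add your first product',
});
```

Rule ⑤ (no empty state while loading) depends on runtime state rather than copy, so it belongs in the component logic and is not covered here.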

Error State Taxonomy — 3 Categories

Category A · Recoverable · User can fix

API timeout, validation error, missing field. Amber border + icon. Show specific message + retry button. Auto-retry after 3s with countdown.

Amazon API timeout · Retrying in 3s
Category B · Unrecoverable · Needs human intervention

Auth revoked, account suspended, critical DB error. Red banner. Explain what happened and what the user must do. No auto-retry. Support link if relevant.

MeLi credentials expired · Re-authentication required
Category C · Informational Block · Not an error, but blocked

Rate limit, credit exhausted, feature not in plan. Blue info banner. Calm tone. Clear path forward (upgrade, wait, etc). Agent pauses gracefully.

Credits exhausted · Coach paused · Resets on March 6
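The three categories map cleanly onto a lookup table that the UI can consult before rendering a banner. This is a sketch only: the error codes are hypothetical identifiers for the situations named above, and `treatmentFor` is an illustrative helper, not an existing Shopilot function.

```typescript
type ErrorCategory = 'recoverable' | 'unrecoverable' | 'informational_block';

// Hypothetical codes: the spec names the situations, not identifiers.
type ErrorCode =
  | 'api_timeout' | 'validation_error' | 'missing_field'
  | 'auth_revoked' | 'account_suspended' | 'db_error'
  | 'rate_limit' | 'credit_exhausted' | 'feature_not_in_plan';

interface ErrorTreatment {
  category: ErrorCategory;
  tone: 'amber' | 'red' | 'blue';
  autoRetryAfterMs: number | null; // null means no auto-retry
}

const categoryByCode: Record<ErrorCode, ErrorCategory> = {
  api_timeout: 'recoverable',
  validation_error: 'recoverable',
  missing_field: 'recoverable',
  auth_revoked: 'unrecoverable',
  account_suspended: 'unrecoverable',
  db_error: 'unrecoverable',
  rate_limit: 'informational_block',
  credit_exhausted: 'informational_block',
  feature_not_in_plan: 'informational_block',
};

// Visual tone and retry policy per category, as described in the taxonomy.
const treatmentByCategory: Record<ErrorCategory, Omit<ErrorTreatment, 'category'>> = {
  recoverable: { tone: 'amber', autoRetryAfterMs: 3_000 },
  unrecoverable: { tone: 'red', autoRetryAfterMs: null },
  informational_block: { tone: 'blue', autoRetryAfterMs: null },
};

// Resolves an error code to its category, tone, and retry policy.
function treatmentFor(code: ErrorCode): ErrorTreatment {
  const category = categoryByCode[code];
  return { category, ...treatmentByCategory[category] };
}
```

Typing both tables as `Record` makes the compiler flag any new error code or category that lacks a treatment.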

Accessibility — WCAG AA Contrast Ratios

Text | Background | Ratio | WCAG AA | Use
#F4F4F6 | #0A0A0F | 18.0:1 | PASS AAA | Primary text on bg
#A4A4B8 | #0A0A0F | 8.1:1 | PASS AA | Secondary text
#F97316 | #0A0A0F | 7.0:1 | PASS AA | Orange on bg
#FFFFFF | #F97316 | 2.8:1 | FAIL (below 3:1, even for large text) | White on orange btn
#7A7A90 | #0A0A0F | 4.7:1 | PASS AA | Tertiary text
#54546A | #0A0A0F | 2.7:1 | FAIL, captions only | Placeholder, metadata (decorative)
#22C55E | #0A0A0F | 8.7:1 | PASS AA | Success text
#EF4444 | #0A0A0F | 5.2:1 | PASS AA | Error text

⚠ #54546A fails WCAG AA — use only for decorative metadata (timestamps, IDs) where context is clear. Never for interactive or status-critical text.
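Contrast ratios are cheap to recompute whenever a token changes, so they do not have to be trusted from a table. The sketch below follows the WCAG 2.x relative luminance and contrast ratio definitions; `linearChannel`, `relativeLuminance`, `contrastRatio`, and `meetsAA` are hypothetical helper names.

```typescript
// Converts one 8-bit sRGB channel to linear light.
// WCAG 2.x prints the threshold as 0.03928; 0.04045 is the sRGB value and
// gives the same result for every 8-bit channel.
function linearChannel(value8bit: number): number {
  const c = value8bit / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a 6-digit hex color such as "#F97316".
function relativeLuminance(hex: string): number {
  const match = /^#?([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!match) throw new Error(`Expected a 6-digit hex color, got "${hex}"`);
  const r = linearChannel(parseInt(match[1] ?? '0', 16));
  const g = linearChannel(parseInt(match[2] ?? '0', 16));
  const b = linearChannel(parseInt(match[3] ?? '0', 16));
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio between two colors, always >= 1 regardless of argument order.
function contrastRatio(hexA: string, hexB: string): number {
  const a = relativeLuminance(hexA);
  const b = relativeLuminance(hexB);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// WCAG AA thresholds: 4.5:1 for normal text, 3:1 for large text.
function meetsAA(ratio: number, largeText = false): boolean {
  return ratio >= (largeText ? 3 : 4.5);
}

const primaryTextRatio = contrastRatio('#F4F4F6', '#0A0A0F');
const whiteOnOrangeRatio = contrastRatio('#FFFFFF', '#F97316');
```

Running this over every text/background pair in tokens.json as part of the Layer 6 quality gates would catch a failing pair as soon as the candidate brand color is replaced.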

08 · Development Methodology: How the 4-Person Team Ships

Phase 1 — Weeks 1-3

Design-in-Code · Ship Tokens + Atoms

  • → Mateo: tokens.json + Style Dictionary setup
  • → Sergio: Electron shell + React sidebar skeleton
  • → Sergio: shadcn/ui init + Button + Input + Badge
  • → Andrés: Anthropic SDK + IPC bridge + tool router
  • → Pablo: this spec + design review on each PR

Deliverable: Electron window opens · sidebar renders · "Hello Claude" works

Phase 2 — Weeks 4-6

AI Agent Loop · Core Organisms

  • → Sergio: StreamingText + ThinkingPulse + ToolAccordion
  • → Sergio: ConfirmationCard + RollbackPanel
  • → Andrés: 10 core tools (price read, competitor, buy box)
  • → Mateo: Figma MCP integration + token pipeline setup
  • → Pablo: design review of all organisms against Figma specs

Deliverable: Full coach loop working · tool calls visible · confirm/cancel works

Phase 3 — Weeks 7-10

Data + Quality Gates

  • → Andrés: DataTable + KPI cards + Context Window Bar
  • → Andrés: Audit log + proactive suggestion cards
  • → Mateo: axe-core a11y audit + Figma ↔ Code consistency gates
  • → Sergio: Empty states + error states all variants
  • → Pablo: Beta onboarding + first seller feedback

Deliverable: Beta-ready · full data views · quality gates passing

Design Review Checklist: Every UI PR

All numbers use JetBrains Mono (fontFamily: font-mono)
All colors use sp-* CSS vars or Tailwind sp-* classes
Interactive elements have focus ring (shadow-focus)
Bilingual: EN + ES spans on all user-visible text
No hardcoded colors — only var(--sp-*) or Tailwind tokens
Animations use sp-dur-* + sp-ease-* tokens
IRREVERSIBLE actions have red badge + 2-step confirm
Component matches Figma spec (#18 Design System)
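The "no hardcoded colors" item is the easiest one on this checklist to automate. A rough sketch of a scanner that could run in CI is shown below; `findHardcodedColors` and `ColorFinding` are hypothetical names, and the regular expression is a heuristic that can produce false positives (for example on URL fragments that look like hex values).

```typescript
interface ColorFinding {
  line: number;  // 1-based line number in the scanned source
  match: string; // the literal that triggered the finding
}

// Matches hex colors (#RGB, #RGBA, #RRGGBB, #RRGGBBAA) and rgb()/rgba()/hsl()/hsla() calls.
const hardcodedColorSource =
  '#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\\b|\\b(?:rgb|hsl)a?\\(';

// Scans source text line by line and reports every color literal it finds.
function findHardcodedColors(source: string): ColorFinding[] {
  const findings: ColorFinding[] = [];
  source.split('\n').forEach((text, index) => {
    // A fresh regex per line keeps lastIndex from leaking between lines.
    const pattern = new RegExp(hardcodedColorSource, 'g');
    let m: RegExpExecArray | null;
    while ((m = pattern.exec(text)) !== null) {
      findings.push({ line: index + 1, match: m[0] });
    }
  });
  return findings;
}

const sampleSource = [
  "color: '#D4D4E4',",
  "color: 'var(--sp-color-fg-80)',",
  "background: 'rgba(59,130,246,0.05)',",
].join('\n');

const colorFindings = findHardcodedColors(sampleSource);
```

Token files and generated CSS would need to be excluded from the scan, since those are the only places where literal color values are expected.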

Design System Maturity Score: Current State

Tokens defined: 100%
Atoms built: 0% (v1)
Molecules built: 0% (v1)
Claude integration: 0% (v1)
Spec coverage: 98%
Brand defined: 100%
2026 trends: 100%

Next: Sergio starts Week 1 → tokens.json file + shadcn init + Button component. Mateo sets up Style Dictionary. The spec is ready. Now we build.

§14 · WORLD-CLASS

World-Class Design Strategy

The gap between "good design" and "world-class" is not more components; it is precision at the product level: how screens compose, how the competition fails, what makes sellers trust the UI instantly, and the 20 invisible decisions that separate tier-1 products.

01 · B2B Product UX References: Not Brand Books, Product Patterns

These 8 products are referenced for their UX patterns: specific interaction and layout decisions Shopilot should adopt. This differs from Section 15, which analyzed brand identity.

Stripe Dashboard

Gold standard for B2B SaaS data presentation

dashboard.stripe.com ↗

Metric Card Pattern

Big mono number (42px) → small label above → percentage delta below with color · No chart inside the card (chart is separate) · Hover reveals tooltip with exact timestamp

→ Shopilot: KPI cards follow this exact hierarchy

Activity Timeline

Every action logged with: type icon + description + amount + timestamp · Clickable row reveals full detail · Infinite scroll (no pagination) · Timeline = trust

→ Shopilot: Audit Log follows this pattern exactly

Developer-first but accessible

API keys visible in UI · Raw JSON expandable · But non-technical users see clean summaries · Same data, two perspectives on same screen

→ Shopilot: Tool accordion shows summary + expandable JSON

Linear

Keyboard-first B2B product · Speed as primary UX feature

linear.app ↗

Speed as Marketing

Linear measured and published their p50/p95 load times. "Built for speed" is a design statement. Every interaction under 100ms feels intentional. This is a UX strategy, not just engineering.

→ Shopilot: Measure + display model response time. Make it a feature.

Status as Color Only

No status text ("In Progress", "Done") on lists — just colored dots. Experts read the color map in <0.5s. Power users trained to read color grids in one glance.

→ Shopilot: Buy Box status = orange dot, not "You have buy box: YES"

Kbd Shortcut Badges Everywhere

Every action in dropdown shows keyboard shortcut. This teaches users → makes them faster → makes them dependent → reduces churn. Shortcut visibility = retention feature.

→ Shopilot: All dropdowns show Cmd+K, Cmd+1, Esc shortcuts inline

Figma

Complex tool with zero cognitive friction · Panel architecture

3-Panel Information Architecture

Left: navigation/layers · Center: work surface · Right: contextual properties. This is the master pattern for complex tools. The content always has max space. Panels are tools, not content.

→ Shopilot: Marketplace=center, Sidebar=right panel. Left nav deferred to v2.

Context-Sensitive Right Panel

The right panel changes based on what's selected. Select a component → see its properties. Click away → see general settings. Sidebar in Shopilot should adapt to the marketplace page being viewed.

→ Shopilot Phase 2: sidebar context = active ASIN on marketplace page

Multiplayer Visual Cues

Other users visible as colored cursors. Multi-tab shows who's looking at what. In Shopilot context: the AI "cursor" — the coach's attention indicator (which tool it's running, what data it's looking at right now).

→ Shopilot: "Coach is analyzing ASIN B08XYZ" status in sidebar header

Datadog

The benchmark for monitoring dashboards · Density without chaos

Time as Primary Axis

Every metric in Datadog is a time series. The X axis is always time. This trains users to think in trends, not point-in-time snapshots. For Shopilot: Buy Box % over 30d is more actionable than Buy Box % right now.

→ Shopilot: All KPIs have 7d/30d sparklines. Point values only for current.

Alert Integration in Charts

Threshold lines appear ON charts, not in separate alerts. When a metric crosses a line, the chart background changes color. Alert IS the chart. No separate notification panel for threshold breaches.

→ Shopilot: Price threshold line on competitor chart. Red zone when below margin.

Faceted Filtering

Left sidebar has real-time faceted filters that update counts as you click. Tags/dimensions are first-class citizens. For sellers: filter by marketplace + category + status simultaneously. Update counts in real-time.

→ Shopilot Phase 2: ASIN table with faceted filters (marketplace, category, status)

Arc Browser

The best sidebar-first desktop browser (a native app on Chromium, not Electron) · breaks every browser convention and wins

Sidebar IS the App Chrome

Arc moved ALL chrome (tabs, bookmarks, history) to the left sidebar. The content area is 100% undecorated. This is the insight: in Electron, the sidebar is where your app lives. The WebContentsView is sacred space.

→ Shopilot: sidebar has zero visual decoration except the chat + tool calls + status

Custom Title Bar Done Right

Arc's frameless window with custom controls that feel MORE native than native. The traffic light buttons are in their correct position, drag region is the entire top bar, full-screen transitions are perfect.

→ Shopilot: frameless + native traffic lights + 32px drag region + tab bar after the traffic lights

Command Bar as Primary Navigation

Arc's Cmd+T opens a search-everything command bar. This is the #1 power user feature. Arc trained millions of users to navigate entirely by keyboard. Once users find the command bar, they never use menus again.

→ Shopilot: Cmd+K opens command palette: "analyze B08XYZ", "reprice all", "show alerts"

Bloomberg Terminal

The extreme end of data density done right · Reference for seller data density

Density as Expertise Signal

Bloomberg is deliberately dense. It signals: "this is for professionals." The density IS the marketing — it makes users feel expert just by using it. Shopilot sellers are professionals. They can handle density. Don't dumb it down.

→ Shopilot: Don't simplify competitor tables. Show all 8 columns. Professionals want data.

Color = Directionality Only

Bloomberg uses green/red ONLY for up/down price movement. No other meaning. Nothing else is green or red. This absolute discipline means users process market data at a glance without thinking about color meaning.

→ Shopilot: #22C55E = price up / won Buy Box. #EF4444 = price down / lost Buy Box. Nothing else.

Monospace as Alignment

All Bloomberg data is monospace because financial data must align vertically. The $1,234.56 must be perfectly below $98.76 and $12,300.00. Misalignment breaks scanning. Monospace is structural, not decorative.

→ Shopilot: JetBrains Mono for all numbers is Bloomberg discipline applied to e-commerce.

Notion

Progressive disclosure master · Slash commands as interaction metaphor

Slash Command = AI Interaction

Notion's "/" opens inline commands. Claude Code uses the same pattern. This is now the universal AI interaction metaphor. Shopilot's chat input should support "/" for quick actions: "/reprice", "/analyze", "/report".

→ Shopilot: "/" in chat input opens quick-action palette with 36 tools

Properties Reveal on Hover

Notion rows show only essential data by default. Hover reveals additional properties. This keeps lists clean while preserving data access. For Shopilot: ASIN rows show Name + Price + Buy Box. Hover reveals: SKU, inventory, last sync.

→ Shopilot: ASIN row hover reveals secondary metrics (expandable hover card)

Everything is a Block

Notion's single abstraction ("a block") unifies all content types. For Shopilot: every item in the sidebar is "a message" — user message, assistant message, tool call, confirmation card, proactive suggestion. Same base type, different renders.

→ Shopilot: MessageBlock type with discriminated union: text | tool | confirm | proactive
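
A minimal sketch of what that discriminated union could look like. Only the four `type` tags come from the spec; every other field name is an assumption for illustration.

```typescript
// Sketch of the discriminated union named above. Fields other than `type`
// are assumed, not taken from the spec.
type MessageBlock =
  | { type: 'text'; role: 'user' | 'assistant'; content: string }
  | { type: 'tool'; name: string; durationMs: number; ok: boolean }
  | { type: 'confirm'; action: string; reversible: boolean }
  | { type: 'proactive'; headline: string; cta: string };

// One branch per variant. The `never` assignment makes the compiler flag
// any variant that is added to the union but not handled here.
function renderBlock(block: MessageBlock): string {
  switch (block.type) {
    case 'text':
      return `${block.role === 'user' ? 'You' : 'Shopilot'}: ${block.content}`;
    case 'tool':
      return `${block.name} ${block.ok ? '✓' : '✗'} ${block.durationMs}ms`;
    case 'confirm':
      return `${block.action} [${block.reversible ? 'REVERSIBLE' : 'IRREVERSIBLE'}]`;
    case 'proactive':
      return `${block.headline} → ${block.cta}`;
    default: {
      const unreachable: never = block;
      throw new Error(`Unhandled block: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Same base type, different renders: the sidebar list can stay a single `MessageBlock[]` while each variant gets its own component.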

Intercom

Fin AI + human handoff · The original AI product with trust signals

AI vs Human Indicator

Intercom shows whether Fin AI or a human is responding. The AI has a bot icon; human has a photo. For Shopilot: the coach always shows "Powered by Claude Opus 4.6" + current model. Users trust labeled AI more than unlabeled AI.

→ Shopilot: sidebar header shows model name + version. Always visible, never hidden.

Proactive + Reactive in Same UI

Intercom shows proactive campaigns AND reactive inbox in same interface. Two modes: outbound (AI initiates) and inbound (user initiates). For Shopilot: the coach can initiate conversations ("I noticed X") and respond to queries.

→ Shopilot: proactive suggestion cards (coach-initiated) + chat input (user-initiated) in same sidebar

Context Panel Always Visible

Intercom inbox shows customer context alongside every conversation — purchase history, previous tickets, plan level. The agent never has to "look it up." For Shopilot: the coach always has seller profile + marketplace data visible in context.

→ Shopilot: context bar top of sidebar shows active marketplace + seller plan + top ASIN count

02 · Screen Compositions: What Each Main Screen Actually Looks Like

The biggest gap in the spec before this section. Components are defined; screens are not. These CSS mockups show exact proportions, component placement, and information hierarchy.

Screen 01 · Coach View — Main Application Screen (70/30)

Marketplace tabs: Amazon · MeLi · Shopify
URL bar: sellercentral.amazon.com/inventory

KPI row: Today's Sales $3,847 · Buy Box % 78% · Active ASINs 47 · Alerts 3

| ASIN / Product | Price | Buy Box | BSR | Stock |
| --- | --- | --- | --- | --- |
| Wireless Headphones Pro · B08XYZABC1 | $24.99 | (indicator) | #1,234 | 23 |
| USB-C Cable 6ft · B07ABCDEF2 | $12.49 | (indicator) | #4,891 | 147 |
| Phone Stand Adjustable · B09GHIJKL3 | $18.99 | (indicator) | #2,102 | 312 |

Sidebar header: Shopilot Coach · claude-opus-4-6 · Context window
You: Why did I lose the buy box on B07ABCDEF2?
Tool call: get_competitor_prices ✓ 312ms
Shopilot: You lost the buy box because TechDeals_MX repriced to $11.49, $1.00 below your price. They have 4.8★ vs your 4.6★.
Suggested action: Reprice to $11.29 → projected Buy Box recovery: 73%
Chat input: Ask your coach... ⌘K
Status bar: Amazon connected · 247 credits

Title bar

32px · frameless · traffic lights · tab bar after buttons · drag region

Marketplace 70%

WebContentsView · URL bar 28px · content scrolls natively · no interference

Sidebar 30%

React · header 36px · context bar · chat scroll · input sticky · status 20px

Status bar

20px · left: marketplace status · right: credit balance (JetBrains Mono)

Screen 02 · Dashboard View — Sidebar in "Overview" Mode

Header: Shopilot Coach · 03/05 · 14:32
KPIs: Revenue 7d $24.8K (▲ +12%) · Buy Box 78% (▼ -4%) · Alerts 3 (review now)

Top Opportunities
  • USB-C Cable 6ft lost Buy Box: -$1.00 vs TechDeals_MX
  • Phone Stand inventory critical: 23 units · ~4 days
  • 3 ASINs underpriced vs market: +$180/mo potential

Chat input: Ask your coach anything... ⌘K

Dashboard mode: sidebar replaces chat history with KPI summary + opportunity list when agent is idle. Chat input always present. Click any opportunity → coach activates and analyzes it.

03 · Competitive Design Matrix: Why Shopilot Looks Different

The existing seller tools (Helium 10, SellerBoard, Jungle Scout, Repricer.com) were designed in 2012-2018. They solve the right problems with completely wrong design language for 2026. This is Shopilot's visual competitive moat.

| Dimension | Helium 10 | SellerBoard | Jungle Scout | Repricer.com | Shopilot ★ |
| --- | --- | --- | --- | --- | --- |
| Design Era | 2018 · SaaS purple | 2015 · Excel aesthetic | 2017 · Consumer green | 2013 · Corporate blue | 2026 · AI-native dark |
| Primary BG | #6B4FBB purple | #FFF white | #1D6F42 green | #1B4F8A navy | #0A0A0F near-black |
| AI Integration | Bolt-on chatbot (2024) | None | AI keywords only | Rule-based only | AI-first · agent loop · 36 tools |
| Number Display | Default browser font | Arial/Helvetica | Proxima Nova regular | System serif | JetBrains Mono always |
| Dark Mode | ✗ Light only | ✗ Light only | ⚠ Toggle (half done) | ✗ Light only | ✓ Dark-first · identity |
| Desktop App | ✗ Web only | ✗ Web only | ✗ Web only | ✗ Web only | ✓ Electron · native feel |
| Reversibility | ✗ Not labeled | ✗ Not labeled | ✗ Not labeled | ⚠ Confirm dialog only | ✓ REVERSIBLE/IRREVERSIBLE · rollback tokens |
| Typography system | 1-2 fonts, no scale | System fonts | Proxima Nova only | System fonts | Inter Display + JetBrains Mono · full scale |
| Context awareness | ✗ Manual switch | ✗ Manual switch | ✗ Manual switch | ✗ Manual switch | ✓ Coach sees active marketplace page |
| Perceived quality | Tool (functional) | Spreadsheet | Consumer app | Legacy SaaS | Precision instrument · Bloomberg meets Claude |

★ The Core Design Insight

Every competitor was designed by engineers for engineers. Shopilot is designed by a seller who has used all of these tools and knows exactly what they get wrong. The dark + professional + monospace + AI-native aesthetic isn't a trend; it's the natural design language of a serious professional tool for 2026. This is the same design evolution that happened in finance (Bloomberg → Robinhood), in code (Eclipse → VS Code → Cursor), and in project management (JIRA → Linear).

04 · Emotional Design Map: From First Install to Power User

0s · First Impression

"This looks serious"

Dark canvas opens. Orange accent. Shopilot logo. No splash screen, no loading animation. App IS the window.

Design: near-black bg · frameless · logo mark visible · zero clutter

30s · Onboarding

"This is fast"

5-step wizard. Step 1: value prop. Step 2: OAuth in 30s. Step 3: language/category. Skip from step 3.

Design: progress dots · one action per step · CTA dominant · NO form fields until step 3

2min · First Tool Call

"The AI knows my data"

Coach runs first analysis unprompted. Tool accordion shows real API calls to their real store. This is the trust moment.

Design: tool accordion opens · real ASIN names · JetBrains Mono numbers · "From Amazon API"

5min · Aha Moment

"I didn't know this"

Coach surfaces an insight the seller didn't have: "You lost Buy Box on 8 ASINs in the last 24h. Here's why." This is the aha moment.

Design: proactive card slides up · specific numbers · one-click action · orange CTA

Day 1 · First Win

"It actually worked"

Price was changed. Buy Box % goes up. Confirmation with actual before/after. The coach says "Buy Box recovered to 91%."

Design: success state · green + orange celebrate · audit log entry · rollback still visible

Week 1 · Habit

"I check this every morning"

Dashboard view shows overnight changes. 3 opportunities queued. Seller opens app and acts on them before coffee is done.

Design: dashboard mode · opportunities sorted by $$$ impact · 1-click actions · <60s daily ritual

Month 1 · Expert

"I can't operate without this"

Power user. Knows Cmd+K, "/" commands. Audit log is their source of truth. Coach Memory has learned their preferences.

Design: keyboard shortcuts visible · command palette muscle memory · history as data

Designed Delight Moments: The Details That Stick

First Buy Box Win Celebration

When buy box goes from ✗ to ✓, the status dot pulses green 3x with scale(1.4). Subtle. Not a confetti explosion. Professional delight.

Typing Indicator Before Coach Responds

The ··· thinking pulse with "Shopilot is analyzing your store" appears immediately when user sends message. Never a blank moment.

Rollback Success State

When rollback completes, the audit log entry shows "↩ Reversed · 2.3s ago" in green. The system communicates "you're safe, it worked."

Coach Memory Acknowledgment

When coach uses seller's stored preference, it says "(using your saved preference: always protect margins >30%)". Shows it's paying attention.

Competitor Detected Alert

When a new seller lists on one of your ASINs, the proactive card appears with their name, price, and rating. Feels like having eyes everywhere.

Credit Milestone

When seller uses their 100th credit, a discreet banner: "100 actions taken · Avg response: 1.2s · $847 in revenue impacts attributed." Numbers build pride.

05 · E-Commerce Domain Visual Patterns: What No Other Design System Has

Generic design systems cover buttons and inputs. Shopilot needs patterns specific to e-commerce seller intelligence. These are the domain-specific visual components that make the product feel built BY a seller.

Buy Box Indicator — 4 States

78% You own it
0% Lost · Fix now
34% Contested
No data yet

Rule: Buy Box % is ALWAYS JetBrains Mono. Color = status only. No text labels on list view (dot only). Labels on detail view.

Price Delta Display — Competitor Comparison

$24.99 You · +$2.50 ▲
$22.49 TechDeals_MX · ★ Winner
$25.99 ElectroMX · -$1.00 ▼

You row = orange bar. Winner row = highlighted. Relative bar shows price position visually. Delta shown as absolute + direction. Never percentage-only.
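
The "absolute + direction" rule can be captured in a small formatter. This is a sketch under stated assumptions: `formatDelta` is a hypothetical helper, and the choice of reference price (winner, or the seller's own price) is left to the caller because the mockup does not pin it down.

```typescript
// Hypothetical formatter: absolute dollar delta plus a direction arrow.
// Works in integer cents to avoid floating-point artifacts such as 2.4999.
function formatDelta(price: number, reference: number): string {
  const diffCents = Math.round((price - reference) * 100);
  if (diffCents === 0) return '$0.00';
  const abs = (Math.abs(diffCents) / 100).toFixed(2);
  return diffCents > 0 ? `+$${abs} ▲` : `-$${abs} ▼`;
}
```

For example, `formatDelta(24.99, 22.49)` yields `+$2.50 ▲`, matching the "You" row above. A percentage can be appended next to it, but per the rule it should never replace the absolute figure.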

BSR Trend Sparkline — Inline in Table

#1,234 ▲ improving
#4,891 ▼ declining

BSR: LOWER = BETTER (rank #1 = bestseller). Sparkline: green slope = improving (going toward #1). ALWAYS show direction word, not just number. Color shadow band adds weight without legend.
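
Because BSR is inverted (a falling number is good news), the direction logic is easy to get backwards. The sketch below is one possible reading: `bsrTrend` and `bsrLabel` are hypothetical names, and comparing only the first and last samples is an assumption; a production version would likely smooth out noise first.

```typescript
type BsrTrend = 'improving' | 'declining' | 'flat';

// Assumption: trend = sign of the change between first and last sample.
function bsrTrend(ranks: number[]): BsrTrend {
  const first = ranks[0];
  const last = ranks[ranks.length - 1];
  if (first === undefined || last === undefined) return 'flat';
  if (last < first) return 'improving'; // rank number fell, i.e. moved toward #1
  if (last > first) return 'declining';
  return 'flat';
}

// Always pairs the number with an arrow and a direction word.
function bsrLabel(ranks: number[]): string {
  const latest = ranks[ranks.length - 1];
  if (latest === undefined) return 'No data';
  const trend = bsrTrend(ranks);
  const arrow = trend === 'improving' ? '▲' : trend === 'declining' ? '▼' : '·';
  return `#${latest.toLocaleString('en-US')} ${arrow} ${trend}`;
}
```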

Inventory Health Grid — Portfolio View

312 days · 147 days · 23 days · 4 days · 89 days · 31 days · 201 days · 2 days

Each cell = one ASIN. Color = stock health (green >60d / amber 15-60d / red <15d). Number = days remaining. Glanceable portfolio status. No labels needed — color + number is sufficient.
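
The color bands translate directly into a classifier. A minimal sketch using the thresholds stated above; the function and type names are hypothetical.

```typescript
type StockHealth = 'green' | 'amber' | 'red';

// Thresholds from the spec: green >60d, amber 15-60d, red <15d.
function stockHealth(daysRemaining: number): StockHealth {
  if (daysRemaining > 60) return 'green';
  if (daysRemaining >= 15) return 'amber';
  return 'red';
}

// Portfolio summary: how many ASINs fall in each band.
function healthSummary(daysPerAsin: number[]): Record<StockHealth, number> {
  const summary: Record<StockHealth, number> = { green: 0, amber: 0, red: 0 };
  for (const days of daysPerAsin) summary[stockHealth(days)] += 1;
  return summary;
}
```

Applied to the eight cells in the mockup, `healthSummary([312, 147, 23, 4, 89, 31, 201, 2])` counts four green, two amber, and two red ASINs.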

06 · Color Blindness Safety: Accessible for All Sellers

~8% of men and ~0.5% of women have red-green color blindness. For Shopilot, this means Buy Box won (green) vs lost (red) may be indistinguishable to ~1 in 12 male sellers. The fix: never use color alone for meaning. Always pair with icon, text, or shape.

Deuteranopia (Red-Green Blind)

Most common: green-blind. Reds appear brownish-yellow. Greens appear similar to orange.

[Swatches: Buy Box Won and Buy Box Lost, each shown next to its simulated deuteranopia rendering]

Problem: green and red dots look identical to deuteranopes. Users can't distinguish Buy Box won vs lost by color alone.

Fix: Shape + Color (WCAG 1.4.1)

Never use color alone. Always pair color with shape, icon, or text pattern.

78% Won — checkmark confirms
0% Lost — X confirms
34% Contested — dash confirms

Solution: ✓/✗/— icons work even without color. Color still helps non-colorblind users scan faster.
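
A sketch of how the indicator could carry icon, color, and label together so that no state depends on color alone. The green and red hex values are the ones specified earlier in this document; the cut-offs (50% and 0%), the amber and gray values, and all names are assumptions for illustration.

```typescript
type BuyBoxState = 'won' | 'contested' | 'lost' | 'unknown';

// Assumption: cut-offs are illustrative. The spec only shows 78%, 34%, 0%
// and "no data" as examples of the four states.
function buyBoxState(percent: number | null): BuyBoxState {
  if (percent === null) return 'unknown';
  if (percent <= 0) return 'lost';
  return percent >= 50 ? 'won' : 'contested';
}

// Each state has a distinct icon, so the meaning survives without color.
const INDICATOR: Record<BuyBoxState, { icon: string; color: string; label: string }> = {
  won: { icon: '✓', color: '#22C55E', label: 'Won' },
  contested: { icon: '-', color: '#F59E0B', label: 'Contested' }, // amber is assumed
  lost: { icon: '✗', color: '#EF4444', label: 'Lost' },
  unknown: { icon: '?', color: '#6B7280', label: 'No data yet' }, // gray is assumed
};

function buyBoxBadge(percent: number | null): string {
  const { icon, label } = INDICATOR[buyBoxState(percent)];
  return percent === null ? `${icon} ${label}` : `${icon} ${percent}% ${label}`;
}
```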

Safe Color Pairs (Accessible)

These color combinations are distinguishable under all common color blindness types:

Blue + Orange — universally distinguishable
Blue + Amber — excellent for status pairs
White + Dark — table row contrast (no color needed)

Testing tools: Figma "Color Blind" plugin · Chrome DevTools Rendering tab ("Emulate vision deficiencies") · coblis.de online simulator

07 · The 20 Invisible Decisions That Make Products World-Class

Users can't name these details. But they feel them. A user who says "this just feels premium" is responding to some combination of these 20 decisions. None of them take more than a few hours to implement. All of them matter.

① Letter-spacing on headings

-0.03em on h2 makes text look designed, not default. Default tracking = amateur.

② Consistent 4px grid

Every spacing value divisible by 4. Not "16px here, 18px there." Inconsistency is invisible but users sense the chaos as "roughness."

③ Inset shadow on cards

inset 0 1px 0 rgba(255,255,255,.06) adds glass depth. Without it, dark cards look flat and dead.

④ Transition on color changes

transition: background 150ms ease, color 150ms ease on all interactive elements. Instant color changes feel abrupt and cheap.

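⑤ Custom focus ring, not the default outline

Browser default outline is ugly. Replace it with box-shadow: 0 0 0 2px rgba(249,115,22,.5), and keep outline: 2px solid transparent so a ring still appears in forced-colors (high-contrast) mode, where box-shadow is dropped. Same a11y benefit, premium look.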

⑥ Disabled ≠ invisible

Disabled elements at 50% opacity tell users "this exists but you can't use it yet." Not display:none. Visibility + opacity = correct pattern.

⑦ Line-height on body text = 1.5

Dense data UIs are tempting to set to 1.2. Don't. AI-generated text needs 1.5 minimum for readability. Chat messages need 1.6.

⑧ Cursor: pointer on interactive divs

If it's clickable, it needs cursor: pointer. Forgetting this on tool accordions or proactive cards breaks the interaction expectation.

⑨ Tabular nums on ALL numbers

font-variant-numeric: tabular-nums makes numbers align in columns. Without it, a table of prices is unreadable.

⑩ Scrollbars styled or hidden

Default scrollbars look terrible on dark UIs. Either hide with ::-webkit-scrollbar or make them thin + dark. Visible ugly scrollbars = unfinished product.

⑪ No horizontal scroll in narrow windows

Electron windows can be resized smaller than expected. overflow-x:hidden on body, overflow-x:auto on tables only.

⑫ Semantic HTML elements

Use <button> not <div onclick>. <time> for timestamps. <output> for live AI output. Semantic = better a11y + better dev experience.

⑬ Will-change on animated elements

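will-change: transform, opacity on sliding cards hints the browser to promote them to their own compositor layer, which reduces jank at 60fps. Apply it only while an element is animating; leaving it on permanently, or on constantly changing content such as streaming text, wastes GPU memory.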

⑭ Error messages explain what to DO

"Error 403" = terrible. "Your Amazon credentials expired. Click Reconnect to re-authorize in 30 seconds." = world-class. Every error has a next step.

⑮ Timestamps in user timezone

Never show UTC. Use Intl.DateTimeFormat(locale, {timeZone}). "2:34 PM" not "19:34 UTC". Sellers check timestamps constantly.

⑯ Number formatting by locale

MeLi sellers in Mexico: $1,847.50 not 1847,50 MXN. Use Intl.NumberFormat. Wrong number format breaks trust immediately.
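
Decisions ⑮ and ⑯ both come down to the built-in `Intl` APIs. A minimal sketch; the helper names are hypothetical, and the locale, time zone, and currency are assumed to come from the seller profile.

```typescript
// Format an instant in the seller's own time zone (never raw UTC).
function formatTime(date: Date, locale: string, timeZone: string): string {
  return new Intl.DateTimeFormat(locale, { hour: 'numeric', minute: '2-digit', timeZone })
    .format(date)
    .replace(/\s/gu, ' '); // some ICU versions emit a narrow no-break space before AM/PM
}

// Format money with the separators and symbol the seller's locale expects.
function formatMoney(amount: number, locale: string, currency: string): string {
  return new Intl.NumberFormat(locale, { style: 'currency', currency }).format(amount);
}
```

Usage: `formatTime(new Date('2026-03-05T19:34:00Z'), 'en-US', 'America/New_York')` gives `2:34 PM`, and `formatMoney(1847.5, 'es-MX', 'MXN')` gives the `$1,847.50` form called for above.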

⑰ Empty inputs have placeholder text

Chat input: "Ask your coach about any ASIN, competitor, or pricing decision..." Not "Type here" or blank. Placeholder teaches the product's power.

⑱ Correct text cursor in inputs

Input fields: cursor: text. Buttons: cursor: pointer. Disabled: cursor: not-allowed. Every cursor state must be right.

⑲ Data source attribution

Below every KPI: "From Amazon Seller Central API · Synced 4 min ago" in 10px gray. This is the invisible trust builder. Users who see source attribution trust the numbers more.

⑳ Reduce motion for vestibular

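Add @media (prefers-reduced-motion: reduce) { * { animation-duration: 0.01ms !important; transition-duration: 0.01ms !important; } } to respect the OS accessibility setting. This supports WCAG 2.3.3 (Animation from Interactions, Level AAA).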

18

Production Readiness: Critical Gaps

30-point audit results · 14 gaps identified · All HIGH/MEDIUM severity specs

4 HIGH (Missing): Updates · Persistence · GDPR · Observability
6 MEDIUM (Missing): Support · Demo · Multi-acct · Desktop OS
4 PARTIAL (Incomplete): E2E tests · Virtualization · Tray · Menus

This section was generated from a systematic 30-point codebase audit. Each sub-section contains actionable implementation specs. Address HIGH items before public beta. MEDIUM items before v1.0 GA.

HIGH 01 · Update Notification System

MISSING

electron-updater is configured for auto-download but the user-facing update experience is completely unspecified. Silent updates break trust: users need to know when and why the app changed.

Update State Machine

idle → checking → available → downloading → ready → restarting
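
The states above can be modeled as a small transition table that the renderer consults before changing the UI. This is a sketch, not the spec'd implementation: the event names and the extra `error` state (taken from the state table later in this section) are assumptions.

```typescript
type UpdateState =
  | 'idle' | 'checking' | 'available' | 'downloading' | 'ready' | 'restarting' | 'error';

// Event names are hypothetical; they loosely mirror electron-updater events.
type UpdateEvent = 'check' | 'found' | 'none' | 'download' | 'downloaded' | 'install' | 'fail';

const TRANSITIONS: Record<UpdateState, Partial<Record<UpdateEvent, UpdateState>>> = {
  idle: { check: 'checking' },
  checking: { found: 'available', none: 'idle', fail: 'error' },
  available: { download: 'downloading' },
  downloading: { downloaded: 'ready', fail: 'error' },
  ready: { install: 'restarting' },
  restarting: {},
  error: { check: 'checking' }, // the next scheduled check retries silently
};

// Events that are not valid in the current state are ignored.
function nextState(state: UpdateState, event: UpdateEvent): UpdateState {
  return TRANSITIONS[state][event] ?? state;
}
```

Keeping the table explicit makes illegal jumps (for example `idle` straight to `ready`) impossible to trigger from a stray IPC message.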

Update Available Modal — Live Spec

Shopilot 1.3.0 available

You have version 1.2.4. Download is ready.

What's new

  • Coach: 3x faster tool execution with parallel calls
  • MercadoLibre: new competitor tracking for MX sellers
  • Fixed: Rollback confirmation not dismissing on success
  • Fixed: Credit balance not updating after top-up
Downloading update… 73%
Implementation: main process
// main/updater.ts
import { autoUpdater } from 'electron-updater';
import { BrowserWindow, ipcMain } from 'electron';

const FOUR_HOURS_MS = 4 * 60 * 60 * 1000;

// Call once from the main process after the main window exists.
export function initUpdater(mainWindow: BrowserWindow): void {
  autoUpdater.autoDownload = true;
  autoUpdater.autoInstallOnAppQuit = true;

  autoUpdater.on('update-available', (info) => {
    mainWindow.webContents.send('update:available', {
      version: info.version,
      releaseNotes: info.releaseNotes,
    });
  });

  autoUpdater.on('download-progress', (progress) => {
    mainWindow.webContents.send('update:progress', {
      percent: Math.round(progress.percent),
      bytesPerSecond: progress.bytesPerSecond,
    });
  });

  autoUpdater.on('update-downloaded', () => {
    mainWindow.webContents.send('update:ready');
  });

  // Update errors stay silent for the user; log here and forward to Sentry.
  autoUpdater.on('error', (err) => console.error('[updater]', err));

  // IPC handler: user clicks "Restart & Update"
  ipcMain.handle('update:install', () => {
    autoUpdater.quitAndInstall(false, true); // isSilent=false, isForceRunAfter=true
  });

  // Check interval: on launch + every 4 hours
  const check = () => autoUpdater.checkForUpdatesAndNotify().catch(() => {});
  check();
  setInterval(check, FOUR_HOURS_MS);
}
| State | UI Pattern | Dismissible? |
| --- | --- | --- |
| checking | Status bar dot pulses blue (silent) | Auto |
| available | In-app banner: "New version available. View details" | Yes (persists until restart) |
| downloading | Modal with changelog + progress bar (auto-shown) | Yes (download continues) |
| ready | Modal: "Ready to install. Restart now?" with changelog | Yes (installs on quit) |
| error | Silent (logged to Sentry); do not bother user for update errors | N/A |

HIGH 02 · Local Chat Persistence

MISSING

Chat sessions vanish on app restart. No localStorage, no IndexedDB, no Zustand persist spec exists anywhere in the codebase. Sellers who close the app lose all context, a critical trust failure.

Data Model — What to Persist

ChatSession (IndexedDB — shopilot-chat store)

interface ChatSession {
  id: string;           // uuid
  marketplaceId: 'amazon' | 'meli' | 'shopify';
  asin?: string;        // active context when session started
  messages: Message[];  // all messages including tool calls
  createdAt: number;    // unix ms
  updatedAt: number;
  tokenCount: number;   // for context window visualization
  title?: string;       // auto-generated from first user message (truncated 60 chars)
}

Zustand Store — React State Layer

import { create } from 'zustand';
import { persist, createJSONStorage } from 'zustand/middleware';
import { get as idbGet, set as idbSet, del as idbDel } from 'idb-keyval';

interface SessionMeta {
  id: string;
  title: string;
  updatedAt: number;
  marketplace: 'amazon' | 'meli' | 'shopify';
}

interface ChatStore {
  sessions: SessionMeta[];
  activeSessionId: string | null;
  setActiveSession: (id: string) => void;
  addSession: (meta: SessionMeta) => void;
}

// Lightweight: only persist session index (not full messages)
// Full messages go to IndexedDB via idb-keyval
export const useChatStore = create<ChatStore>()(persist(
  (set) => ({
    sessions: [],
    activeSessionId: null,
    setActiveSession: (id) => set({ activeSessionId: id }),
    addSession: (meta) =>
      set((s) => ({ sessions: [meta, ...s.sessions].slice(0, 100) })), // keep last 100
  }),
  {
    name: 'shopilot-chat-store',
    storage: createJSONStorage(() => localStorage), // session index only
  }
));

// Full messages: idb-keyval (structured clone, no JSON round-trip)
export const loadSession = (id: string) => idbGet<ChatSession>(`session:${id}`);
export const saveSession = (s: ChatSession) => idbSet(`session:${s.id}`, s);
export const deleteSession = (id: string) => idbDel(`session:${id}`);
| Storage | What's stored | Retention | Size limit |
| --- | --- | --- | --- |
| localStorage | Session index (id, title, timestamp) | 100 sessions | ~20 KB |
| IndexedDB | Full message arrays with tool calls | 90 days, then pruned | ~50 MB soft cap |
| safeStorage | API keys, marketplace credentials | Until user logout | Negligible |
| SQLite (main) | Audit log, price history, snapshots | 180 days | 500 MB max |
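
The retention rules for chat data (100 indexed sessions, 90 days) imply a pruning pass, for example on app launch. A sketch under those two limits; `pruneSessions` is a hypothetical helper, and the caller is assumed to delete the full message arrays from IndexedDB for every returned `expiredIds` entry.

```typescript
interface SessionMeta { id: string; title: string; updatedAt: number } // unix ms

const DAY_MS = 24 * 60 * 60 * 1000;
const RETENTION_DAYS = 90;  // IndexedDB retention from the table above
const MAX_INDEXED = 100;    // localStorage session index cap

// Splits the index into sessions to keep and ids whose messages should be deleted.
function pruneSessions(
  sessions: SessionMeta[],
  now: number,
): { keep: SessionMeta[]; expiredIds: string[] } {
  const cutoff = now - RETENTION_DAYS * DAY_MS;
  const keep = sessions
    .filter((s) => s.updatedAt >= cutoff)
    .sort((a, b) => b.updatedAt - a.updatedAt) // newest first
    .slice(0, MAX_INDEXED);
  const keepIds = new Set(keep.map((s) => s.id));
  const expiredIds = sessions.filter((s) => !keepIds.has(s.id)).map((s) => s.id);
  return { keep, expiredIds };
}
```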

Session History UI — Sidebar Panel

When chat input is empty: show last 5 sessions as clickable cards below input. Each card: title (auto) + marketplace icon + relative time. Clicking loads the session and resumes context. Pattern adopted from Claude.ai sidebar.

HIGH 03 · GDPR, Data Export & Account Deletion

MISSING

Zero documentation of user data download, account deletion, or data retention. Required by GDPR (EU), LGPD (Brazil, critical for MeLi sellers), and expected by Apple App Store Review. Must exist before any public release.

Personal Data Inventory (PII Map)

| Data Type | Where stored | Purpose | Retention | Exportable? |
| --- | --- | --- | --- | --- |
| Email address | Supabase auth.users | Account identity | Until deletion | Yes |
| Marketplace credentials | Electron safeStorage (local) | API access | Until revoke | No (keys) |
| Chat history | Local IndexedDB | Session continuity | 90 days | Yes (JSON) |
| Audit log | Local SQLite | Rollback & trust | 180 days | Yes (CSV) |
| Usage telemetry | PostHog (cloud) | Product analytics | 24 months | On request |
| Credit transactions | Supabase billing | Billing history | 7 years (legal) | Yes (PDF) |
| Error/crash reports | Sentry (cloud) | Bug fixing | 90 days | No (aggregate) |

Data Export Package — ZIP Structure

shopilot-export-{userId}-{YYYYMMDD}.zip
├── README.txt                   # What's in this export, data policy link
├── account/
│   ├── profile.json             # email, plan, created_at, last_login
│   └── billing_history.csv      # date, amount, credits, description
├── chat_history/
│   ├── sessions_index.json      # session metadata (title, date, marketplace)
│   └── session_{id}.json × N    # full message arrays per session
├── audit_log/
│   └── actions.csv              # timestamp, action, asin, old_value, new_value, reversible
└── telemetry_summary.json       # aggregate usage stats (no PII included)
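
Two small pieces of the export are easy to get subtly wrong: the archive name and CSV escaping for `actions.csv`. A sketch with hypothetical helper names; using the UTC date for `{YYYYMMDD}` is an assumption, since the spec does not say which time zone applies.

```typescript
// Builds shopilot-export-{userId}-{YYYYMMDD}.zip (date taken in UTC: assumed).
function exportFileName(userId: string, date: Date): string {
  const yyyymmdd = date.toISOString().slice(0, 10).replace(/-/g, '');
  return `shopilot-export-${userId}-${yyyymmdd}.zip`;
}

// RFC 4180-style escaping: quote fields containing commas, quotes, or newlines,
// and double any embedded quotes.
function toCsvRow(fields: Array<string | number | boolean>): string {
  return fields
    .map((field) => {
      const text = String(field);
      return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
    })
    .join(',');
}
```

Product titles routinely contain commas and quotes, so any audit-log column that carries free text needs the escaping path.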

Account Deletion Flow (GDPR Article 17 — Right to Erasure)

  1. User navigates to Settings → Account → "Delete Account"
  2. Modal: "This will permanently delete your account and all data. Export your data first?" with [Export Data] + [Continue to Delete] buttons
  3. Type "DELETE" in text field to confirm (same pattern as Vercel, Supabase)
  4. Server-side: mark account deleted_at → Supabase Edge Function queues hard delete in 30 days (grace period for disputes)
  5. Local: clear all IndexedDB stores + localStorage + SQLite + safeStorage keys on next launch
  6. Confirmation email: "Your Shopilot account will be permanently deleted on {date+30d}. Cancel: {link}"
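
The confirmation and grace-period steps of this flow can be sketched as pure functions, which keeps them testable apart from Supabase. All names here are hypothetical; only the "DELETE" confirmation string and the 30-day grace period come from the steps above.

```typescript
const GRACE_DAYS = 30;
const DAY_IN_MS = 24 * 60 * 60 * 1000;

interface DeletionRequest {
  userId: string;
  deletedAt: Date;    // soft-delete marker (step 4)
  hardDeleteAt: Date; // when the queued hard delete may run
}

// Step 3 + 4: validate the typed confirmation, then schedule the hard delete.
function requestDeletion(userId: string, typedConfirmation: string, now: Date): DeletionRequest {
  if (typedConfirmation !== 'DELETE') {
    throw new Error('Confirmation text must be exactly "DELETE"');
  }
  return {
    userId,
    deletedAt: now,
    hardDeleteAt: new Date(now.getTime() + GRACE_DAYS * DAY_IN_MS),
  };
}

// The cancel link in the confirmation email only works inside the grace period.
function canCancel(request: DeletionRequest, now: Date): boolean {
  return now.getTime() < request.hardDeleteAt.getTime();
}
```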

HIGH 04 · Observability & Error Tracking (Sentry)

PARTIAL

Sentry is mentioned in the stack but sampling rates, PII filtering, event taxonomy, and performance monitoring thresholds are not specified. Under-instrumented apps have silent failures in production.

Sentry Configuration Spec
// renderer/main.tsx: Sentry init (renderer process)
// The main process needs its own Sentry.init from '@sentry/electron/main'.
import * as Sentry from '@sentry/electron/renderer';

// Assumes a Vite-built renderer (implied by the VITE_ prefix), where env
// values are read from import.meta.env rather than process.env.
const isProd = import.meta.env.PROD;

Sentry.init({
  dsn: import.meta.env.VITE_SENTRY_DSN,
  environment: import.meta.env.MODE,
  // `release` is set in the main-process init: app.getVersion() is a
  // main-process API and is not available in the renderer.

  // Sampling: aggressive in dev, conservative in prod
  tracesSampleRate: isProd ? 0.1 : 1.0,
  profilesSampleRate: 0.05, // CPU profiling: 5% of transactions

  // PII scrubbing: NEVER send user content to Sentry
  beforeSend(event) {
    // Strip message content (chat messages may contain business data)
    if (event.extra?.messages) delete event.extra.messages;
    if (event.extra?.prompt) delete event.extra.prompt;
    // Strip marketplace credentials from breadcrumbs (a plain array on the event)
    event.breadcrumbs?.forEach((crumb) => {
      if (crumb.data?.token) crumb.data.token = '[Filtered]';
      if (crumb.data?.apiKey) crumb.data.apiKey = '[Filtered]';
    });
    return event;
  },

  // Integrations
  // Replay records nothing until replaysSessionSampleRate or
  // replaysOnErrorSampleRate is set.
  integrations: [
    Sentry.browserTracingIntegration(),
    Sentry.replayIntegration({
      maskAllText: true,       // block all text from session replay
      blockAllMedia: true,
    }),
  ],
});

Custom Event Taxonomy

| Event Name | Trigger | Severity | Alert? |
| --- | --- | --- | --- |
| tool_execution_failed | Tool returns error after 3 retries | Warning | No |
| irreversible_action_taken | Price change / inventory update confirmed | Info | No |
| credit_exhausted | Balance hits 0 | Warning | Yes (Slack) |
| marketplace_auth_expired | API returns 401/403 | Error | Yes (Slack) |
| claude_api_error | Anthropic API returns 5xx | Error | Yes (PagerDuty) |
| ipc_bridge_timeout | IPC call > 5s with no response | Critical | Yes (PagerDuty) |
| rollback_failed | Rollback tool returns error | Critical | Yes (PagerDuty) |
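
Encoding the taxonomy as a typed map keeps event names, severities, and alert routing in one place, so a typo in an event name becomes a compile error. A sketch only: the type names and the `routeEvent` helper are hypothetical, and the actual Slack/PagerDuty delivery is out of scope here.

```typescript
type Severity = 'info' | 'warning' | 'error' | 'critical';
type AlertChannel = 'slack' | 'pagerduty' | null;

type EventName =
  | 'tool_execution_failed' | 'irreversible_action_taken' | 'credit_exhausted'
  | 'marketplace_auth_expired' | 'claude_api_error' | 'ipc_bridge_timeout'
  | 'rollback_failed';

// Mirrors the taxonomy table row by row.
const EVENTS: Record<EventName, { severity: Severity; alert: AlertChannel }> = {
  tool_execution_failed: { severity: 'warning', alert: null },
  irreversible_action_taken: { severity: 'info', alert: null },
  credit_exhausted: { severity: 'warning', alert: 'slack' },
  marketplace_auth_expired: { severity: 'error', alert: 'slack' },
  claude_api_error: { severity: 'error', alert: 'pagerduty' },
  ipc_bridge_timeout: { severity: 'critical', alert: 'pagerduty' },
  rollback_failed: { severity: 'critical', alert: 'pagerduty' },
};

// Returns where an event should be routed; 'log-only' means no alert fires.
function routeEvent(name: EventName): 'slack' | 'pagerduty' | 'log-only' {
  return EVENTS[name].alert ?? 'log-only';
}
```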

Performance Thresholds (alert if exceeded)

  • App cold start: > 3s → warning
  • IPC round-trip: > 500ms → warning
  • Tool execution: > 10s → log
  • First token latency: > 2s → log
  • Chat render FPS: < 30fps → log
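
These thresholds can live in a single table that the instrumentation code checks against. A minimal sketch with hypothetical metric keys; the limits and actions are copied from the list above, and a value exactly at the limit is treated as not exceeded.

```typescript
type Metric = 'coldStartMs' | 'ipcRoundTripMs' | 'toolExecutionMs' | 'firstTokenMs' | 'chatFps';

interface Threshold {
  limit: number;
  breachWhen: 'above' | 'below'; // FPS is the only metric that is bad when low
  action: 'warning' | 'log';
}

const THRESHOLDS: Record<Metric, Threshold> = {
  coldStartMs: { limit: 3000, breachWhen: 'above', action: 'warning' },
  ipcRoundTripMs: { limit: 500, breachWhen: 'above', action: 'warning' },
  toolExecutionMs: { limit: 10000, breachWhen: 'above', action: 'log' },
  firstTokenMs: { limit: 2000, breachWhen: 'above', action: 'log' },
  chatFps: { limit: 30, breachWhen: 'below', action: 'log' },
};

// Returns the action to take for a measurement, or 'ok' if within limits.
function checkMetric(metric: Metric, value: number): 'ok' | 'warning' | 'log' {
  const { limit, breachWhen, action } = THRESHOLDS[metric];
  const breached = breachWhen === 'above' ? value > limit : value < limit;
  return breached ? action : 'ok';
}
```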

User Context (always attach)

Sentry.setUser({
  id: userId,     // NOT email
  plan: 'pro',
});
Sentry.setContext('marketplace', {
  active: 'amazon',
  region: 'US',
});
// NEVER set: email, apiKey, sellerId

MEDIUM 05 · In-App Support & Help Center

MISSING

No help center, FAQ panel, or support chat specified. B2B desktop apps need accessible support without leaving the app. Pattern: ? button in status bar → slide-over panel with search + articles + live chat.

Support Entry Points

  • Status bar ? — always visible, 24px tall, right corner. Opens help slide-over. Free plan gets async email; Pro gets live chat widget (Crisp or Intercom).
  • Error recovery banners — Type A errors include "Need help?" link that pre-fills support form with error context.
  • Keyboard: Cmd+Shift+? — opens help slide-over from anywhere in the app.
  • First run onboarding — 3-step coach intro with "How does this work?" expandable FAQ inline.

Help Panel Anatomy

Help & Support
🔍 Search articles…

Popular Articles

How does the Buy Box coach work?
Connecting your Amazon Seller account
Understanding credits and billing

Integration Recommendation: Crisp.chat (not Intercom)

Crisp is $25/mo vs Intercom's $74/mo minimum. Crisp has a WebView embed that works in Electron without SDK conflicts. For v1: embed Crisp chatbox in the help slide-over WebContentsView. For v2: evaluate Intercom when MRR > $10K.

MEDIUM 06 · Demo & Trial Mode

MISSING

No sandbox or mock data strategy exists. New users who haven't connected a marketplace account see a blank app. Every B2B tool that converts well has a demo mode that shows the product's value immediately.

Demo Data Strategy

// demo/fixtures.ts
export const DEMO_SELLER = {
  marketplace: 'amazon',
  region: 'US',
  storeName: 'Acme Electronics',
  plan: 'pro',
};

export const DEMO_ASINS = [
  { asin: 'B08N5WRWNW', title: 'Wireless Earbuds Pro',
    price: 49.99, buyBox: 78, bsr: 1247, stock: 342 },
  { asin: 'B09G9FPHY6', title: 'USB-C Hub 7-in-1',
    price: 34.99, buyBox: 0, bsr: 891, stock: 12 },  // stock warning
  { asin: 'B0BDJ179PH', title: 'Phone Stand Aluminum',
    price: 19.99, buyBox: 34, bsr: 3401, stock: 98 },
];

export const DEMO_COMPETITORS = {
  'B08N5WRWNW': [
    { seller: 'TechDirect', price: 47.99, bbPercent: 22 },
    { seller: 'ElectroHub', price: 51.99, bbPercent: 0 },
  ],
};

// Demo coach responses — scripted for max "aha moment"
export const DEMO_CHAT_SCRIPT = [
  {
    trigger: 'first_message',
    response: 'I can see your Buy Box win rate dropped 23% this week on B08N5WRWNW. Your main competitor TechDirect lowered their price to $47.99 two days ago. Want me to analyze if repricing to $46.49 would recover the Buy Box while maintaining your margin?',
  },
];

Demo Banner — Persistent Indicator

🎭

Demo Mode

Simulated data — no real changes will be made

Demo Mode Rules

  • All tool calls return fixture data, never real API
  • Confirmation dialogs work but action is a no-op
  • Credits don't decrement (infinite demo credits)
  • Audit log shows demo actions with 🎭 prefix
  • "Connect Account" CTA always visible in sidebar
  • Demo mode auto-activates if no marketplace connected
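
The rules above could be enforced at a single choke point in the tool layer. The following is a sketch only: `ToolCall`, `ToolResult`, `resolveDemoCall`, and `executeTool` are hypothetical names, and the one-credit charge for a real call is an assumption made purely for illustration.

```typescript
// Hypothetical shapes; the spec does not define a tool-call type.
type ToolCall = { name: string; mutates: boolean };

interface ToolResult {
  data: unknown;
  demo: boolean;
  auditLabel: string;     // text the audit log would show
  creditsCharged: number; // always 0 in demo mode
}

// Demo mode auto-activates when no marketplace account is connected.
function isDemoMode(connectedAccounts: number): boolean {
  return connectedAccounts === 0;
}

// Resolve a call purely from fixtures; mutations become no-ops.
function resolveDemoCall(call: ToolCall, fixtures: Record<string, unknown>): ToolResult {
  const data = call.mutates ? { ok: true, noop: true } : fixtures[call.name] ?? null;
  return { data, demo: true, auditLabel: `🎭 ${call.name}`, creditsCharged: 0 };
}

// Single entry point: the real API is only reachable outside demo mode.
async function executeTool(
  call: ToolCall,
  connectedAccounts: number,
  fixtures: Record<string, unknown>,
  callRealApi: (call: ToolCall) => Promise<unknown>,
): Promise<ToolResult> {
  if (isDemoMode(connectedAccounts)) return resolveDemoCall(call, fixtures);
  const data = await callRealApi(call);
  // Charging 1 credit per real call is an illustrative assumption.
  return { data, demo: false, auditLabel: call.name, creditsCharged: 1 };
}
```

Routing every tool through one function means a forgotten demo check in an individual tool cannot leak a real API call.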

MEDIUM 07 · Multi-Account Management

MISSING

Power sellers operate 2-5 marketplace accounts (Amazon US + MX, MeLi MX + CO). No account switching UI is specified. This is a v1 blocker for agency users and is likely to be requested in the first week of beta.

Account Data Model

interface MarketplaceAccount {
  id: string;          // uuid
  marketplace: 'amazon' | 'meli' | 'shopify';
  region: string;      // 'US' | 'MX' | 'CO' | etc.
  displayName: string; // "Acme US Store"
  avatarInitials: string; // "AC"
  avatarColor: string; // auto-assigned from palette
  lastSynced: number;
  isDefault: boolean;
  credentialKey: string; // safeStorage key reference
}

// Max accounts per plan:
// Free:  2 accounts
// Pro:   10 accounts
// (encourages Pro upsell for agencies)
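
// A small guard can enforce the per-plan limits listed above. This is a sketch:
// `Plan`, `MAX_ACCOUNTS`, and `canAddAccount` are hypothetical names, not part of the spec.

```typescript
type Plan = 'free' | 'pro';

// Limits copied from the plan comment above.
const MAX_ACCOUNTS: Record<Plan, number> = { free: 2, pro: 10 };

// True when the seller may connect one more marketplace account.
function canAddAccount(plan: Plan, currentCount: number): boolean {
  return currentCount < MAX_ACCOUNTS[plan];
}
```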

Account Switcher UI

Switch Account

AC

Acme US

Amazon US · ✓ active

AM

Acme MX

Amazon MX

+

Add account

Account Switch Behavior

  • Context isolation: chat history, ASIN lists, and audit logs are scoped per account — switching loads the other account's data
  • Keyboard shortcut: Cmd+Shift+A opens account switcher dropdown
  • Status bar: shows active account name truncated to 20 chars + marketplace icon
  • Switch is instant: no reload, React state swap — chat input clears, context bar updates, tab bar highlights appropriate marketplace
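
The context isolation and status bar rules above can be sketched as account-scoped keys plus a label helper. All names here (`scopedKey`, `statusBarLabel`, `loadScoped`, `appendScoped`) are hypothetical, and the in-memory `Map` merely stands in for whatever persistence layer the app uses.

```typescript
type Store = 'chat' | 'asins' | 'audit';

// Every persisted collection is keyed by account, so a switch is just a key change.
function scopedKey(accountId: string, store: Store): string {
  return `${accountId}:${store}`;
}

// Status bar shows the account name truncated to at most 20 characters.
function statusBarLabel(displayName: string, maxChars = 20): string {
  if (displayName.length <= maxChars) return displayName;
  return displayName.slice(0, maxChars - 1) + '…';
}

// In-memory stand-in for per-account storage.
const state = new Map<string, unknown[]>();

function loadScoped(accountId: string, store: Store): unknown[] {
  return state.get(scopedKey(accountId, store)) ?? [];
}

function appendScoped(accountId: string, store: Store, item: unknown): void {
  const key = scopedKey(accountId, store);
  state.set(key, [...(state.get(key) ?? []), item]);
}
```

Because nothing is stored under a shared key, switching accounts cannot surface another account's chat history or audit log by accident.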

MEDIUM 08 · Desktop OS Integration: Missing Specs

PARTIAL

Several Electron desktop OS integration points are specified at a high level but lack implementation detail: single-instance lock, deep link protocol, right-click context menus, tray badge counts, and drag-and-drop.

Single-Instance Lock (prevents duplicate windows)
// main/index.ts
const gotTheLock = app.requestSingleInstanceLock();

if (!gotTheLock) {
  app.quit(); // Second instance — quit immediately
} else {
  // First instance: handle second-instance attempt
  app.on('second-instance', (event, commandLine) => {
    if (mainWindow) {
      if (mainWindow.isMinimized()) mainWindow.restore();
      mainWindow.focus();
      // If launched with deep link (e.g., shopilot://auth/callback?code=...)
      const deepLink = commandLine.find(arg => arg.startsWith('shopilot://'));
      if (deepLink) handleDeepLink(deepLink);
    }
  });
}
Deep Link Protocol — shopilot://
// main/index.ts — Protocol registration
if (process.defaultApp) {
  if (process.argv.length >= 2) {
    app.setAsDefaultProtocolClient('shopilot', process.execPath, [path.resolve(process.argv[1])]);
  }
} else {
  app.setAsDefaultProtocolClient('shopilot');
}

// Supported deep link routes:
// shopilot://auth/callback?code=&state=    → OAuth2 callback (Amazon/MeLi)
// shopilot://asin/{asin}                   → Focus chat on specific ASIN
// shopilot://alert/{alertId}              → Open specific fraud/price alert
// shopilot://billing/upgrade              → Jump to billing settings

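function handleDeepLink(url: string) {
  // For a custom scheme, URL parsing puts the first segment in `hostname`:
  // shopilot://auth/callback → hostname 'auth', pathname '/callback'.
  const parsed = new URL(url);
  const [section, ...rest] = [parsed.hostname, ...parsed.pathname.split('/')].filter(Boolean);
  if (!mainWindow) return;
  switch (section) {
    case 'auth':
      if (rest[0] === 'callback') {
        mainWindow.webContents.send('auth:callback', {
          code: parsed.searchParams.get('code'),
          state: parsed.searchParams.get('state'),
        });
      }
      break;
    case 'asin':
      if (rest[0]) mainWindow.webContents.send('navigate:asin', rest[0]);
      break;
    // The alert and billing routes listed above are not handled yet.
  }
}

// macOS delivers deep links through 'open-url' rather than argv.
app.on('open-url', (event, url) => {
  event.preventDefault();
  handleDeepLink(url);
});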
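
// Routing is easier to test when parsing is separated from the IPC side effects.
// The sketch below covers the four supported routes as a pure function;
// `DeepLinkRoute` and `parseDeepLink` are hypothetical names, not part of the spec.

```typescript
type DeepLinkRoute =
  | { kind: 'auth-callback'; code: string | null; state: string | null }
  | { kind: 'asin'; asin: string }
  | { kind: 'alert'; alertId: string }
  | { kind: 'billing-upgrade' }
  | { kind: 'unknown' };

function parseDeepLink(raw: string): DeepLinkRoute {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return { kind: 'unknown' }; // not a parseable URL
  }
  if (url.protocol !== 'shopilot:') return { kind: 'unknown' };

  // With a custom scheme the first segment is parsed as the hostname.
  const [section, arg] = [url.hostname, ...url.pathname.split('/')].filter(Boolean);

  if (section === 'auth' && arg === 'callback') {
    return {
      kind: 'auth-callback',
      code: url.searchParams.get('code'),
      state: url.searchParams.get('state'),
    };
  }
  if (section === 'asin' && arg) return { kind: 'asin', asin: arg };
  if (section === 'alert' && arg) return { kind: 'alert', alertId: arg };
  if (section === 'billing' && arg === 'upgrade') return { kind: 'billing-upgrade' };
  return { kind: 'unknown' };
}
```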
Right-Click Context Menus
// main/contextMenu.ts
import { BrowserWindow, Menu, MenuItem, clipboard, ipcMain, shell } from 'electron';

ipcMain.on('show-context-menu', (event, context) => {
  const menu = new Menu();

  if (context.type === 'asin') {
    menu.append(new MenuItem({
      label: `Analyze ${context.asin}`,
      click: () => event.sender.send('coach:analyze', context.asin),
    }));
    menu.append(new MenuItem({
      label: 'View on Amazon',
      click: () => shell.openExternal(`https://amazon.com/dp/${context.asin}`),
    }));
    menu.append(new MenuItem({ type: 'separator' }));
    menu.append(new MenuItem({
      label: 'Copy ASIN',
      click: () => clipboard.writeText(context.asin),
    }));
  }

  if (context.type === 'price') {
    menu.append(new MenuItem({ label: 'Copy price', click: () => clipboard.writeText(context.value) }));
    menu.append(new MenuItem({ label: 'Ask coach about this price', click: () => event.sender.send('coach:ask', `Why is this price ${context.value}?`) }));
  }

  // fromWebContents can return null; fall back to the focused window in that case.
  menu.popup({ window: BrowserWindow.fromWebContents(event.sender) ?? undefined });
});

Tray Menu + Badge Counts

// Update tray badge when alerts arrive
function updateTrayBadge(count: number) {
  if (process.platform === 'darwin') {
    app.dock?.setBadge(count > 0 ? String(count) : ''); // dock exists only on macOS
  }
  tray.setToolTip(`Shopilot — ${count > 0 ? `${count} alerts` : 'All clear'}`);
}

// Tray context menu
const trayMenu = Menu.buildFromTemplate([
  { label: 'Open Shopilot', click: () => mainWindow.show() },
  { label: 'Pause Coach', type: 'checkbox', checked: false,
    click: (item) => mainWindow.webContents.send('coach:pause', item.checked) },
  { type: 'separator' },
  { label: 'Check for Updates', click: () => autoUpdater.checkForUpdatesAndNotify() },
  { label: 'Quit', click: () => app.quit() },
]);

PARTIAL 09 · E2E Testing Framework

INCOMPLETE SPEC

Unit tests and component tests are implied but no E2E testing framework is explicitly specified. For an Electron app making real API calls and marketplace mutations, E2E tests are non-negotiable before beta.

Testing Pyramid for Shopilot

E2E Tests (Playwright, via its _electron API) · 10% of tests · Happy paths + critical mutations
Integration Tests (Vitest + MSW) · 30% of tests · API mocking, IPC handlers
Unit Tests (Vitest) · 60% of tests · Tools, reducers, utils, formatters
Critical E2E Test Cases (must pass before beta)
Test Case | Why critical | Mode
App launches, shows demo mode, chat accepts input | Smoke test; must always pass | Demo
Connect Amazon account via OAuth → tokens stored in safeStorage | Auth is the first real action | Sandbox
Send message → tool executes → confirmation appears → user approves → audit log written | Core happy path | Mock API
Approve irreversible action → confirm with typed text → action recorded → rollback available | Trust-critical flow | Mock API
Credits hit 0 → coach blocks → credit exhausted modal shows → upgrade flow opens | Revenue-critical guard | Mock API
App restart → chat history loads from IndexedDB → last session visible | Persistence correctness | Mock API
Update available → modal shows → user clicks Restart → app re-opens at same state | Update UX must not lose work | Mocked updater
Playwright + Electron Setup
// e2e/setup.ts
import { _electron as electron } from 'playwright';
import type { ElectronApplication } from 'playwright';
import { test, expect } from '@playwright/test';

let electronApp: ElectronApplication;

test.beforeAll(async () => {
  electronApp = await electron.launch({
    args: ['dist/main/index.js'],
    env: {
      ...process.env,
      NODE_ENV: 'test',
      SHOPILOT_DEMO_MODE: 'true', // use fixture data
    },
  });
});

test.afterAll(async () => {
  await electronApp.close();
});

// Example test: coach chat flow
test('coach responds to ASIN query', async () => {
  const window = await electronApp.firstWindow();
  await window.fill('[data-testid="chat-input"]', 'What is happening with B08N5WRWNW?');
  await window.press('[data-testid="chat-input"]', 'Enter');
  await expect(window.locator('[data-testid="coach-response"]')).toBeVisible({ timeout: 10000 });
  await expect(window.locator('[data-testid="tool-accordion"]')).toBeVisible();
});

10 · Production Readiness Checklist

GATE CRITERIA

These gates must pass before each release milestone. No gate can be manually overridden without written sign-off from CEO + CTO.

GATE 1 — Private Beta (before any external user)

Requirement | Owner
All 7 E2E test cases pass on macOS 14 + macOS 15 | Sergio
Code signing + notarization working (Apple Developer cert) | Mateo
Sentry DSN configured, PII filter verified, test event sent | Andrés
Chat persistence: sessions survive app restart | Sergio
Single-instance lock prevents duplicate window | Mateo
Demo mode works without any marketplace credentials | Sergio
Update notification modal tested with mock version bump | Mateo
Privacy policy published at shopilot.ai/privacy | Pablo

GATE 2 — Public Beta (before paid users)

Requirement | Owner
GDPR data export (ZIP) working for all users | Andrés
Account deletion flow tested end-to-end | Andrés
In-app support (Crisp) embedded and tested | Sergio
Multi-account: 2+ accounts with correct context isolation | Sergio
Deep link protocol (shopilot://) working for OAuth callback | Mateo
Tray menu + badge count for unread alerts | Sergio
Right-click context menus on ASIN rows and prices | Sergio
Terms of Service published + accepted on first launch | Pablo
Stripe webhooks tested for subscription lifecycle | Andrés

GATE 3 — v1.0 GA

Requirement | Owner
Figma Atomic Design library complete (atoms + molecules + organisms) | External Design Team
Figma MCP integration working (Claude reads components directly) | Mateo
WCAG AA audit passing (axe-playwright on all screens) | Sergio
Performance: cold start < 3s on 2019 MBP (8GB RAM) | Mateo
Windows 11 build passing (secondary target) | Mateo
SOC 2 Type I audit initiated (required for enterprise) | Pablo

The 80/20 Rule for Production Readiness

As a rule of thumb, about 80% of production incidents come from 20% of neglected areas: auth edge cases, update failures, data loss on crash, and silent API errors. This section addresses all four. Ship Gate 1 within the first 3 weeks of dev, Gate 2 before any paid user, and Gate 3 before any press mention.

15. Brand Intelligence Lab: 17 Brand Books + Shopilot Recommendation

Deep-dive brand books for the 6 reference products + 10 YC-backed startups with similar contexts. Colors, typography, buttons, spacing, motion, and voice are all covered. Ends with the Shopilot Recommended Brand Book.

v1.0 · 2026-03
AN

Anthropic / Claude.ai

AI Safety Company · San Francisco · 2021

AI Product

Brand Philosophy

"AI for human flourishing"

The Anthropic visual language is built around the concept of "clay": unfired earth, warm, unfinished, human. The brand consciously rejects the cold, blue-shifted AI aesthetic (think IBM, Microsoft Azure, early OpenAI) in favor of warmth, earth, copper, and organic forms. The name "Claude" was deliberately chosen for its French warmth and humanist connotations. Every color decision reflects the same goal: trustworthy AI that feels human, not robotic.

Color System

#faf9f5

Background Light

RGB 250/249/245 · toasted cream

#141413

Background Dark

RGB 20/20/19 · warm undertone

#CC785C

Brand Copper

Logo · selection · icon

#d97757

UI Orange

CTAs · interactive elements

#6a9bcc

Muted Blue

Secondary · info states

#788c5d

Muted Green

Success · positive states

rgba(204,120,92,.15)

Selection BG

Text selection highlight

#1a1915

Surface Dark

Cards on dark bg

Contrast (WCAG 2.x formula): #141413 on #faf9f5 ≈ 17.5:1 AAA · #CC785C on #faf9f5 ≈ 3.1:1, which passes AA only for large text and UI components · #d97757 on #141413 ≈ 5.9:1 AA

Rule: Never use pure black (#000) or pure white (#fff). The warmth delta of ~5 RGB units in each neutral makes everything feel premium vs. commodity.
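
Contrast figures like the ones quoted for this palette can be recomputed rather than taken on trust. The sketch below implements the WCAG 2.x relative-luminance and contrast-ratio formulas for 6-digit hex colors; the function names are hypothetical and not part of the blueprint.

```typescript
// Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition).
function channelToLinear(c8: number): number {
  const c = c8 / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a 6-digit hex color such as '#faf9f5'.
function relativeLuminance(hex: string): number {
  const h = hex.replace('#', '');
  const [r, g, b] = [0, 2, 4].map((i) => parseInt(h.slice(i, i + 2), 16));
  return 0.2126 * channelToLinear(r) + 0.7152 * channelToLinear(g) + 0.0722 * channelToLinear(b);
}

// Contrast ratio between two colors; argument order does not matter.
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  const [lighter, darker] = la >= lb ? [la, lb] : [lb, la];
  return (lighter + 0.05) / (darker + 0.05);
}

// WCAG AA needs 4.5:1 for body text and 3:1 for large text; AAA needs 7:1 for body text.
function passesAA(ratio: number, largeText = false): boolean {
  return ratio >= (largeText ? 3 : 4.5);
}
```

For example, `contrastRatio('#CC785C', '#faf9f5')` can be passed to `passesAA` to check whether the copper accent is safe for body text on the light background.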

Typography System

Role | Font | Weight | Usage
Display / Headlines | Styrene A / Styrene B | 400–700 | Hero titles, section heads
Editorial / Long-form | Tiempos Text | 400 italic | Blog, docs, long reads
Product / UI Text | Styrene A | 400–500 | App UI, labels, body
Code / Data | JetBrains Mono | 400 | Code blocks, inline code
Accent / Quote | Galaxie Copernicus | 300 italic | Pull quotes, feature text

Type scale: display-xxl: clamp(3rem, 5vw, 5rem) · display-lg: clamp(2rem, 3.5vw, 3.5rem) · display-xs: clamp(1.125rem, 1.5vw, 1.25rem) · body: 1rem/1.6

Button System

● Primary: bg #d97757 · text white · radius 8px · padding 10px 20px · font-weight 600

● Secondary: border 1.5px #CC785C/50 · text #CC785C · bg transparent · same radii/padding

Hover: filter: brightness(1.1). Never use a fixed darker hex; keep theming dynamic.

Spacing · Shadows · Motion · Voice (deep spec)

Spacing

Site margin: clamp(2rem, 5rem)

Nav height: 68px (4.25rem)

Section gap: 96px–160px

Chat max-w: 768px (3xl)

Message max: 75ch

Shadows

Default: none

Flyout: 0 8px 32px rgba(0,0,0,.12)

Modal: 0 24px 64px rgba(0,0,0,.18)

Focus ring: 0 0 0 3px rgba(204,120,92,.3)

Motion

Menu open: 400ms

Dropdown: 200ms

Tooltip: 150ms

Easing: cubic-bezier(.4,0,.2,1)

Streaming: 0ms delay, instant

Brand Voice

Tone adjectives

Thoughtful · Warm · Honest · Direct · Curious · Humble

Anti-tone

Never: Hype-y · Corporate · Cold · Overpromising · Robotic

Writing style

Conversational but precise. Short sentences. Active voice. Explains "why", not just "what".

Shopilot inherits

Candidate inspiration: warm copper accent · dark backgrounds · trustworthy AI voice

CU

Cursor IDE

AI-Native Code Editor · Anysphere · 2022

AI-Native IDE

Brand Philosophy

"The AI-first code editor built for pair programming with AI"

Cursor's brand philosophy is hyper-functional. There is no decorative layer; every visual decision serves the task of writing code. The orange accent (#f54e00) is used only for the critical hot path: the most important action on screen. The warm off-white/off-black background signals "professional tool" vs. "consumer app." The UI is intentionally dense, since developers are trained to read dense information quickly.

Color System

#f7f7f4

--color-theme-bg

Warm off-white

#26251e

--color-theme-fg

Warm off-black

#f54e00

--color-theme-accent

Hot orange · CTAs only

--fg-01 … --fg-100

Opacity Scale

Every 5% step from bg color

Base units: --g: calc(10rem/16) ≈ 10px (grid) · --v: 1.375rem ≈ 22px (vertical rhythm)

Duration: --duration: .14s · --duration-slow: .25s

Easing: --ease-out-spring: cubic-bezier(.25,1,.5,1)

Shadows: Ultra-minimal 0 0 1rem #00000005 — shadows only on flyouts, never on cards

Border radii: 2 · 4 · 8 · 12 · 16px — smallest for inputs, largest for panels

Typography

Role | Font | Size | Notes
UI Product (sm) | System + custom | 11px (.6875rem) | --text-product-sm · labels, status
UI Product (base) | System + custom | 12px (.75rem) | --text-product-base · default text
UI Product (lg) | System + custom | 13px (.8125rem) | --text-product-lg · section titles
Code / Data | JetBrains Mono | 12–13px | Code, terminal output, numbers

Note: Cursor uses data-os=linux to switch to the system font stack. This respects the user's OS font preference, a developer-first accessibility decision.

Button System

● Primary: bg #f54e00 · radius 6px · padding 8px 16px · font-weight 600 · no border

● Secondary: bg rgba(fff,.07) · border rgba(fff,.12) · radius 4px · font-weight 400

Accent text buttons: color #f54e00 · bg transparent · hover underline only

Rule: orange CTA used ONCE per screen. Second most important action is always ghost.

What Shopilot Inherits

Split pane 70/30 · WebContentsView architecture · --g/--v base units · opacity token scale · status bar 24px · ultra-minimal shadows · one orange CTA rule

HS

HubSpot / Canvas Design System

CRM & Marketing Platform · Cambridge MA · 2006 · Public ($HUBS)

Enterprise SaaS

Brand Philosophy

"Sprocket-right: interfaces must work for the user, not impress other designers"

HubSpot's Canvas system represents 20 years of B2B SaaS learning. Their core insight: beautiful design at enterprise scale means designing for efficiency and clarity, not aesthetics. Every component is tested against "does this help the user complete their task faster?" The orange brand color (#ff7a00) was chosen for energy, approachability, and differentiation from blue-dominant CRM competitors (Salesforce). Canvas explicitly codifies the philosophy that function precedes form.

Color System

#ffffff

Base White

Primary background

#2D3E50

Midnight Blue

Primary text · headers

#ff7a00

Calypso Orange

Brand · CTAs

#00BDA5

Teal

Success · secondary CTA

#F5C26B

Flax

Warning · alerts

#EAF0F6

Mist Gray

Panel backgrounds

#516F90

Regent Gray

Secondary text

#F2545B

Alizarin

Error · destructive

Typography + Buttons

Fonts

Display: HubSpot Serif (custom, Typekit)

UI: HubSpot Sans (custom, Typekit)

Code: Lucida Console / Courier New (fallback)

Scale: 12 · 14 · 16 · 20 · 24 · 32 · 40 · 48px

Radius: --cl-radius ~6px standard

Icons: SVG fill:currentColor · 2rem default · .cl-icon class

Buttons

What Shopilot Inherits

Merchant-first philosophy · Data table density · Function over aesthetics principle · Multiple semantic colors for different alert types · Sprocket-right thinking

LI

Linear

Project Management Tool · San Francisco · 2019

B2B Productivity

Brand Philosophy

"Speed is a feature — every interaction must feel instantaneous"

Linear's brand is built on the premise that design debt in productivity tools costs people hours every week. Their aesthetic is extreme minimalism, not because it looks good but because every unnecessary element steals attention. The indigo brand color (#5e6ad2) was chosen for calm authority: it communicates "serious tool for serious work" without being cold or aggressive. Background Woodsmoke (#1a1a1e) is one of the darkest backgrounds among the reference brands: near-black, but slightly purple-shifted for warmth.

Color System

#1a1a1e

Woodsmoke

Primary bg · dark

#111116

Sidebar BG

Navigation panel

#5e6ad2

Indigo Brand

Logo · selected · CTAs

#8b8fa8

Oslo Gray

Secondary text

#25252a

Surface

Card backgrounds

#2e3035

Hover Surface

Row hover state

#4cb782

Done Green

Completed state

#eb5757

Cancelled Red

Error · blocked state

Design rules: No gradients ever · No decorative shadows · Use opacity over new colors · Border: 1px rgba(255,255,255,.06) only

Keyboard-first: every action reachable without mouse. Speed is communicated through interaction, not animation.

Typography + Buttons

Display: Inter Display · weights 300 (light) + 700 (bold)

UI: Inter · weights 400/500

Code: JetBrains Mono · 12–13px

Scale: 11 · 12 · 13 · 14 · 16 · 20 · 28 · 40px

Line height: 1.4 UI · 1.6 body

What Shopilot Inherits

No gradients / no decorative shadows · Opacity token approach · Keyboard-first mindset · Dark bg with slight warm purple shift · Extreme information density without visual noise

VC

Vercel / Geist Design System

Frontend Cloud Platform · San Francisco · 2015

Dev Tools

Brand Philosophy

"Black canvas: dark mode is not a theme, it's the identity"

Vercel's brand is the most radical of the six. Pure black (#000000) is the primary background: not dark navy, not warm dark, pure black. This is intentional: developers live in dark mode, and Vercel wants to be the platform that feels like the best developer tool they've ever used. Maximum contrast, maximum focus. The Geist typeface (custom, now open source) was designed specifically for developer interfaces: geometric sans for UI, geometric mono for code. There is no accent color, only a pure black/white/gray hierarchy.

Color System: Pure Grayscale + Functional

#000

#111

#333

#444

#666

#888

#eaeaea

#fafafa

#0070F3

Blue · Links · Info

#50E3C2

Cyan · Success

#FF0080

Pink · Error/Warning

Typography

Display/UI: Geist Sans (open source, Google Fonts)

Code/Data: Geist Mono (open source, Google Fonts)

Scale: 12 · 14 · 16 · 20 · 24 · 32 · 48 · 64px

Weight: 400 body · 500 medium · 600 semibold · 700 bold

Radius: 6px standard · 8px cards · 12px modal

Buttons

What Shopilot Inherits

Dark-first approach · Pure functional color (no decoration) · High contrast focus ring · Developer-dense information hierarchy · Geist Mono (open source alternative to JetBrains Mono)

SH

Shopify / Polaris Design System

Commerce Platform · Ottawa · 2006 · Public ($SHOP)

Commerce SaaS

Brand Philosophy

"Merchant-first: every decision evaluated from the merchant's perspective"

Polaris is the most mature design system in this study: 7+ years of iteration, thousands of components, and a philosophy that has been consistently proven, namely that clarity beats elegance. Shopify's merchant is not a designer or developer but a small business owner who needs to act fast and make money. The design system's entire vocabulary is optimized for task completion speed, not visual delight. The green brand color grew from the Shopify logo and represents growth, money, and success.

Color System

#FAFAFA

Background

Light mode primary

#202223

Ink

Primary text

#008060

Interactive Green

CTAs · brand

#95BF47

Logo Green

Brand logo only

#5C5F62

Subdued

Secondary text

#D82C0D

Critical

Error · destructive

#FFC453

Warning

Alert states

#AEE9D1

Success Light

Success bg tint

Typography

All: Inter (UI) · system-ui fallback

Scale: 12 · 14 · 16 · 20 · 26 · 32px

Radius: 4px inputs · 8px cards · 12px modals

Data Viz Rules:

Totals bold + row 1 · Focus: 1 insight/chart

Multiple data formats (table + chart always)

Buttons

What Shopilot Inherits

Seller-first decision framework · Data viz rules (totals first, 1 insight) · Semantic color discipline · Clarity > elegance principle · A11y requirements for data tables

YC Startups · 10 Similar Brands
BX

Brex

Corporate Fintech · San Francisco · 2017 · YC W17 · $12.3B valuation

Fintech · Handles Real Money

Brand Philosophy

"Make money management effortless for ambitious companies"

Brex is the closest contextual analogue to Shopilot in terms of trust architecture. Both handle real money on behalf of businesses, and both require the UI to communicate precision and authority. Brex's design evolved from a startup-y orange era to a mature, premium dark theme. Current palette: near-black backgrounds (#0E0E0E), warm coral/salmon accent for CTA emphasis, Söhne as the premium custom typeface. The warm coral (not pure orange) signals "approachable financial authority": slightly warmer than corporate, slightly cooler than consumer fintech.

Color System

#0E0E0E

Background Dark

Near-black · product UI

#FFFDF9

Background Light

Warm off-white

#F27B6B

Coral Accent

CTAs · brand emphasis

#FF5200

Hot Orange

High-urgency CTAs

#1A1A1A

Surface

Cards, panels

#2D2D2D

Border/Stroke

Dividers, outlines

#00C278

Success Green

Positive states

#FF4444

Error Red

Errors · blocks

Typography

Display: Söhne (Klim Type Foundry) · €€€

UI: Söhne · weights 300/400/600

Data: Söhne Mono (tabular figures)

Scale: 11 · 13 · 15 · 18 · 24 · 36 · 48px

Key: Tabular figures for all financial data (tnum feature)

Buttons

Key Insights for Shopilot

Trust architecture: Trust-critical data (balances, transactions) gets the highest contrast (white on black). Secondary information gets progressively less contrast.

Tabular nums: All financial data uses font-variant-numeric: tabular-nums so numbers align vertically in tables.

Could inspire Shopilot: Near-black background · Warm coral accent · Tabular nums for prices · Söhne inspiration (use Inter + JetBrains Mono as accessible equivalent)
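
As a minimal sketch of the tabular-nums insight above, the snippet below pairs a fixed-decimal price formatter with a style object for numeric cells. The names formatPrice and numericCellStyle are hypothetical, the USD/en-US locale is an assumption, and JetBrains Mono is only the candidate font mentioned above, not a decided token.

```typescript
// Hypothetical helper: every price is rendered with exactly two decimals,
// so digits line up once the cell uses tabular figures.
const priceFormatter = new Intl.NumberFormat("en-US", {
  style: "currency",
  currency: "USD", // assumption: the real currency would come from the marketplace
  minimumFractionDigits: 2,
  maximumFractionDigits: 2,
});

function formatPrice(amount: number): string {
  return priceFormatter.format(amount);
}

// Style object for numeric table cells (usable as a React style prop or
// copied into a stylesheet). The font is a candidate, not a decision.
const numericCellStyle = {
  fontVariantNumeric: "tabular-nums",
  fontFamily: '"JetBrains Mono", monospace',
  textAlign: "right",
} as const;

console.log(formatPrice(1234.5)); // "$1,234.50"
console.log(numericCellStyle.fontVariantNumeric);
```

Fixed decimals and tabular figures work together: the formatter guarantees the same number of fraction digits, and the CSS feature guarantees each digit has the same width.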

MC

Mercury

Neobank for Startups · San Francisco · 2019 · YC S19 · $1.62B valuation

Banking · Handles Real Money

Brand Philosophy

"Banking that gets out of your way"

Mercury achieved something extremely rare: making banking software look desirable. Their dark-mode-first interface (a radical choice for financial software in 2019) communicated that they understood their customer: tech founders who live in dark terminals. The custom Mercury Sans typeface has a slight humanist influence that prevents the bank UI from feeling cold and bureaucratic. The teal/blue accent is intentionally understated, echoing mercury (the element): subtle, precise, reflecting its environment.

Color System

#0A0A0A

Background

Near-pure black

#FAFAF9

Light BG

Warm off-white

#4AA8FF

Mercury Blue

CTAs · links · selected

#00BFA5

Teal

Balance · positive

#141414

Surface

Cards · panels

#1E1E1E

Hover Surface

Row hover

#FF5F5F

Alert Red

Errors · negative bal.

#F5A623

Warning Amber

Low balance · pending

Typography

Display/UI: Mercury Sans (custom, humanist geometric)

Numbers: Tabular lining figures (font-variant-numeric)

Code: Fira Code / iA Writer Mono (code blocks)

Weight: 300 light · 400 regular · 500 medium · 600 semibold

Spacing: letter-spacing: -0.01em for display text

Buttons + UI Patterns

Radius: 12px (rounded, approachable) · Borders: ultra-subtle rgba · Balance displayed in large mono at top of every page

Key Insights for Shopilot

Dark-first banking sets the precedent that serious financial tools CAN be dark mode · Balance/KPI always displayed in large mono (same as Shopilot GMV) · 12px radius keeps dense data approachable · Warm off-white light mode for reports/print contexts

RT

Retool

Internal Tools Builder · San Francisco · 2017 · YC S17 · $3.2B valuation

Data-Dense B2B

Brand Philosophy

"Build internal tools, 10x faster"

Retool is the master of data-dense UI. Their product is literally a table+form builder, so every design decision serves the goal of making dense grids of data scannable and actionable. Their canvas-style editor is perhaps the most data-rich interface in SaaS. The blue accent (#3B5EE7) was chosen for authority and trust: similar to financial platforms but more "engineering-y" than coral/orange. The dark background (#202124) is slightly warm-gray, similar to VS Code, which their developer audience knows instinctively.

Color System

#202124

BG Dark

Warm gray (VS Code-ish)

#F8F9FA

BG Light

Default canvas

#3B5EE7

Blue Brand

Selected · CTAs

#5C7CFA

Blue Light

Hover · focus

#2C2D30

Surface

Panel bg

#37383B

Border

Dividers

#2ECC71

Success

OK states

#E74C3C

Error

Error states

Data Table Design (Core Pattern)

Row height: 32px compact · 40px default · 48px comfortable (user-configurable)

Header: sticky · sortable · resizable columns · filter per column

Numbers: Right-aligned in all numeric columns · font-variant-numeric: tabular-nums

Could inspire Shopilot: Compact table density · Column sorting + filtering · Right-aligned numbers · VS Code-familiar warm gray bg
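
The table pattern above could be captured as a small configuration, sketched here under assumptions: the Density, ColumnSpec, and cellStyle names are hypothetical, and only the three row heights and the right-alignment rule come from the notes above.

```typescript
// The three user-configurable densities listed above.
type Density = "compact" | "default" | "comfortable";

const ROW_HEIGHT_PX: Record<Density, number> = {
  compact: 32,
  default: 40,
  comfortable: 48,
};

// Hypothetical column description; `numeric` drives alignment.
interface ColumnSpec {
  key: string;
  label: string;
  numeric: boolean;
}

// Returns inline style values for one cell: row height from the chosen
// density, right alignment and tabular figures for numeric columns.
function cellStyle(column: ColumnSpec, density: Density): Record<string, string> {
  return {
    height: `${ROW_HEIGHT_PX[density]}px`,
    textAlign: column.numeric ? "right" : "left",
    fontVariantNumeric: column.numeric ? "tabular-nums" : "normal",
  };
}

const priceColumn: ColumnSpec = { key: "price", label: "Price", numeric: true };
console.log(cellStyle(priceColumn, "compact"));
```

Keeping density as a single setting means a user preference can switch every table at once without touching column definitions.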

SB

Supabase

Open Source Firebase Alternative · Singapore · 2020 · YC S20 · $200M+ raised

Dev Tools · Open Source

Brand Philosophy

"Build in a weekend, scale to millions"

Supabase's brand is perhaps the most distinctive in this study: an aggressive, developer-native green (#3ECF8E) on pure dark backgrounds. The green was chosen for its association with databases (terminal text), open source culture (GitHub green), and PostgreSQL. Their brand radiates developer confidence: "we're not trying to be enterprise, we're trying to be the best developer experience." The contrast between near-black backgrounds and the bright emerald is high (7.2:1), making every UI element immediately visible.

Color System

#1C1C1C

BG Dark

Primary background

#111111

BG Deeper

Sidebar / nav

#3ECF8E

Supabase Green

Brand · CTAs · selected

#00C973

Green Vivid

Running / active states

#262626

Surface

Cards

#3F3F3F

Border

Dividers

#F97316

Warning Amber

Attention states

#EF4444

Error Red

Errors · destructive

Typography + Key Insights for Shopilot

Display/UI: Inter (all weights) · Code: Fira Code / UI Monospace

Radius: 6px uniform, very slightly rounded; feels professional, not playful

Could inspire Shopilot: Proof that a single strong accent color CAN be green for marketplaces (Shopify marketplace tab) · Dark + bright single accent contrast pattern · Warning using orange (candidate reference for Shopilot)
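
Contrast figures like the one quoted for Supabase can be checked rather than taken on faith. The sketch below follows the WCAG 2.x relative-luminance and contrast-ratio definitions; the function names are hypothetical, and the result depends on which background swatch is paired with the accent, so the ratio for any given pair may differ from the figure cited above.

```typescript
// Parses a 6-digit hex color such as "#3ECF8E" into 8-bit RGB channels.
function hexToRgb(hex: string): [number, number, number] {
  const match = /^#?([0-9a-f]{6})$/i.exec(hex);
  if (!match) throw new Error(`Expected a 6-digit hex color, got "${hex}"`);
  const value = parseInt(match[1], 16);
  return [(value >> 16) & 0xff, (value >> 8) & 0xff, value & 0xff];
}

// WCAG 2.x relative luminance: linearize each sRGB channel, then weight.
function relativeLuminance(hex: string): number {
  const [r, g, b] = hexToRgb(hex).map((channel) => {
    const s = channel / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// WCAG contrast ratio: (lighter + 0.05) / (darker + 0.05), from 1 to 21.
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  const lighter = Math.max(a, b);
  const darker = Math.min(a, b);
  return (lighter + 0.05) / (darker + 0.05);
}

// Accent vs. the two dark backgrounds listed in the palette above.
console.log(contrastRatio("#3ECF8E", "#1C1C1C").toFixed(2));
console.log(contrastRatio("#3ECF8E", "#111111").toFixed(2));
```

Running a check like this for every accent/background pair in a candidate palette is a cheap way to confirm that WCAG AA (4.5:1 for body text) holds before tokens are built.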

PH

PostHog

Open Source Product Analytics · London · 2020 · YC W20 · $225M raised

Analytics · Open Source

Brand Philosophy

"The only product analytics platform where data stays yours"

PostHog is the most boldly branded company in this study: a hedgehog mascot, a golden yellow (#F9BD2B) that actually glows, and a developer-irreverent tone. Their design deliberately breaks "enterprise SaaS" conventions to signal: we're built by developers, for developers, and we refuse to look boring. Beneath the playfulness, however, the data visualization is meticulously precise. Their dark UI (#1D1D27, a purple-shifted dark) keeps analytics dashboards readable 8+ hours a day. The yellow is used sparingly for the most important elements.

Color System

#1D1D27

BG Dark

Purple-shifted dark

#FFFEF0

BG Light

Golden cream

#F9BD2B

PostHog Yellow

Brand · emphasis

#F54E00

Hot Orange

CTAs · high-priority

#2C2C3A

Surface

Cards · panels

#3C3C50

Border

Dividers

#2AC940

Success

Positive events

#F04438

Error

Error states

Key Insights for Shopilot

Purple-shifted dark backgrounds feel "deeper" than neutral dark, which suits analytics views · Data precision underneath playful branding · Yellow used ONLY for the most important metric on screen (the same principle could apply to Shopilot's chosen accent, TBD) · Chart color palette: 8 distinct hues, all at 60% saturation for harmony
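
One way to get "8 distinct hues, all at 60% saturation" is to space hues evenly around the color wheel. This is a sketch of that idea, not PostHog's actual palette: the chartPalette name, the even spacing, and the 60% lightness default are all assumptions.

```typescript
// Generates `count` HSL colors with evenly spaced hues and a shared
// saturation. Lightness is an assumed default; only the hue count and the
// 60% saturation come from the insight above.
function chartPalette(count = 8, saturation = 60, lightness = 60): string[] {
  return Array.from({ length: count }, (_, index) => {
    const hue = Math.round((360 / count) * index);
    return `hsl(${hue}, ${saturation}%, ${lightness}%)`;
  });
}

console.log(chartPalette());
```

Holding saturation constant is what makes the series read as one family; only the hue changes between data series.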

RS

Resend

Developer Email Platform · San Francisco · 2022 · YC W23 · $26M raised

Dev Infrastructure

Brand Philosophy

"Email for developers, built by developers"

Resend's brand is pure monochromatic minimalism, perhaps the most extreme in this study. Pure black (#000000), pure grays, and one orange accent for the logo and primary CTA only. The philosophy: email infrastructure should be completely invisible; the developer's code is the product. Their UI is so stripped down it looks like GitHub's settings page elevated to art. This design communicates: we're not trying to impress you with UI, we're trying to not get in your way. Strong influence from Vercel's aesthetic (same investor: Guillermo Rauch's orbit).

Color System: Pure Monochromatic

#000

BG

#0a0a

Surface

#171717

Card

#262626

Border

#525252

Muted

#a3a3a3

Secondary

#ededed

Primary

#fff

Headings

#FF5700

Logo Orange · CTA only

Typography

All: Geist Sans + Geist Mono (open source)

Scale: 13 · 14 · 16 · 20 · 28 · 40px

Tracking: letter-spacing: -0.02em headings

Radius: 8px standard (slightly rounded)

Buttons

Key Insights for Shopilot

Proof that monochromatic + one accent works at scale · #000 vs #171717 vs #262626: subtle layering creates depth without color · Code + logs = always Geist Mono / JetBrains Mono → reinforces precision

CL

Clerk

Authentication Platform · San Francisco · 2021 · YC W22 · $170M raised

Auth · Dev Tools

Brand Philosophy

"The most comprehensive User Management Platform"

Clerk's brand sits at the intersection of developer tools and security software. Purple (#6C47FF) was chosen to differentiate from both the "enterprise blue" space (Okta, Auth0) and the "startup orange" space. It communicates "modern, premium, slightly magical": auth happens in the background, and Clerk makes it elegant. Their dark UI (#131316, a warm-shifted very dark) uses glass-morphism for the prebuilt UI components, an unusual choice that works because authentication is a "gateway moment" that benefits from a premium feel.

Color System

#131316

BG Dark

Warm-shifted dark

#FAFAFA

BG Light

Dashboard light mode

#6C47FF

Clerk Purple

Brand · CTAs · focus

#9B7DFF

Purple Light

Hover · secondary

#1C1C21

Surface

Cards

#2C2C35

Border

Dividers

#12B76A

Success

Auth success

#F04438

Error

Auth failure

Key Insights for Shopilot

Glass-morphism for "gateway moments" (login, confirmation dialogs) · Purple differentiation shows you don't need orange to be distinctive · #131316 warm-shifted dark background similar to Shopilot's own bg · Onboarding modal design: clean step indicators, focus on one action per step

DL

Deel

Global HR & Payroll · San Francisco · 2019 · YC W19 · $12B valuation

Global Payroll · Handles Real Money

Brand Philosophy

"Hire anyone, anywhere — with compliance built in"

Deel handles international payroll for 35,000+ companies, arguably the most complex, trust-critical SaaS product in this study. Their design reflects that weight: corporate navy blue (#1D2130) backgrounds, conservative button styles, clear error states for compliance failures. Nothing flashy: a company trusting you with its global payroll needs you to look like you know what you're doing. The blue palette (#2B6EE4) is authoritative without being aggressive, similar to how a bank presents itself.

Color System

#1D2130

BG Dark Navy

Primary dark surface

#F4F6FA

BG Light

Blue-tinted white

#2B6EE4

Deel Blue

CTAs · brand

#4D8FF0

Blue Light

Hover · secondary

#252A3C

Surface

Cards

#2F3547

Border

Dividers

#00C48C

Success Teal

Paid · approved

#FF647C

Error Coral

Failed · blocked

Key Insights for Shopilot

Navy-shifted dark bg (#1D2130) creates more "financial authority" feel than neutral dark · Compliance status rows: clear color coding (approved=teal, pending=amber, failed=red) · Dense multi-level table hierarchy (company > employee > payment), similar to Shopilot's ASIN > marketplace > metric hierarchy
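
The status color coding and the multi-level hierarchy can be combined: a parent row shows the worst status among its children. The sketch below is illustrative only. The hex values are borrowed from palettes listed in this study (Deel's palette does not list an amber, so Mercury's is used as a stand-in), and the roll-up rule itself is an assumption, not something the study describes.

```typescript
type ComplianceStatus = "approved" | "pending" | "failed";

// Placeholder colors taken from swatches in this study, not Shopilot tokens.
const STATUS_COLOR: Record<ComplianceStatus, string> = {
  approved: "#00C48C", // Deel "Success Teal"
  pending: "#F5A623", // Mercury "Warning Amber" (stand-in for Deel's amber)
  failed: "#FF647C", // Deel "Error Coral"
};

// Assumed severity order used when rolling statuses up the hierarchy.
const SEVERITY: Record<ComplianceStatus, number> = {
  approved: 0,
  pending: 1,
  failed: 2,
};

// A parent row (for example an ASIN) takes the worst status of its children
// (for example its marketplaces). An empty list is treated as "approved".
function rollUpStatus(children: ComplianceStatus[]): ComplianceStatus {
  return children.reduce<ComplianceStatus>(
    (worst, status) => (SEVERITY[status] > SEVERITY[worst] ? status : worst),
    "approved",
  );
}

const parentStatus = rollUpStatus(["approved", "pending", "approved"]);
console.log(parentStatus, STATUS_COLOR[parentStatus]); // pending #F5A623
```

With a rule like this, a collapsed parent row never hides a failing child, which is the point of color-coded status in a dense table.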

RP

Replit

Browser-based IDE · San Francisco · 2016 · YC W18 · $1.16B valuation

AI-Native Dev Tool

Brand Philosophy

"Code, create, and learn together"

Replit's brand bridges developer-serious and beginner-accessible. Their orange (#F56C2A) is warmer and more playful than Cursor's (#f54e00), intentionally so, as Replit serves both students and professionals. The dark background (#0D1117) is identical to GitHub's dark mode, leveraging existing mental models for developers. Their recent pivot to "Replit AI" accelerated their design maturity: more glass effects, more gradient accents, more AI-native patterns. Strong parallel to Shopilot: both are Electron-like experiences where the IDE/marketplace is the primary canvas and AI assistance is the sidebar.

Color System

#0D1117

BG Dark

GitHub-identical dark

#F6F8FA

BG Light

GitHub-identical light

#F56C2A

Replit Orange

Brand · CTAs

#FF7B54

Orange Light

Hover state

#161B22

Surface

Cards · panels

#21262D

Border

Dividers

#3FB950

Success

Build success

#F85149

Error

Build error

Key Insights for Shopilot

Split IDE+AI sidebar = exact Shopilot architecture · GitHub-familiar dark (#0D1117) leverages existing developer trust · Orange on very dark bg creates a high-contrast CTA that developers actually click · AI sidebar streaming pattern identical to Shopilot's coaching sidebar

LU

Luma

Event Platform · San Francisco · 2020 · YC W21 · $150M raised

Community Platform

Brand Philosophy

"Beautiful event pages that convert"

Luma is the most aesthetically ambitious brand in this study. Where other products in this list use minimalism as a constraint, Luma uses it as a canvas. Their gradient-based identity (iridescent teal-purple-magenta) feels luxurious without being cluttered. The dark background (#09090B, among the darkest in this study, almost absolute black) makes the gradients pop like neon lights in a dark room. Luma is included here because it shows what happens when you invest in aesthetic excess as a differentiator: events need to feel exciting, and Luma's brand creates that emotional response. Relevant to Shopilot's onboarding and marketing pages.

Color System

#09090B

BG Absolute

Near-perfect black

#FAFAFA

BG Light

Clean off-white

gradient

Brand Iridescent

Teal→Purple→Pink

#A855F7

Primary Purple

CTAs on dark bg

#141416

Surface

Cards

#1C1C1F

Surface 2

Nested cards

#4FACFE

Teal Blue

Info · links

#EC4899

Pink Accent

Featured · special

Key Insights for Shopilot

Gradient accents for marketing pages only (NOT product UI): this is the lesson · #09090B near-absolute black → glass cards on top create incredible depth with zero shadows · Premium "entrance" moments deserve gradient treatment (Shopilot: first-login, marketplace activation) · Inter Display with tight letter-spacing (-0.04em) = expensive look at zero cost

🚧

Brand Identity: NOT DEFINED YET

This section is a decision framework: a structured guide to the brand choices that must be made before any design system can be built. Nothing here is decided. The references above are inspiration material only.

Brand Decision Log: Status

# | Decision | Options | Status
01 | Brand philosophy / tagline | 3 candidates below | PENDING
02 | Primary color | 4 palette candidates below | PENDING
03 | Typography stack | 3 pairings below | PENDING
04 | Logo direction | Wordmark / Icon+Text / Abstract mark | PENDING
05 | Dark vs Light vs Both | Lean: dark-first · Risk: alienates some | PENDING
06 | Brand voice / personality | Expert coach / Trusted advisor / Efficient tool | PENDING

Decision 01 · Brand Philosophy

CHOOSE ONE

Based on the 16 brands studied, three directions emerged as viable for Shopilot. Each implies a different visual language, color family, and interaction tone.

A · "Warm Precision"

Warm neutral backgrounds, orange/amber accent, trust through clarity. References: Linear + HubSpot. Best for: sellers who want a tool that feels like a trusted advisor, not a cold dashboard.

B · "Data Intelligence"

Pure dark, electric blue accent, Bloomberg-inspired density. References: Datadog + Bloomberg. Best for: power sellers who see the product as a professional data terminal, prioritizing information density over warmth.

C · "Growth Engine"

Dark with green/teal accent, optimistic tone. References: Shopify + Notion. Best for: growth-focused sellers who associate green with profit and want the tool to feel empowering and action-oriented.

Study recommendation: Direction A ("Warm Precision") differentiates most from Helium 10 (purple/2018), Jungle Scout (green/consumer), and Repricer (corporate blue). It positions Shopilot as the only warm, AI-native seller tool. However, this is a recommendation, not a decision.

Decision 02 · Primary Color Palette

CHOOSE ONE

These 4 candidates were derived from the competitive analysis. Each avoids direct collision with existing tools in the market.

Orange — #F97316

Energy + action. Competitive differentiation from purple (Helium 10), blue (Repricer), green (Jungle Scout). Risk: HubSpot owns "CRM orange", so some overlap may be perceived.

Indigo — #6366F1

Intelligence + trust. Used by Linear. Risk: perceived as too similar to Helium 10's purple. Benefit: associates with AI/tech precision.

Sky Blue — #0EA5E9

Clarity + openness. Clean differentiation. Risk: overly generic in SaaS. Benefit: universally accessible, no color blindness issues.

Violet — #8B5CF6

Premium + AI. High association with AI products (Claude, Perplexity). Risk: Helium 10 has purple brand equity. Benefit: strong AI-native signal to tech-savvy sellers.

What the study recommends: Orange (#F97316) for maximum warm contrast. But this requires a final call from the team. Specifically: does Shopilot want to feel more like a financial tool (blue/indigo) or more like an action-oriented coach (orange)?

Decision 03 · Typography Stack

CHOOSE ONE

Option | Display / UI | Numbers / Code | Reference
A | Inter | JetBrains Mono | Linear, Vercel: neutral, modern, safe
B | Geist / DM Sans | JetBrains Mono | Vercel, Framer: slightly more personality
C | IBM Plex Sans | IBM Plex Mono | IBM, Datadog: technical authority, B2B trust

All 3 options are free, widely available, and render well in Electron. The mono font for numbers is non-negotiable across all options; see the Section 14 design rationale for why.

Decision 04 · Logo Direction

CHOOSE ONE
[ wordmark ]

Wordmark Only

Just the "Shopilot" name in custom lettering. Simple, flexible. Risk: hard to use at small sizes (tray icon, favicon).

S
shopilot

Icon + Wordmark

Symbol that works standalone (tray, favicon, app icon) + name for contexts with space. Most flexible system.

[ abstract ]

Abstract Mark

Unique geometric shape with no letterform. High memorability ceiling. Risk: requires brand awareness to work, so it is too early for a v1 product.

Recommended for v1: Option B (Icon + Wordmark). Allows a small icon in the macOS tray, a medium icon in the dock, and the full wordmark in the sidebar. But the icon design itself is a separate creative decision; do not ship a placeholder.

Decision 05 · Dark vs Light Mode

CHOOSE ONE

Dark-first (recommended by study)

Cursor, Linear, Arc, Datadog, Claude: all dark-first. Reduces eye strain in long sessions. Numbers pop on dark backgrounds. Most of the reference brands studied use dark mode as the primary experience. Competitive differentiation from Helium 10 (light default).

Risk of dark-only

Some sellers work in bright environments (warehouses, offices). If Shopilot is dark-only, it may be hard to read in those contexts. A light mode in Phase 2 is strongly advisable. V1: dark only to reduce scope.

→ How to use this section

  1. Review the 16 reference brand books above and understand what each brand does and why.
  2. Make a decision on each of the 6 items in the tracker at the top of this section. Pablo + Mateo + Sergio should be in the room.
  3. Document the chosen direction back into this spec: replace "PENDING" with the decided value and the rationale.
  4. Only then build design tokens (§14 · Stack): the CSS custom properties, the Tailwind config, the Style Dictionary pipeline. Building tokens before the brand decisions are made is wasted work.
  5. Commission a designer for the logo once the color and philosophy direction are locked. Do not use AI-generated or placeholder marks in any public-facing context.
§14 · SYNTHESIS

Study Synthesis — Patterns Found Across All 16 Brands Síntesis del Estudio — Patrones Encontrados en las 16 Marcas

After analyzing 16 world-class products (Anthropic, Cursor, Linear, Arc, Figma, Stripe, Vercel, HubSpot, Shopify, Datadog, Bloomberg, Notion, Intercom, Brex, Mercury, Luma), 7 universal patterns emerged that every top-tier product shares — regardless of industry, color, or audience. These are conclusions, not recommendations for Shopilot. Tras analizar 16 productos de clase mundial, emergieron 7 patrones universales que comparten todos los productos de primer nivel — independientemente de industria, color o audiencia. Estas son conclusiones del estudio, no recomendaciones para Shopilot.

01

One strong primary color = brand ownership Un color primario fuerte = propiedad de categoría

Every studied brand owns exactly ONE color. Not two, not a gradient system as their identity — one color that is unmistakably theirs. This color appears on buttons, on the favicon, on the loading state, on the cursor. It becomes the brand. Cada marca estudiada posee exactamente UN color. No dos, no un sistema de gradientes como identidad — un color que es inconfundiblemente suyo. Aparece en botones, favicon, estado de carga y cursor. Se convierte en la marca.

Anthropic #CC785C copper

Linear #5e6ad2 indigo

HubSpot #FF7A59 orange

Shopify #96BF48 green

What the study shows:Lo que el estudio muestra: Color category ownership is first-come-first-served. Purple → Figma/Anthropic. Green → Shopify/Notion. Blue → almost every generic SaaS. Orange → HubSpot. The strongest move for a new brand is to claim a color that no dominant competitor owns in its specific category.La propiedad de color por categoría es "el primero en llegar se sirve primero". Morado → Figma/Anthropic. Verde → Shopify/Notion. Azul → casi todo SaaS genérico. Naranja → HubSpot. El movimiento más fuerte para una nueva marca es reclamar un color que ningún competidor dominante posea en su categoría específica.

02

Power tools are dark-first — light mode is an afterthought Las herramientas de poder son dark-first — el modo claro es secundario

Of the 16 brands studied, Cursor, Linear, Arc, Datadog, Bloomberg, Claude (Anthropic), Vercel, Brex, and Mercury ship dark as primary. Retool and PostHog, which are outside the 16-brand set, follow the same pattern. Light mode exists but is not the designed-for experience. The pattern holds wherever users are professionals staring at screens for 6+ hours.

Product | Primary mode | Background
Cursor | Dark | #1B1B1F (near-black, warm)
Linear | Dark | #0F0F11 (pure dark)
Claude / Anthropic | Dark | #1A1A2E (violet-shifted dark)
Arc Browser | Dark | #1C1C1E (macOS standard dark)
Datadog | Dark | #14131A (purple-shifted)
Vercel | Dark | #000000 (pure black)
HubSpot / Stripe | Light | #FFFFFF (pure white)

Study finding:Hallazgo del estudio: The dark backgrounds that work best are NOT pure black (#000). They are near-blacks with a hue shift — warm (#1B1B1F), cool (#0F0F11), violet (#14131A), or macOS system (#1C1C1E). Pure black creates harshness; hue-shifted dark creates depth. Also: the darker the background, the more the accent color pops — which is why dark-first products can use a single, lower-saturation accent and still feel branded.Los fondos oscuros que mejor funcionan NO son negro puro (#000). Son near-blacks con un cambio de tono — cálido (#1B1B1F), frío (#0F0F11), violeta (#14131A) o sistema macOS (#1C1C1E). El negro puro crea dureza; el oscuro con tono crea profundidad.
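
As a rough illustration of the finding above, the difference between pure black, a neutral near-black, and a hue-shifted near-black can be checked mechanically. The sketch below is a hypothetical helper: `classifyDarkBackground` and its `maxChannel` threshold of 0x30 are assumptions made for this example, not values from the study. It treats a color as hue-shifted when all RGB channels are dark but not equal.

```typescript
// Hypothetical helper for illustration; not part of the Shopilot codebase.
type DarkKind = "pure-black" | "neutral-dark" | "hue-shifted-dark" | "not-dark";

function parseHex(hex: string): [number, number, number] {
  const digits = hex.trim().replace(/^#/, "");
  if (!/^[0-9a-f]{6}$/i.test(digits)) {
    throw new Error(`Expected a 6-digit hex color, got "${hex}"`);
  }
  const n = parseInt(digits, 16);
  return [(n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff];
}

// maxChannel is an assumed cutoff for "near-black": every channel <= 0x30.
function classifyDarkBackground(hex: string, maxChannel = 0x30): DarkKind {
  const [r, g, b] = parseHex(hex);
  const hi = Math.max(r, g, b);
  const lo = Math.min(r, g, b);
  if (hi === 0) return "pure-black";
  if (hi > maxChannel) return "not-dark";
  // Equal channels mean a neutral gray; any difference is a hue shift.
  return hi === lo ? "neutral-dark" : "hue-shifted-dark";
}
```

With this threshold, the Cursor background (#1B1B1F) classifies as hue-shifted, while #000000 and #111111 fall into the two categories the summary table warns against.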

03

Typography: 2 fonts maximum — one sans, one mono Tipografía: máximo 2 fuentes — una sans, una mono

Every studied product uses a sans-serif for UI text and a monospace font for all data, code, and numbers. No exceptions. The monospace font for numbers is not a stylistic choice — it is functional: proportional fonts create unstable number columns. Monospace makes data scannable. Cada producto estudiado usa una sans-serif para texto de UI y una mono para datos, código y números. Sin excepciones. La fuente mono para números no es elección estilística — es funcional: las fuentes proporcionales crean columnas de números inestables. La mono hace los datos escaneables.
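
To make the "stable columns" point concrete, here is a minimal sketch of how values can be prepared for a monospace column: fixed decimals plus left padding, so every digit lands in the same character cell. The function name `alignColumn` is an assumption for this example, and the alignment only holds visually when the column is rendered in a monospace font.

```typescript
// Formats numbers to a fixed number of decimals and left-pads them to a
// common width. In a monospace font each string then occupies the same
// number of character cells, so decimal points line up vertically.
function alignColumn(values: readonly number[], decimals = 2): string[] {
  const formatted = values.map((v) => v.toFixed(decimals));
  const width = Math.max(0, ...formatted.map((s) => s.length));
  return formatted.map((s) => s.padStart(width, " "));
}

// Example: prices for a data table column.
const column = alignColumn([1.5, 1234.25, 89]);
console.log(column.join("\n"));
```

In a proportional font the same padded strings would still drift, because a space and a digit have different widths.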

Sans-serif findingsHallazgos sans-serif

  • Inter — Linear, Vercel, Notion, PostHog
  • Geist — Vercel (custom, based on Inter)
  • SF Pro — Arc, Cursor (system default)
  • Söhne / Graphik — Anthropic, Figma
  • IBM Plex Sans — Datadog, IBM products

Finding: Inter dominates because it's free, variable weight, and optimized for screens. The system font (SF Pro on Mac) is the "invisible" choice that native apps use for maximum rendering quality.Hallazgo: Inter domina por ser gratuita, variable y optimizada para pantallas.

Mono findingsHallazgos mono

  • JetBrains Mono — Cursor, Linear, Vercel
  • Fira Code — developer tools generally
  • SF Mono — Arc, macOS native
  • IBM Plex Mono — Datadog, Brex
  • Geist Mono — Vercel (v2)

Finding: JetBrains Mono is the modern standard for developer-adjacent tools. Its ligatures are readable at 10–12px which is where data tables live.Hallazgo: JetBrains Mono es el estándar moderno para herramientas para desarrolladores. Sus ligaduras son legibles a 10-12px.

04

Motion is functional, not decorative — and it's invisible when done right El movimiento es funcional, no decorativo — es invisible cuando está bien hecho

None of the studied products use animation for visual delight. Every transition serves a purpose: orientation (this panel came from the right), state change (this button is now loading), hierarchy (this modal is above the content). The rule: if you can remove the animation and the user still understands what happened, the animation was decorative. Remove it. Ninguno de los productos estudiados usa animación para deleite visual. Cada transición sirve un propósito: orientación, cambio de estado, jerarquía. Regla: si puedes quitar la animación y el usuario aún entiende qué pasó, la animación era decorativa. Elimínala.

Animation | Duration | Purpose | Seen in
Hover bg change | 100–150ms | Acknowledge interaction | All products
Button press scale | 80ms ease-out | Physical click feedback | Linear, Arc, Luma
Modal slide-up | 200–250ms spring | Layer hierarchy | Figma, Notion, Linear
Streaming text fade | 80ms per word | Show AI is generating | Claude, Cursor
Thinking pulse ··· | 1.2s infinite | AI is processing | Claude, Cursor, Copilot
Sidebar collapse | 200ms ease-in-out | Preserve spatial orientation | Linear, Arc, Notion
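
One possible way to carry the table above into code is a small set of motion tokens. This is a sketch under stated assumptions: where the table gives a range the upper bound is used, the hover easing is assumed, and the cubic-bezier standing in for "spring" is an approximation chosen for this example rather than a value from any studied product.

```typescript
interface MotionToken {
  durationMs: number;
  easing: string;
}

type MotionName =
  | "hoverBackground"
  | "buttonPress"
  | "modalSlideUp"
  | "sidebarCollapse";

// "spring" is not a CSS easing keyword; this overshooting curve is an
// assumed stand-in.
const SPRING_APPROX = "cubic-bezier(0.34, 1.56, 0.64, 1)";

const motion: Record<MotionName, MotionToken> = {
  hoverBackground: { durationMs: 150, easing: "ease" }, // easing assumed
  buttonPress: { durationMs: 80, easing: "ease-out" },
  modalSlideUp: { durationMs: 250, easing: SPRING_APPROX },
  sidebarCollapse: { durationMs: 200, easing: "ease-in-out" },
};

// Builds a CSS transition shorthand value for one property.
function toTransition(property: string, token: MotionToken): string {
  return `${property} ${token.durationMs}ms ${token.easing}`;
}

console.log(toTransition("width", motion.sidebarCollapse));
```

Keeping durations in one typed object makes it harder for a decorative one-off animation to slip in with its own timing.
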
05

AI products share a specific visual language for trust and transparency Los productos AI comparten un lenguaje visual específico de confianza y transparencia

The study of Anthropic, Cursor, and Claude Code revealed a distinct pattern absent in non-AI products: every AI action is visually accountable. You always see what the AI is doing, what tool it used, how long it took. There are no black boxes in the UI of the best AI products. El estudio de Anthropic, Cursor y Claude Code reveló un patrón distinto ausente en productos no-AI: cada acción de la IA es visualmente accountable. Siempre ves qué está haciendo, qué herramienta usó, cuánto tardó. No hay cajas negras en la UI de los mejores productos AI.

AI-native patterns (present in all studied AI products)

  • Streaming first: never show a spinner while generating text
  • Tool transparency: show every tool call with name + duration + result
  • Reversibility signals: visually distinguish reversible from irreversible actions before confirmation
  • Context visibility: always show what the AI knows (context window, memory, recent files)
  • Interrupt capability: stop button always visible during AI generation
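
The tool-transparency pattern can be sketched as a small state machine that always carries the tool name and, once finished, the duration and result. The four states (queued, running, success, error) match the ToolAccordion states named later in this spec; the type names, the event shape, and the `get_buy_box` tool name are hypothetical and exist only for this example.

```typescript
type ToolCall =
  | { status: "queued"; name: string }
  | { status: "running"; name: string; startedAt: number }
  | { status: "success"; name: string; durationMs: number; result: string }
  | { status: "error"; name: string; durationMs: number; message: string };

type ToolEvent =
  | { type: "start"; at: number }
  | { type: "succeed"; at: number; result: string }
  | { type: "fail"; at: number; message: string };

// Pure transition function: queued -> running -> success | error.
// Any other combination is rejected instead of being silently ignored.
function advance(call: ToolCall, event: ToolEvent): ToolCall {
  if (call.status === "queued" && event.type === "start") {
    return { status: "running", name: call.name, startedAt: event.at };
  }
  if (call.status === "running" && event.type === "succeed") {
    const durationMs = event.at - call.startedAt;
    return { status: "success", name: call.name, durationMs, result: event.result };
  }
  if (call.status === "running" && event.type === "fail") {
    const durationMs = event.at - call.startedAt;
    return { status: "error", name: call.name, durationMs, message: event.message };
  }
  throw new Error(`Invalid transition: ${call.status} + ${event.type}`);
}

// One-line label with name + duration + result, as the pattern requires.
function summarize(call: ToolCall): string {
  switch (call.status) {
    case "queued":
      return `${call.name} · queued`;
    case "running":
      return `${call.name} · running`;
    case "success":
      return `${call.name} · ${call.durationMs}ms · ${call.result}`;
    case "error":
      return `${call.name} · ${call.durationMs}ms · ${call.message}`;
  }
}

let demo: ToolCall = { status: "queued", name: "get_buy_box" }; // hypothetical tool
demo = advance(demo, { type: "start", at: 1000 });
demo = advance(demo, { type: "succeed", at: 1420, result: "Buy Box won" });
console.log(summarize(demo));
```

Because the finished states cannot be constructed without a duration, the UI has no way to render a completed tool call as a black box.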

Anti-patterns (absent in top AI products)

  • Skeleton loaders for AI output: creates false expectation of content structure
  • Generic spinners while thinking: no information, builds anxiety
  • Hiding tool execution: users don't know what changed in their systems
  • One-shot confirmation dialogs: no diff, no preview, just "Are you sure?"
06

Information density is a product decision, not a design afterthought La densidad de información es una decisión de producto, no un afterthought de diseño

The studied products cluster into two density philosophies — and both work, but for different users. The choice of density must be made at the product level before any design work begins, because it determines spacing tokens, component heights, font sizes, and the entire information architecture. Los productos estudiados se agrupan en dos filosofías de densidad — ambas funcionan, pero para usuarios distintos. La elección de densidad debe hacerse a nivel de producto antes de cualquier trabajo de diseño, porque determina tokens de espaciado, alturas de componentes, tamaños de fuente y toda la arquitectura de información.

High density — expert toolsAlta densidad — herramientas expertas

Bloomberg, Datadog, Retool, Brex. Row height ≈ 32px. Font size: 11–12px. Assume users know what they're looking at. More information per screen = fewer clicks. Used by professionals who stare at it for hours.Bloomberg, Datadog, Retool, Brex. Altura de fila ≈ 32px. Tamaño de fuente: 11-12px. Los usuarios saben lo que están mirando. Más información por pantalla = menos clics.

Comfortable density — balanced toolsDensidad confortable — herramientas balanceadas

Linear, Notion, Intercom, Luma. Row height ≈ 44px. Font size: 13–14px. Sufficient whitespace to feel premium without hiding data. Works for both new and expert users.Linear, Notion, Intercom, Luma. Altura de fila ≈ 44px. Tamaño de fuente: 13-14px. Suficiente espacio en blanco para sentirse premium sin ocultar datos.
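
To show why density has to be decided before design work, the two philosophies can be written as token presets and compared on a concrete viewport. This is a minimal sketch: the row heights come from the descriptions above, the font sizes take the upper end of each quoted range, and the names `densityTokens` and `visibleRows` are assumptions for this example.

```typescript
type Density = "high" | "comfortable";

interface DensityTokens {
  rowHeightPx: number;
  fontSizePx: number;
}

const densityTokens: Record<Density, DensityTokens> = {
  high: { rowHeightPx: 32, fontSizePx: 12 },
  comfortable: { rowHeightPx: 44, fontSizePx: 14 },
};

// Number of full table rows that fit in a given viewport height.
function visibleRows(viewportHeightPx: number, density: Density): number {
  return Math.floor(viewportHeightPx / densityTokens[density].rowHeightPx);
}

console.log(visibleRows(704, "high"), visibleRows(704, "comfortable"));
```

On a 704px-tall table area, high density shows 22 rows and comfortable density shows 16, which is the "more information per screen" trade-off in numbers.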

07

Brand = how you speak, not just how you look La marca es cómo hablas, no solo cómo te ves

The strongest brands in the study have a distinct voice in every single word of their UI — button labels, error messages, onboarding copy, empty states, confirmation dialogs. The voice is as distinctive as the color. Stripe writes error messages like a knowledgeable friend. Linear writes UI copy with extreme brevity. Anthropic writes with careful epistemic humility ("I think", "Based on what I know"). Las marcas más fuertes del estudio tienen una voz distintiva en cada palabra de su UI — etiquetas de botones, mensajes de error, copy de onboarding, estados vacíos, diálogos de confirmación. La voz es tan distintiva como el color.

Stripe

Error: "Your card was declined. This sometimes happens if the issuing bank suspects fraud. Try a different card or contact your bank."Error: "Tu tarjeta fue rechazada. A veces ocurre si el banco sospecha fraude. Intenta con otra tarjeta."

Linear

Error: "Failed to sync." ← That's it. No explanation. They trust users to understand context. Extreme brevity as brand.Error: "No se pudo sincronizar." ← Eso es todo. Sin explicación. Brevedad extrema como marca.

Anthropic / Claude

Response: "I'm not certain, but based on what I know..." — epistemic humility baked into every sentence.Respuesta: "No estoy seguro, pero basándome en lo que sé..." — humildad epistémica en cada frase.

Summary — What all world-class products shareResumen — Lo que comparten todos los productos de clase mundial

Dimension | Universal pattern | Applies to Shopilot?
Color | 1 primary accent, 2 functional (success/error), neutral scale | Yes, must decide
Background | Near-black with hue shift (not #000 or #111) | Yes, must decide hue
Typography | 1 sans for UI + 1 mono for all numbers/data | Yes, must choose pair
Motion | 100–250ms, purposeful only, spring easing | Yes, adopt directly
AI states | Streaming text, thinking pulse, tool transparency | Yes, core requirement
Density | Choose high or comfortable; don't mix | Yes, must decide
Voice | Every word of UI reflects brand personality | Yes, must define
Logo | Works at 16px (favicon/tray) AND at 200px | Yes, must commission
§14 · NEEDS

What Shopilot Needs — Design Requirements Analysis Lo que Shopilot Necesita — Análisis de Requerimientos de Diseño

Based on the study synthesis and Shopilot's product definition (AI-native Electron desktop app for e-commerce sellers, 70/30 split, 36 tools, marketplace integration), here is every design element the product needs — independent of brand decisions. These are requirements, not solutions. Basado en la síntesis del estudio y la definición del producto Shopilot (app Electron desktop AI-native para sellers de e-commerce, split 70/30, 36 herramientas, integración de marketplace), aquí están todos los elementos de diseño que el producto necesita — independientemente de las decisiones de marca. Estos son requerimientos, no soluciones.

MASTER CHECKLIST

The 15 things Shopilot must complete to have a world-class designLas 15 cosas que Shopilot debe completar para tener un diseño de clase mundial

Single source of truth. Everything in one place. The detailed breakdown is in the categories below — this is the executive view.Fuente única de verdad. Todo en un lugar. El desglose detallado está en las categorías debajo — esta es la vista ejecutiva.

Phase 1 · Brand Identity (before writing a single line of UI code)

# | Task | Output | Owner | Status
01 | Run brand workshop: choose Brand Philosophy (what emotion does Shopilot own?) | 1-sentence brand position | Pablo | PENDING
02 | Decide primary brand color: pick from candidates (see §Brand Decision Framework) | 1 hex value, named, documented | Pablo + team | PENDING
03 | Choose typography pair: UI sans + data mono (see §24 References for options) | 2 font names, weight scale defined | Pablo + Sergio | PENDING
04 | Build the color system: dark bg scale (4 tones) + text scale (4 levels) + semantic colors | design-tokens.json (color section) | Sergio | BLOCKED by 02
05 | Commission logo: wordmark + icon mark, works at 16px and 512px | SVG files: logo.svg, icon.svg, favicon.svg | Pablo (hire) | BLOCKED by 01+02

Phase 2 · UI Foundation (tokens → CSS vars → Tailwind config, weeks 1–2)

# | Task | Output | Owner | Status
06 | Complete design-tokens.json: spacing (--g / --v system), radii, shadows, duration | tokens.json in W3C DTCG format | Sergio + Mateo | BLOCKED by 04
07 | Run Style Dictionary pipeline: tokens.json → CSS :root vars + tailwind.config.js | tokens.css, tailwind.config.js | Mateo | BLOCKED by 06
08 | Build Electron window shell: frameless + drag region + macOS traffic lights + 70/30 split | Running Electron with correct window chrome | Sergio | PENDING
09 | Implement base atoms: Button (6 variants), Badge, Input, Spinner, Tooltip, Divider | 6 React components using tokens | Sergio | BLOCKED by 07

Phase 3 · Core Components (weeks 2–6)

# | Task | Output | Owner | Status
10 | Build Coach screen: streaming text cursor ▊ + thinking pulse ··· + tool accordion (4 states) + chat input | Functional coach view with AI state machine | Sergio | BLOCKED by 09
11 | Build Confirmation Dialog: reversible (amber) vs irreversible (red) variants + diff display | ConfirmationDialog.tsx, 2 variants | Sergio | BLOCKED by 09
12 | Build KPI card + data table (sortable) + delta badges (80% of the Dashboard screen) | Dashboard screen with real data | Sergio + Andrés | BLOCKED by 09
13 | Build status bar (24px): agent state dot left + credits + model name right | StatusBar.tsx, always visible | Sergio | BLOCKED by 08
14 | Build context bar: active ASIN + marketplace dot + context window progress bar | ContextBar.tsx | Sergio | BLOCKED by 09
15 | Accessibility audit: WCAG AA contrast check on all components, keyboard nav, focus rings | 0 WCAG AA violations | Sergio + Andrés | BLOCKED by 09-14
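
The contrast part of task 15 can be automated. The sketch below implements the WCAG 2.x relative-luminance and contrast-ratio formulas and the AA thresholds (4.5:1 for normal text, 3:1 for large text). The function names are assumptions for this example, and the colors in the usage line are placeholders, since Shopilot's palette is still pending.

```typescript
// Converts an 8-bit sRGB channel to linear light, per the WCAG 2.x definition.
function channelToLinear(c8: number): number {
  const c = c8 / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance(hex: string): number {
  const digits = hex.trim().replace(/^#/, "");
  if (!/^[0-9a-f]{6}$/i.test(digits)) {
    throw new Error(`Expected a 6-digit hex color, got "${hex}"`);
  }
  const n = parseInt(digits, 16);
  const r = channelToLinear((n >> 16) & 0xff);
  const g = channelToLinear((n >> 8) & 0xff);
  const b = channelToLinear(n & 0xff);
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio is (lighter + 0.05) / (darker + 0.05), from 1 to 21.
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// WCAG AA: 4.5:1 for normal text, 3:1 for large text.
function meetsAA(foreground: string, background: string, largeText = false): boolean {
  return contrastRatio(foreground, background) >= (largeText ? 3 : 4.5);
}

// Placeholder colors: white text on one of the studied near-black backgrounds.
console.log(meetsAA("#FFFFFF", "#1B1B1F"));
```

A check like this could run over every foreground/background pair in the token file once task 04 produces one. Keyboard navigation and focus rings still need manual review.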

Critical path:Ruta crítica: 01 (brand workshop) unblocks everything. Nothing else can start until the team aligns on what emotion Shopilot owns. That's the only decision that can't be delegated or automated.01 (brand workshop) desbloquea todo. Nada más puede empezar hasta que el equipo se alinee en qué emoción posee Shopilot. Es la única decisión que no puede ser delegada ni automatizada.
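
The Status column of the checklist can be read as a dependency graph. The sketch below encodes only the "BLOCKED by" entries exactly as written in the three tables and derives which tasks are ready at any point. Task 01 as a team-level prerequisite for everything is an organizational constraint and is not encoded here. The names `checklist`, `readyTasks`, and `executionWaves` are assumptions for this example.

```typescript
interface Task {
  id: string;
  blockedBy: string[];
}

// Transcribed from the Status column; PENDING tasks have no recorded blocker.
const checklist: Task[] = [
  { id: "01", blockedBy: [] },
  { id: "02", blockedBy: [] },
  { id: "03", blockedBy: [] },
  { id: "04", blockedBy: ["02"] },
  { id: "05", blockedBy: ["01", "02"] },
  { id: "06", blockedBy: ["04"] },
  { id: "07", blockedBy: ["06"] },
  { id: "08", blockedBy: [] },
  { id: "09", blockedBy: ["07"] },
  { id: "10", blockedBy: ["09"] },
  { id: "11", blockedBy: ["09"] },
  { id: "12", blockedBy: ["09"] },
  { id: "13", blockedBy: ["08"] },
  { id: "14", blockedBy: ["09"] },
  { id: "15", blockedBy: ["09", "10", "11", "12", "13", "14"] },
];

// Tasks not yet done whose blockers are all done.
function readyTasks(done: ReadonlySet<string>): string[] {
  return checklist
    .filter((t) => !done.has(t.id) && t.blockedBy.every((b) => done.has(b)))
    .map((t) => t.id);
}

// Groups tasks into waves that could run in parallel (Kahn-style layering).
function executionWaves(): string[][] {
  const done = new Set<string>();
  const waves: string[][] = [];
  while (done.size < checklist.length) {
    const wave = readyTasks(done);
    if (wave.length === 0) throw new Error("Dependency cycle in checklist");
    wave.forEach((id) => done.add(id));
    waves.push(wave);
  }
  return waves;
}

console.log(executionWaves());
```

Read this way, the longest chain runs 02 → 04 → 06 → 07 → 09 → (10, 11, 12, 14) → 15, so the color decision gates most of the build.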

Category 1 — Brand Identity Elements (detail)Categoría 1 — Elementos de Identidad de Marca (detalle)

ALL MISSING
Element | Why needed | Used where | Status
Logo mark (icon) | Works at 16px: macOS dock, tray, favicon | Electron dock icon, tray, browser tab | MISSING
Wordmark (logotype) | Full name, readable at 120px+ | App sidebar header, landing page, screenshots | MISSING
Primary brand color | Buttons, links, active states, focus rings | Everywhere interactive (200+ UI elements) | PENDING DECISION
Background color scale | Base, surface, card, elevated (4 dark tones) | Every screen, every component | PENDING DECISION
Foreground color scale | Primary text, secondary, muted, disabled (4 levels) | All text, labels, placeholders | DERIVES FROM BG
Functional colors | Success (green), Warning (amber), Error (red), Info (blue) | Alerts, badges, status indicators, audit log | STANDARD (PICK)
UI typography (sans) | All text except numbers | Labels, paragraphs, headings, button text | PENDING DECISION
Data typography (mono) | All numbers, prices, percentages, code | KPI cards, tables, status bar, audit log | PENDING DECISION

Category 2 — UI Components Required by the ProductCategoría 2 — Componentes UI Requeridos por el Producto

These are derived from Shopilot's 36 tools and 4 core screens (Coach view, Dashboard, Settings, Billing). Not a design choice — a product requirement.Se derivan de las 36 herramientas de Shopilot y 4 pantallas principales. No es elección de diseño — es un requerimiento del producto.

Foundation (week 1)

  • Design tokens (CSS vars)
  • Button (6 variants)
  • Input / Textarea
  • Badge / Tag
  • Icon system (Lucide)
  • Tooltip
  • Spinner / Loading
  • Divider

Coach screen (week 2-3)

  • Chat message (user/AI)
  • Streaming text cursor ▊
  • Thinking pulse ···
  • Tool accordion (4 states)
  • Confirmation dialog
  • Proactive suggestion card
  • Context bar (ASIN + tokens)
  • Chat input + send button

Data screens (week 4-6)

  • KPI metric card
  • Data table (sortable)
  • Buy Box indicator
  • Price delta bar
  • BSR sparkline
  • Audit log timeline
  • Credit economy bar
  • Fraud alert banner

Category 3 — Electron Desktop-Specific RequirementsCategoría 3 — Requerimientos Específicos de Desktop Electron

These have no equivalent in web apps. Required because Shopilot ships as a native macOS/Windows app, not a browser tab.No tienen equivalente en apps web. Requeridos porque Shopilot es app nativa macOS/Windows, no una pestaña de browser.

  • Title bar: frameless window with drag region + macOS traffic lights
  • Tab bar: marketplace switcher (Amazon / MeLi / Shopify) with colored dots
  • Status bar: 24px bottom bar with agent state left, credits + model right
  • Tray icon: 16x16 mono SVG + badge count for alerts
  • 70/30 split: marketplace WebView (left) + React sidebar (right), with a visual seam between them
  • Update modal: version info + changelog + progress + restart button
  • Notification system: 3 levels, in-app banner → OS push → tray badge
  • App icon: 1024×1024px for App Store + 512px for macOS dock
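
The 70/30 split and the 24px status bar reduce to simple rectangle arithmetic that can be recomputed on every window resize. The sketch below is a pure layout function only. Wiring the rectangles to Electron views is left out, and the assumptions that the status bar spans the full window width and that there is no separate title-bar height are choices made for this example.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface SplitLayout {
  webview: Rect;
  sidebar: Rect;
  statusBar: Rect;
}

// Computes pixel bounds for the marketplace WebView (left), the React
// sidebar (right), and the bottom status bar.
function splitLayout(
  windowWidth: number,
  windowHeight: number,
  statusBarHeight = 24,
  webviewShare = 0.7,
): SplitLayout {
  const contentHeight = Math.max(0, windowHeight - statusBarHeight);
  const webviewWidth = Math.round(windowWidth * webviewShare);
  return {
    webview: { x: 0, y: 0, width: webviewWidth, height: contentHeight },
    sidebar: {
      x: webviewWidth,
      y: 0,
      width: windowWidth - webviewWidth, // remainder, so the widths always sum
      height: contentHeight,
    },
    statusBar: { x: 0, y: contentHeight, width: windowWidth, height: statusBarHeight },
  };
}

console.log(splitLayout(1440, 900));
```

Giving the sidebar the remainder rather than rounding 30% separately avoids a one-pixel gap or overlap at the seam.
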
§14 · WORKFLOW

Workflow 0 → Complete Brand — The Efficient Path Workflow 0 → Marca Completa — El Camino Eficiente

The most efficient process to go from "no brand" to a production-ready design system that rivals Anthropic, Cursor, or Linear. This is the process — not based on opinion, but on how the reference brands actually built their design systems. El proceso más eficiente para ir de "sin marca" a un design system listo para producción que rivalice con Anthropic, Cursor o Linear. Este es el proceso — no basado en opinión, sino en cómo las marcas de referencia construyeron sus design systems.

The 5-Phase ProcessEl Proceso de 5 Fases

1

Brand WorkshopBrand Workshop

1–2 days · Pablo + Mateo + Sergio1-2 días · Pablo + Mateo + Sergio

Make the 6 brand decisions from the Decision Framework above. No design tools needed — just a whiteboard or Notion doc. Output: a 1-page brand brief with every decision locked.Tomar las 6 decisiones de marca del Framework de Decisiones anterior. No se necesitan herramientas de diseño — solo una pizarra o doc de Notion. Output: un brand brief de 1 página con cada decisión bloqueada.

Decisions to lock in this phase:Decisiones a bloquear en esta fase:

  • Brand philosophy (A / B / C)
  • Primary color (which candidate)
  • Typography pair (which option)
  • Logo direction (wordmark / icon+text)
  • Dark mode first: yes/no
  • Brand voice archetype
2

Visual Identity in FigmaIdentidad Visual en Figma

3–5 days · Designer (contract) + Pablo review3-5 días · Diseñador (contrato) + revisión Pablo

This is where Figma enters — but only for visual identity exploration, not for UI design. The goal is to validate color, logo, and typography before writing a single line of code. Figma is used here because visual decision-making is faster with a canvas tool than in code.Aquí es donde entra Figma — pero solo para exploración de identidad visual, no para diseño de UI. El objetivo es validar color, logo y tipografía antes de escribir una sola línea de código. Figma se usa aquí porque la toma de decisiones visuales es más rápida con una herramienta canvas.

What goes into Figma in Phase 2:Qué va a Figma en la Fase 2:

  • Logo mark explorations (6–10 directions)
  • Color palette validation (light + dark test)
  • Typography specimens (all weights + sizes)
  • 3 brand application mockups (app icon, sidebar header, marketing screenshot)

What does NOT go into Figma in Phase 2:Qué NO va a Figma en la Fase 2:

  • Full UI screens: premature without tokens
  • Component library: not in Phase 2; it is defined in Figma (#18 Design System) and implemented in code in Phase 4
  • User flows: too early

Tools for Phase 2: Figma (free tier is enough) · fontpair.co for typography pairing · Coolors.co or Realtime Colors for palette generation · Adobe Color for accessibility check · Contrast.app for WCAG validation

3

Design Tokens → CodeDesign Tokens → Código

2 days · Sergio + Mateo2 días · Sergio + Mateo

Once brand decisions are locked from Phase 2, translate them into code immediately. This is where Figma connects to Claude Code: take the approved color values and typography from Figma, encode them as design tokens, and generate the CSS + Tailwind config. Claude Code accelerates this from 2 days to 4 hours.Una vez bloqueadas las decisiones de marca de la Fase 2, traducirlas a código inmediatamente. Aquí es donde Figma se conecta con Claude Code: tomar los valores de color y tipografía aprobados de Figma, codificarlos como design tokens, y generar el CSS + Tailwind config.

Figma → Claude Code integration flow:Flujo de integración Figma → Claude Code:

  1. Export approved brand values from Figma as JSON (Figma Variables → JSON via the "Variables Import Export" plugin)
  2. Paste the JSON into Claude Code: "Convert these brand values to a W3C DTCG tokens.json file"
  3. Claude Code generates: tokens.json + style-dictionary.config.mjs + globals.css + tailwind.config.ts
  4. Run Style Dictionary → CSS custom properties are live in the app
  5. Validate: open the Electron app, confirm colors match the Figma spec
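
To make step 4 less of a black box, here is a hand-rolled sketch of the transformation it performs: a DTCG-style token tree (leaves carrying `$value`) flattened into CSS custom properties. It is a stand-in for illustration and does not use or reproduce the Style Dictionary API. The token names and values in the sample are placeholders, since the real brand values are still pending.

```typescript
interface Token {
  $value: string;
  $type?: string;
}

interface TokenGroup {
  [name: string]: Token | TokenGroup;
}

function isToken(node: Token | TokenGroup): node is Token {
  return typeof (node as Token).$value === "string";
}

// Walks the tree and emits one "--path-to-token: value;" line per leaf.
function toCssVariables(group: TokenGroup, path: string[] = []): string[] {
  const lines: string[] = [];
  for (const [name, node] of Object.entries(group)) {
    const next = [...path, name];
    if (isToken(node)) {
      lines.push(`  --${next.join("-")}: ${node.$value};`);
    } else {
      lines.push(...toCssVariables(node, next));
    }
  }
  return lines;
}

function toRootBlock(group: TokenGroup): string {
  return `:root {\n${toCssVariables(group).join("\n")}\n}`;
}

// Placeholder tokens; not decided Shopilot values.
const sampleTokens: TokenGroup = {
  color: { bg: { base: { $value: "#1B1B1F", $type: "color" } } },
  duration: { hover: { $value: "150ms", $type: "duration" } },
};

console.log(toRootBlock(sampleTokens));
```

The real pipeline would also emit the Tailwind config from the same tree, which is what keeps Figma values, CSS variables, and utility classes in sync.
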
4

Component Library with Claude CodeLibrería de Componentes con Claude Code

3–6 weeks · Sergio (primary) + Claude Code3-6 semanas · Sergio (principal) + Claude Code

This is the main build phase. All components are defined in Figma (#18 Design System) following Atomic Design (atoms, molecules, organisms, templates, pages). Claude reads the Figma via Figma MCP and implements matching React components in #1 Native Shell. No components are created outside of what is defined in the Figma.Esta es la fase de construcción principal. Todos los componentes están definidos en Figma (#18 Design System) siguiendo Atomic Design (átomos, moléculas, organismos, plantillas, páginas). Claude lee el Figma via Figma MCP e implementa componentes React en #1 Native Shell. No se crean componentes fuera de lo definido en el Figma.

How Claude Code works in this phase:Cómo trabaja Claude Code en esta fase:

  • Spec → component: Give Claude Code a description from this spec (e.g., "build ToolAccordion with 4 states: queued/running/success/error, using design tokens from globals.css") → it generates the full TypeScript component
  • Figma → React: Claude reads the Figma component via Figma MCP and generates all variants automatically with matching props and states
  • Accessibility audit: "Review this component for WCAG AA compliance and fix any issues" (Claude Code runs the audit inline)

Velocity benchmark: A senior engineer without AI: 1 component/week (design + code + test + docs). With Claude Code: 1 component/day. At 5 working days per week, that is 25 core components in 5 weeks instead of 25 weeks, which is the 5x leverage.

5

First Real Screen → Test with SellersPrimera Pantalla Real → Test con Sellers

1 week · Full team1 semana · Equipo completo

Assemble the Coach View (the 70/30 split screen) using the built components and tokens. Show it to 3 real sellers. At this point the brand is real — not a Figma mockup, not a code spec, but a running Electron application with real brand tokens, real components, and real data. Collect feedback. Iterate.Ensamblar el Coach View (pantalla 70/30) usando los componentes y tokens construidos. Mostrárselo a 3 sellers reales. En este punto la marca es real — no un mockup de Figma, no un code spec, sino una aplicación Electron corriendo con tokens de marca reales, componentes reales y datos reales.

Figma vs Code: When to Use Each

This is the most common source of wasted effort in early-stage product design. The answer depends on what you're deciding, not on preference.

Task | Use Figma? | Why
Logo exploration | Yes, required | Bezier curves, vector editing, proportions: impossible to do well in code
Color palette validation | Yes, fast | Seeing colors in context (on dark bg, next to text) is faster in Figma than spinning up code
Typography testing | Yes, fast | Font pairing decisions are visual, not technical. Figma + Google Fonts is 10x faster than code for this
User flow diagrams | Optional | Can also use FigJam, Miro, or paper. The flow is the output, not the tool
Individual component design | Occasionally | Only for complex components (confirmation dialog, onboarding flow). Simple components: just build in code with Claude Code
Component library | Yes, source of truth | Figma (#18 Design System) is the single source of truth following Atomic Design. Claude reads via Figma MCP and implements matching React components. No components created outside Figma
Design tokens | No, they live in tokens.json | Figma Variables exist but are secondary. The tokens.json → CSS pipeline is the real system
Full screen prototypes | No, build in Electron | A running Electron app with real data is a better prototype than any Figma mockup. With Claude Code, the delta in effort is small

Time to World-Class Brand: Realistic Estimate

Phase 1 · 2d · Brand workshop
Phase 2 · 5d · Visual identity
Phase 3 · 2d · Tokens → code
Phase 4 · 6w · Component library
Phase 5 · 1w · First real screen

Total: ~8 weeks from zero to a brand that rivals Linear or Cursor. The bottlenecks are Phase 2 (finding a designer) and Phase 4 (component build). Everything else is decisions + Claude Code automation.

§14 · REFERENCES

References: Figma, OS Design Systems & Desktop Apps

The authoritative sources every world-class desktop app is built on: Apple's Human Interface Guidelines, Microsoft Fluent Design, how the best companies use Figma, what Figma Community files to download today, and visual references of the exact apps Shopilot should emulate as a macOS Electron product.

Apple Human Interface Guidelines (HIG)

developer.apple.com/design/human-interface-guidelines · The bible for macOS app design

Every app that feels "native" on macOS (Arc, Cursor, Notion, Linear) follows Apple's HIG, not as rules but as a foundation. Understanding HIG tells you why certain things feel right on Mac and wrong on Windows, and what Shopilot must do to feel like a first-class macOS citizen.

6 Core HIG Principles and what they mean for Shopilot

1 · Aesthetic Integrity

The app's visual appearance and behavior must be consistent with its purpose. A data tool (Shopilot) should look precise and professional, not playful. Applies to: spacing consistency, typography alignment, color restraint.

2 · Consistency

Use standard macOS controls and terminology where possible. Users already know what a sidebar, toolbar, and panel are on Mac. Don't reinvent them; use them. Shopilot's window chrome (title bar, traffic lights, resize handle) must behave as users expect.

3 · Direct Manipulation

Users should feel they're directly controlling the content on screen. For Shopilot: clicking an ASIN row should immediately feel responsive. Dragging, hovering, and focusing must have immediate visual feedback (≤100ms).

4 · Feedback

Every action must acknowledge the user. Shopilot specifics: button press = visual depress + sound (optional). Loading = progress indicator, not frozen UI. AI thinking = animated cursor ▊ or pulse ···. Error = banner with next action, not silent failure.

5 · User Control

Users, not the app, initiate actions. The AI coach can suggest, but must not act without confirmation on irreversible actions. HIG says: "people should always be in control." This is the origin of Shopilot's reversibility system.

6 · Metaphors

Use familiar real-world concepts. Shopilot uses the "coach" metaphor: a trusted advisor who sees the same screen you do and gives guidance. This is why the sidebar is positioned like a coach standing next to you: right side, always visible, never blocking the main view.

macOS Patterns that Shopilot must implement correctly

Pattern | HIG spec | Shopilot implementation
Traffic lights | Red/Yellow/Green at 12px diameter, 8px gap, 20px from left | Frameless window + titleBarStyle: 'hiddenInset' preserves native buttons
Sidebar | Min width 220px, vibrancy background, grouped sections with headers | Shopilot right sidebar 320px; deviates intentionally (coach, not nav)
Toolbar | Height 52px, icon + label, unified with title bar on macOS 11+ | Tab bar (marketplace switcher) sits at top of left pane, height 40px
Menu bar | Every Mac app has native menu bar: File, Edit, View, Window, Help | Electron: Menu.setApplicationMenu(); must exist, even if minimal
Keyboard shortcuts | Cmd+W close, Cmd+Q quit, Cmd+, preferences; always expected | Must register all standard Mac shortcuts + Shopilot custom (Cmd+K = chat)
System colors | Use NSColor system colors that adapt to dark/light automatically | In Electron: prefers-color-scheme media query (or nativeTheme) driving a manual token switch
Focus ring | Blue ring 3px at system accent color; do NOT remove, required for a11y | Override with brand accent color ring, same shape; never remove entirely
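The traffic-light, menu-bar, and shortcut rows can be sketched as plain configuration objects. This is a sketch under stated assumptions: the option and role names (titleBarStyle, minWidth, contextIsolation, appMenu, fileMenu, and so on) are real Electron names, but the types are declared locally so the block runs without the electron package, and the default size, the 900×600 minimum, and the "Open Chat" label are illustrative values rather than decided ones.

```typescript
// Local structural subset of Electron's window options (assumed shape for this sketch).
interface WindowOptions {
  width: number;
  height: number;
  minWidth: number;
  minHeight: number;
  titleBarStyle?: "default" | "hidden" | "hiddenInset";
  webPreferences: { contextIsolation: boolean; nodeIntegration: boolean };
}

// Local subset of an Electron menu item template.
interface MenuItemTemplate {
  role?: string;
  label?: string;
  accelerator?: string;
}

function buildWindowOptions(platform: string): WindowOptions {
  const base: WindowOptions = {
    width: 1280, // assumed default size
    height: 800,
    minWidth: 900, // minimum window size used elsewhere in this spec
    minHeight: 600,
    webPreferences: { contextIsolation: true, nodeIntegration: false },
  };
  // On macOS, hide the title bar but keep the native traffic lights.
  return platform === "darwin" ? { ...base, titleBarStyle: "hiddenInset" } : base;
}

function buildMenuTemplate(platform: string): MenuItemTemplate[] {
  const standard: MenuItemTemplate[] = [
    { role: "fileMenu" },
    { role: "editMenu" },
    { role: "viewMenu" },
    { role: "windowMenu" },
    { label: "Open Chat", accelerator: "CmdOrCtrl+K" }, // hypothetical Shopilot item
  ];
  // macOS expects the application menu (About, Preferences, Quit) first.
  return platform === "darwin" ? [{ role: "appMenu" }, ...standard] : standard;
}
```

In the main process these objects would be passed to new BrowserWindow(...) and Menu.setApplicationMenu(Menu.buildFromTemplate(...)). Building them as pure functions keeps the platform branching testable outside Electron.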

Reference: What Arc Browser takes from HIG

Arc uses native macOS vibrancy for its sidebar, native traffic lights at the exact HIG position, native context menus via NSMenu, native keyboard shortcut conventions, and the native font stack (SF Pro) for all system-level text. Where Arc deviates from HIG, the deviation is intentional and branded: the tab bar is vertical instead of horizontal, the command bar replaces the URL bar, and the sidebar IS the app chrome. Deviation from HIG is a product decision, but you must know the rules before you break them.

Microsoft Fluent Design System 2

fluent2.microsoft.design · Windows 11 design language

Shopilot targets macOS first, but the Windows build comes in Sprint 11-12. Fluent Design 2 is the official design system for Windows 11 apps. Understanding it now prevents a costly redesign later, and it informs several patterns (Acrylic material, Mica background) that translate well to dark Electron apps on both platforms.

5 Fluent Design Principles

Light

Light as a design element. Reveal highlight: a subtle glow appears under the cursor on interactive elements. Creates depth without shadows. In Electron: CSS radial-gradient on mousemove.
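A Reveal highlight of this kind can be sketched as a pure function that turns pointer coordinates into a radial-gradient value. The function name, the 120px radius, and the highlight color are assumptions for illustration, not values from the Fluent spec.

```typescript
// Minimal rectangle shape; matches the left/top fields of a DOMRect.
interface RectOrigin {
  left: number;
  top: number;
}

// Builds a CSS background-image value centered under the pointer.
// clientX/clientY are viewport coordinates from a mousemove event;
// rect is the element's bounding box, so the result is element-relative.
function revealGradient(
  clientX: number,
  clientY: number,
  rect: RectOrigin,
  radiusPx: number = 120, // assumed glow radius
): string {
  const x = Math.round(clientX - rect.left);
  const y = Math.round(clientY - rect.top);
  return `radial-gradient(circle ${radiusPx}px at ${x}px ${y}px, rgba(255,255,255,0.12), transparent)`;
}

// Usage in the renderer (not executed here):
// el.addEventListener("mousemove", (e) => {
//   el.style.backgroundImage = revealGradient(e.clientX, e.clientY, el.getBoundingClientRect());
// });
```

Setting a CSS custom property for x and y instead of rewriting the whole gradient on every event is a reasonable alternative if the mousemove handler becomes a hot path.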

Depth

Layers at different Z-levels with Acrylic (frosted glass) and Mica (wallpaper-blended background) materials. For Shopilot: the glass-card pattern directly adopts this; backdrop-filter: blur() is the closest CSS equivalent of Acrylic in Electron.

Motion

Connected animations: elements travel between states instead of disappearing and reappearing. Fluent easing: cubic-bezier(0.1, 0.9, 0.2, 1). Used by VS Code, Microsoft Edge, Teams.
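To see what the Fluent easing curve does numerically, the sketch below evaluates a CSS-style cubic-bezier in code. It is a generic bisection-based evaluator written for this example (not a browser or Fluent API); the control points are the ones quoted above.

```typescript
// Returns an easing function equivalent to CSS cubic-bezier(x1, y1, x2, y2).
// Assumes x1 and x2 are in [0, 1], as CSS requires, so x(u) is monotonic.
function cubicBezier(x1: number, y1: number, x2: number, y2: number): (progress: number) => number {
  // One coordinate of a cubic Bezier whose end points are fixed at 0 and 1.
  const coord = (u: number, p1: number, p2: number): number => {
    const inv = 1 - u;
    return 3 * inv * inv * u * p1 + 3 * inv * u * u * p2 + u * u * u;
  };

  return (progress: number): number => {
    if (progress <= 0) return 0;
    if (progress >= 1) return 1;
    // Bisection: find the curve parameter u whose x coordinate equals progress.
    let lo = 0;
    let hi = 1;
    for (let i = 0; i < 40; i++) {
      const mid = (lo + hi) / 2;
      if (coord(mid, x1, x2) < progress) {
        lo = mid;
      } else {
        hi = mid;
      }
    }
    return coord((lo + hi) / 2, y1, y2);
  };
}

// The Fluent easing quoted in the spec.
const fluentEase = cubicBezier(0.1, 0.9, 0.2, 1);
```

The curve is strongly front-loaded: most of the movement happens in the first part of the duration and the rest is a long settle. In CSS or Framer Motion the four control points can be passed directly, so this evaluator is only needed for custom animation loops.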

Material

Acrylic: backdrop-filter: blur(30px) saturate(180%), used for sidebars, flyouts, menus. Mica: wallpaper color extracted and used as tint in app chrome. Both create a sense of the app being part of the OS.

Scale

Design for multiple device types. In Shopilot's context: design for a minimum 900×600px window and scale gracefully to 2560×1440 (QHD). Touch targets minimum 44×44px even on desktop (for touch-screen Windows laptops).

Fluent Typography: Segoe UI Variable

Windows 11 uses Segoe UI Variable, a variable font that covers all weights and optical sizes. On Windows, Electron apps that specify system-ui resolve to the system UI font (Segoe UI or Segoe UI Variable, depending on the Chromium version), so builds that rely on system-ui need no font changes. A bundled font such as Inter renders as Inter on every platform and does not map to Segoe UI Variable.

Fluent Type Ramp (Windows 11):

  • Caption · 12px · Regular
  • Body · 14px · Regular
  • Body Strong · 14px · Semibold
  • Subtitle · 20px · Semibold
  • Title · 28px · Semibold
  • Title Large · 40px · Semibold
  • Display · 68px · Semibold

Key difference vs Apple HIG:

The macOS HIG uses 13pt as its base body size (17pt is the iOS body size). Fluent uses 14px body, so on Windows everything feels slightly larger. If you design for macOS at 13px body text, Windows will look right at 14px. Build a --body-size token to switch per platform.
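A per-platform --body-size token could be resolved with a small lookup like the one below. The 13px and 14px values are the ones given in the paragraph; the Linux value and the function names are assumptions added for the example.

```typescript
// Platform identifiers as reported by Node's process.platform in Electron.
type Platform = "darwin" | "win32" | "linux";

// Body text size per platform, in CSS pixels.
const BODY_SIZE_PX: Record<Platform, number> = {
  darwin: 13, // macOS body size from the spec
  win32: 14, // Fluent body size from the spec
  linux: 14, // assumption: not covered by the spec
};

// Returns the CSS declaration to inject on :root for the given platform.
function bodySizeDeclaration(platform: Platform): string {
  return `--body-size: ${BODY_SIZE_PX[platform]}px;`;
}

// Usage in the renderer (not executed here):
// document.documentElement.style.setProperty("--body-size", `${BODY_SIZE_PX[platform]}px`);
```

Keeping the switch in one token means components only ever reference var(--body-size) and never branch on the platform themselves.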

How the Best Companies Use Figma

figma.com · The industry standard for design, and how to use it efficiently

Figma is not a drawing tool; it's a design system management platform. Companies like Vercel, Linear, Airbnb, and Shopify use Figma as their source of truth for visual decisions, but NOT for everything. Understanding what they put in Figma vs what they build directly in code is what separates efficient teams from slow ones.

The 5 ways top companies use Figma

01

Figma Variables = Design Tokens (the right way)

Since their introduction in 2023, Figma Variables can replace Styles for colors, spacing, radii, and typography. Variables in Figma map 1:1 to CSS custom properties. The best companies (Vercel, Shopify, Atlassian) define their entire token system in Figma Variables, then export to JSON using the "Variables Import/Export" plugin (free). This JSON becomes the tokens.json that feeds Style Dictionary.

Figma Variable group → CSS output:

color/brand/primary → --color-brand-primary: #F97316

spacing/4 → --spacing-4: 16px

radius/lg → --radius-lg: 8px
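The naming rule behind these three mappings is mechanical: slashes become hyphens and the result gets a leading double hyphen. A minimal sketch, with a hypothetical function name and an assumed normalization of spaces and capitals:

```typescript
// Converts a Figma Variable path such as "color/brand/primary" into a
// CSS custom property declaration. Lowercasing and replacing other
// characters with hyphens is an assumed normalization, not a Figma rule.
function figmaVariableToCss(path: string, value: string): string {
  const name = path
    .split("/")
    .map((segment) =>
      segment
        .trim()
        .toLowerCase()
        .replace(/[^a-z0-9]+/g, "-")
        .replace(/^-+|-+$/g, ""),
    )
    .filter((segment) => segment.length > 0)
    .join("-");
  return `--${name}: ${value}`;
}
```

For example, figmaVariableToCss("color/brand/primary", "#F97316") yields the first mapping in the list, and a path like "Color / Brand Primary" normalizes to --color-brand-primary as well.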

02

Auto Layout = Responsive Components that match CSS Flexbox

Figma's Auto Layout closely mirrors CSS Flexbox. When a designer builds a button with Auto Layout (direction, gap, padding, alignment), each setting translates directly to a Tailwind class. This is how Linear, Vercel, and Shopify achieve near-zero friction between design and code: the designer thinks in flex terms, the developer writes flex terms.

Figma Auto Layout → Tailwind CSS:

Direction: Horizontal → flex

Gap: 8px → gap-2

Padding: 10px 16px → px-4 py-2.5

Align: Center → items-center
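The Auto Layout to Tailwind translation can be written down as a small function. This is a sketch assuming Tailwind's default spacing scale (1 unit = 4px); the AutoLayout interface and function name are hypothetical and do not mirror Figma's plugin API.

```typescript
// Simplified description of a Figma Auto Layout frame (assumed shape).
interface AutoLayout {
  direction: "horizontal" | "vertical";
  gapPx: number;
  paddingYPx: number;
  paddingXPx: number;
  align: "start" | "center" | "end";
}

// Tailwind's default spacing scale: one unit equals 4px, so 10px -> "2.5".
const toSpacingUnit = (px: number): string => String(px / 4);

// Produces the Tailwind class string for the given Auto Layout settings.
function autoLayoutToTailwind(layout: AutoLayout): string {
  const classes = [
    layout.direction === "horizontal" ? "flex" : "flex flex-col",
    `gap-${toSpacingUnit(layout.gapPx)}`,
    `px-${toSpacingUnit(layout.paddingXPx)}`,
    `py-${toSpacingUnit(layout.paddingYPx)}`,
    `items-${layout.align}`,
  ];
  return classes.join(" ");
}
```

With the button values from the example (horizontal, 8px gap, 10px 16px padding, centered), the function returns "flex gap-2 px-4 py-2.5 items-center". Values that fall off the default scale would need Tailwind's arbitrary-value syntax, such as gap-[5px], which this sketch does not handle.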

03

Component Properties = Variant System

Top companies define every component with Properties (variant=primary/secondary/ghost, size=sm/md/lg, state=default/hover/disabled/loading). This creates a single source of truth for all component states. In Figma, you see all variants in one frame. In code, this maps to props. The designer and developer speak the same language.

Button component properties:

variant: primary | secondary | ghost | danger | outline | link

size: sm | md | lg

state: default | hover | focus | disabled | loading

icon: none | left | right | only
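In code, these four properties map naturally to TypeScript union types. The unions below copy the values listed above; the default values and helper names are assumptions for the sketch.

```typescript
// Union types copied from the Figma component properties.
type ButtonVariant = "primary" | "secondary" | "ghost" | "danger" | "outline" | "link";
type ButtonSize = "sm" | "md" | "lg";
type ButtonState = "default" | "hover" | "focus" | "disabled" | "loading";
type ButtonIcon = "none" | "left" | "right" | "only";

interface ButtonProps {
  variant: ButtonVariant;
  size: ButtonSize;
  state: ButtonState;
  icon: ButtonIcon;
}

// Assumed defaults; the spec does not state which variant is the default.
const BUTTON_DEFAULTS: ButtonProps = {
  variant: "primary",
  size: "md",
  state: "default",
  icon: "none",
};

// Fills in any property the caller left out.
function resolveButtonProps(partial: Partial<ButtonProps> = {}): ButtonProps {
  return { ...BUTTON_DEFAULTS, ...partial };
}

// A button should ignore clicks while disabled or loading.
function isInteractive(props: ButtonProps): boolean {
  return props.state !== "disabled" && props.state !== "loading";
}
```

Because the unions are closed, a variant added in Figma but not in code (or the reverse) fails type-checking as soon as someone tries to use it, which is a cheap way to keep the two in step.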

04

Dev Mode = the handoff from designer to Claude Code

Figma Dev Mode (free for 1 viewer) lets developers inspect every design decision: exact pixel values, spacing, CSS properties, and exported assets. The workflow for Shopilot: designer finalizes a complex component in Figma → developer opens Dev Mode → copies the exact values into a prompt for Claude Code: "Build this component using these exact specs from Figma Dev Mode: [paste]." Claude Code generates the TypeScript in seconds.

The Claude Code + Figma prompt template:

"Build a React TypeScript component for [ComponentName]. Read the Figma component via Figma MCP for exact specs (dimensions, colors, spacing, states, variants). Use design tokens from globals.css. Include all states defined in the Figma component."

05

Figma as the single source of truth for all visual components

The Figma file (#18 Design System, core-product-design-system) follows Atomic Design (atoms, molecules, organisms, templates, pages) and is the single source of truth. Claude reads Figma via Figma MCP and implements matching React components in #1 Native Shell. No React components are created outside of what is defined in the Figma file. The external design team maintains Figma; the engineering team consumes it.

Figma Community Files: Download These Today

These are official or highly-used public Figma files from the reference companies. Duplicating them to your Figma account is free. Study how they structure components, Variables, and design systems; this is how the best companies work.

File | Publisher | What to study | Search in Community
Apple Design Resources | Apple (official) | macOS UI components, SF Symbols, HIG spacing | "Apple Design Resources macOS"
Microsoft Fluent 2 | Microsoft (official) | Fluent component library, Acrylic, tokens system | "Microsoft Fluent 2 Web"
Vercel Design System | Vercel (official) | Dark-first tokens, Geist font usage, Storybook link | "Vercel Design"
Shadcn/ui Figma Kit | Community (official-ish) | How shadcn components map to Figma: the bridge | "shadcn ui"
Tailwind CSS UI Kit | Community | Tailwind spacing / color scales in Figma Variables | "Tailwind CSS UI Kit"
Linear App Design | Community recreation | Dark sidebar, speed-first interactions, kbd badges | "Linear design system"
Electron UI Patterns | Community | Title bar, tray, window chrome patterns for Electron | "Electron desktop UI"
Figma Variables Starter | Figma (official) | How to structure Variables for a design system | "Variables starter kit Figma"

How to use these files: Don't copy components. Study structure. Look at: how they name Variables (tokens), how they organize component pages, how they document states, what their spacing system looks like. These are the patterns to replicate in Shopilot's Figma file when the brand is decided.

Desktop App Visual References: What to Emulate

These are the specific macOS desktop apps (most of them built on Electron; Arc is a native Swift app) that Shopilot should study in detail as running software: not in Figma, but as installed apps. Each has a specific pattern Shopilot must adopt or consciously decide to deviate from.

Cursor — cursor.sh

MOST RELEVANT: study first

The closest structural reference to Shopilot. Both are: Electron, AI-native, dark-first, split-pane (editor left + chat right). Download and install. Study: how the title bar works, how the chat panel opens/closes, how the AI response streams, how tool calls (terminal runs) are displayed, how the status bar at the bottom shows AI state. This is the gold standard for Shopilot's interaction model.

Adopt from Cursor:

  • Status bar 24px bottom
  • Streaming word-by-word
  • Tool call accordion
  • Thinking indicator

Adapt for Shopilot:

  • Split: code→marketplace
  • Tabs: files→marketplaces
  • Context: project→ASIN

Don't copy:

  • Code editor UI
  • File tree sidebar
  • Diff view

Arc Browser — arc.net

The reference for rethinking desktop chrome. Arc proves that you can break HIG conventions (vertical tabs instead of horizontal, sidebar IS the app, no visible URL bar) and still feel native and premium. Study specifically: how Arc handles the title bar with traffic lights + drag region + custom controls in the same 40px zone. This is exactly what Shopilot's top bar needs to solve.

Key lesson: Arc's sidebar gradient background (multi-color per space) is possible in Electron via CSS linear-gradient on the sidebar container. The space color customization is what makes Arc feel personal, a pattern Shopilot could adopt for marketplace color coding (Amazon=orange, MeLi=yellow, Shopify=green).

Linear — linear.app

The reference for performance as a design value. Every interaction in Linear is under 100ms. Study: the keyboard shortcut system (every action has a shortcut visible in the UI), the command palette (Cmd+K), the sidebar collapse behavior, and most importantly how Linear handles empty states (no data = inspirational, not depressing). Also study the data tables: Linear's issue list is the closest reference to Shopilot's ASIN product list.

Notion — notion.so

The reference for Electron done right at scale (30M+ users). Study: how Notion handles window resizing (the sidebar collapses progressively), how they manage a complex sidebar with nested items without it feeling cluttered, and their hover-reveal interactions (properties appear on hover, not always). Also: Notion's dark mode implementation is one of the cleanest in any Electron app; study how they handle the transition between surface layers.

VS Code — code.visualstudio.com

THE Electron reference

VS Code is the most used Electron app in the world with 30M+ daily active users. It is the definitive reference for what is possible technically and visually in Electron. Study: the status bar (bottom, 22px, close to Shopilot's 24px), the split pane system, the extension panel (same concept as Shopilot's sidebar), the command palette, and the theming system. VS Code themes are essentially color-token swaps, the same approach Shopilot's design token system will take. The VS Code GitHub repo is public, so the theming architecture is directly applicable.

Action: Install and study these 5 apps this week

  • Cursor · cursor.sh
  • Arc · arc.net
  • Linear · linear.app
  • Notion · notion.so
  • VS Code · code.visualstudio.com

For each: spend 30 min using it normally, then 30 min inspecting specific patterns (title bar, sidebars, status bar, hover states, loading states, dark mode). Document what you want to adopt, adapt, or avoid. This is the most efficient design research you can do before the brand workshop.

Essential Figma Plugins for this Workflow

Plugin | What it does | Phase | Cost
Variables Import/Export | Exports Figma Variables to JSON → feeds tokens.json | Phase 2→3 bridge | Free
Tokens Studio | Full design token management in Figma (W3C DTCG format) | Phase 2→3 bridge | $20/mo
Contrast | WCAG AA/AAA contrast checker on any color pair in canvas | Phase 2 · color decisions | Free
Able | Accessibility checker: contrast, focus order, WCAG annotations | Phase 4 · component review | Free
Iconify | All Lucide icons available in Figma, same library as the code | Phase 2+ ongoing | Free
Figma to Code | Exports Figma frames as HTML/Tailwind/React snippets | Phase 4 · component start | Free
Color Blind | Simulates 8 types of color blindness on any frame | Phase 2 · color decisions | Free

§14 · FULL-STACK

Full-Stack Design Integration

The missing 30%: exact technology stacks, how everything wires together, Claude API integration patterns with real code, what's still undocumented, and 2026 AI-native design methodology. Actionable, not theoretical.

01 · The 6-Layer Stack: How Everything Connects

LAYER 6 · Quality Gates

Figma ↔ Code consistency review · axe-core a11y · Playwright e2e · PR blocked if component deviates from Figma

LAYER 5 · Claude AI Integration

Anthropic SDK v0.30+ · Messages streaming API · Tool use (36 tools) · Prompt caching · Multi-LLM router

LAYER 4 · Electron App Shell

Electron 33+ · WebContentsView (70%) · React 19 sidebar (30%) · IPC contextBridge · Auto-updater

LAYER 3 · React Component Library

shadcn/ui (Radix primitives) · Figma Atomic Design (#18) · Figma MCP · Tailwind 4 · Framer Motion 11

LAYER 2 · Design Token Pipeline

tokens.json (W3C DTCG) → Style Dictionary 4 → CSS custom properties → tailwind.config.ts → CSS vars

LAYER 1 · Design Spec (This File)

shopilot_v6.html · Single source of truth · Pablo approves · Sergio implements · Mateo owns tokens

Complete Package Manifest

Package | Version | Purpose | Layer | Owner
@anthropic-ai/sdk | ^0.30 | Claude API: streaming, tools, caching | 5 | Andrés
electron | ^33 | Desktop shell, WebContentsView, IPC | 4 | Mateo
react + react-dom | ^19 | UI renderer, concurrent features | 3 | Sergio
tailwindcss | ^4 | Utility CSS, token consumption | 3 | Sergio
@radix-ui/react-* | latest | Accessible primitives (via shadcn) | 3 | Sergio
shadcn/ui | CLI 2.x | Component generator on Radix + Tailwind | 3 | Sergio
framer-motion | ^11 | Animations: word-stream, slide-up, spring | 3 | Sergio
lucide-react | ^0.43 | Icon library: 1.5px stroke, currentColor | 3 | Sergio
recharts | ^2 | Charts only (BSR sparkline, KPI gauge) | 3 | Andrés
style-dictionary | ^4 | Token transform: JSON → CSS → Tailwind | 2 | Mateo
@axe-core/react | ^4 | Accessibility audit (WCAG AA) | 6 | Sergio
zod | ^3 | Tool input/output validation schema | 5 | Andrés
zustand | ^5 | Agent state machine store | 3-5 | Sergio

02 · Design Token Pipeline: tokens.json → Production CSS

tokens.json (W3C DTCG format · source of truth) → style-dictionary build → design-tokens.css + tailwind-tokens.ts (auto-generated, never edit manually)
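To make the transform step concrete, here is a hand-rolled sketch of what the build does to a DTCG-style token tree: walk the groups and emit one CSS custom property per leaf that carries a $value. In the real pipeline Style Dictionary performs this step; the function below is an illustrative stand-in, not Style Dictionary's API, and its name and output format are assumptions.

```typescript
// Generic JSON value type so the sketch accepts a parsed tokens.json.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Walks a DTCG token tree and returns CSS custom property declarations.
// Group names are joined with hyphens: sp.color.bg.base -> --sp-color-bg-base.
function flattenTokens(node: { [key: string]: Json }, path: string[] = []): string[] {
  const lines: string[] = [];
  for (const [key, child] of Object.entries(node)) {
    // Keys starting with "$" are metadata ($schema, $type, $description).
    if (key.startsWith("$")) continue;
    if (child === null || typeof child !== "object" || Array.isArray(child)) continue;
    if ("$value" in child) {
      // Leaf token: emit a declaration using its $value.
      lines.push(`--${[...path, key].join("-")}: ${String(child["$value"])};`);
    } else {
      // Group: recurse with the group name appended to the path.
      lines.push(...flattenTokens(child, [...path, key]));
    }
  }
  return lines;
}
```

Wrapping the returned lines in a :root { ... } block gives the shape of design-tokens.css. The sketch ignores aliases and composite token types, which a real build also has to resolve.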

▶ tokens.json: Full Example (W3C DTCG format)
{
  "$schema": "https://design-tokens.org/schema.json",
  "sp": {
    "color": {
      "bg": {
        "base": { "$value": "#0A0A0F", "$type": "color", "$description": "App background — near-black warm" },
        "01":   { "$value": "#0F0F18", "$type": "color" },
        "02":   { "$value": "#14141F", "$type": "color" },
        "03":   { "$value": "#1A1A28", "$type": "color" }
      },
      "orange": {
        "50":  { "$value": "rgba(249,115,22,0.08)", "$type": "color" },
        "500": { "$value": "#F97316", "$type": "color", "$description": "CANDIDATE — replace with decided brand color" },
        "600": { "$value": "#EA6005", "$type": "color" }
      },
      "fg": {
        "100": { "$value": "#F4F4F6", "$type": "color", "$description": "Primary text" },
        "80":  { "$value": "#D4D4E4", "$type": "color" },
        "60":  { "$value": "#A4A4B8", "$type": "color" },
        "40":  { "$value": "#7A7A90", "$type": "color" }
      },
      "success": { "$value": "#22C55E", "$type": "color" },
      "warning": { "$value": "#F59E0B", "$type": "color" },
      "error":   { "$value": "#EF4444", "$type": "color" },
      "info":    { "$value": "#3B82F6", "$type": "color" }
    },
    "space": {
      "g": { "$value": "10px", "$type": "dimension", "$description": "base grid unit" },
      "v": { "$value": "22px", "$type": "dimension", "$description": "vertical rhythm" },
      "4":  { "$value": "4px",  "$type": "dimension" },
      "8":  { "$value": "8px",  "$type": "dimension" },
      "12": { "$value": "12px", "$type": "dimension" },
      "16": { "$value": "16px", "$type": "dimension" },
      "24": { "$value": "24px", "$type": "dimension" },
      "32": { "$value": "32px", "$type": "dimension" }
    },
    "radius": {
      "sm":   { "$value": "4px",    "$type": "dimension" },
      "md":   { "$value": "6px",    "$type": "dimension" },
      "lg":   { "$value": "8px",    "$type": "dimension" },
      "xl":   { "$value": "12px",   "$type": "dimension" },
      "2xl":  { "$value": "16px",   "$type": "dimension" },
      "full": { "$value": "9999px", "$type": "dimension" }
    },
    "duration": {
      "instant": { "$value": "80ms",  "$type": "duration" },
      "fast":    { "$value": "150ms", "$type": "duration" },
      "normal":  { "$value": "200ms", "$type": "duration" },
      "slow":    { "$value": "350ms", "$type": "duration" },
      "scenic":  { "$value": "500ms", "$type": "duration" }
    }
  }
}
▶ style-dictionary.config.mjs: Build Config
// style-dictionary.config.mjs
import StyleDictionary from 'style-dictionary';

export default {
  source: ['tokens.json'],
  platforms: {
    // → CSS custom properties (--sp-color-orange-500)
    css: {
      transformGroup: 'css',
      files: [{
        destination: 'src/styles/design-tokens.css',
        format: 'css/variables',
        options: { selector: ':root', outputReferences: true }
      }]
    },
    // → Tailwind config (for extend.colors, extend.spacing)
    tailwind: {
      transformGroup: 'js',
      files: [{
        destination: 'src/styles/tailwind-tokens.ts',
        format: 'javascript/esm'
      }]
    }
  }
}

// Run: npx style-dictionary build
// Output:
//   src/styles/design-tokens.css   ← import in main.tsx
//   src/styles/tailwind-tokens.ts  ← import in tailwind.config.ts
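
To make the naming step of the pipeline concrete, here is a minimal sketch of how a DTCG token tree maps onto CSS custom property names such as --sp-color-orange-500. The `flattenTokens` helper is hypothetical and only illustrates the path-to-name rule; the real build delegates this to Style Dictionary, which also handles references and value transforms.

```typescript
// Hypothetical illustration of the path -> CSS variable name rule.
// Not part of Style Dictionary's API; the real build does this for us.
type TokenLeaf = { $value: string; $type?: string; $description?: string };
type TokenTree = { [key: string]: TokenTree | TokenLeaf | string };

function isLeaf(node: TokenTree | TokenLeaf | string | undefined): node is TokenLeaf {
  return typeof node === 'object' && node !== null && typeof (node as TokenLeaf).$value === 'string';
}

function flattenTokens(tree: TokenTree, path: string[] = []): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of Object.keys(tree)) {
    if (key.charAt(0) === '$') continue; // skip metadata such as $schema
    const node = tree[key];
    const nextPath = path.concat(key);
    if (isLeaf(node)) {
      out['--' + nextPath.join('-')] = node.$value;
    } else if (typeof node === 'object' && node !== null) {
      Object.assign(out, flattenTokens(node, nextPath));
    }
  }
  return out;
}

// A small excerpt of the tokens.json shown above.
const sample: TokenTree = {
  $schema: 'https://design-tokens.org/schema.json',
  sp: {
    color: {
      bg: { base: { $value: '#0A0A0F', $type: 'color' } },
      orange: { '500': { $value: '#F97316', $type: 'color' } },
    },
    space: { g: { $value: '10px', $type: 'dimension' } },
  },
};

const cssVars = flattenTokens(sample);
```

With this excerpt, `cssVars` holds three entries keyed --sp-color-bg-base, --sp-color-orange-500, and --sp-space-g, which are the names the Tailwind config references.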
▶ tailwind.config.ts: Token Consumption
// tailwind.config.ts
// Note: Tailwind 4 no longer auto-detects a JS/TS config; load this file with the @config directive in the CSS entry file.
import type { Config } from 'tailwindcss'

const config: Config = {
  content: ['./src/**/*.{ts,tsx}'],
  theme: {
    extend: {
      colors: {
        // Reference CSS custom properties so Tailwind + Style Dictionary stay in sync
        'sp-bg-base': 'var(--sp-color-bg-base)',
        'sp-orange':  'var(--sp-color-orange-500)',
        'sp-fg-100':  'var(--sp-color-fg-100)',
        'sp-success': 'var(--sp-color-success)',
        'sp-warning': 'var(--sp-color-warning)',
        'sp-error':   'var(--sp-color-error)',
      },
      spacing: {
        'sp-g': 'var(--sp-space-g)',   // 10px
        'sp-v': 'var(--sp-space-v)',   // 22px
      },
      borderRadius: {
        'sp-sm': 'var(--sp-radius-sm)',
        'sp-lg': 'var(--sp-radius-lg)',
        'sp-xl': 'var(--sp-radius-xl)',
      },
      fontFamily: {
        'display': ['Inter Display', 'Inter', 'sans-serif'],
        'mono':    ['JetBrains Mono', 'Fira Code', 'monospace'],
      },
      transitionDuration: {
        'sp-fast':   'var(--sp-duration-fast)',
        'sp-normal': 'var(--sp-duration-normal)',
        'sp-slow':   'var(--sp-duration-slow)',
      }
    }
  },
  plugins: []
}
export default config

03 · shadcn/ui Integration with Shopilot Tokens

shadcn/ui is not a traditional component library; it is a code generator. Its CLI copies component source into your repo, where it is 100% customizable. Use it for accessibility-correct primitives, then override the styles with Shopilot tokens.

▶ Setup Commands + globals.css Override
# 1. Init shadcn (say YES to CSS variables, pick Neutral base)
npx shadcn@latest init

# When prompted:
# ✓ Style: Default
# ✓ Base color: Neutral  (we override below)
# ✓ CSS variables: YES   (critical — this is how tokens flow in)
# ✓ src directory: YES

# 2. Add the components Shopilot needs (never add all at once)
npx shadcn@latest add button
npx shadcn@latest add dialog
npx shadcn@latest add dropdown-menu
npx shadcn@latest add tooltip
npx shadcn@latest add select
npx shadcn@latest add scroll-area
npx shadcn@latest add collapsible     # ← ToolAccordion base
npx shadcn@latest add badge
npx shadcn@latest add separator
npx shadcn@latest add progress        # ← ContextWindowBar

# 3. Override src/app/globals.css with Shopilot tokens:
@import 'design-tokens.css';          /* Style Dictionary output */

@layer base {
  :root {
    /* Map shadcn vars → Shopilot tokens */
    --background:       240 20% 5%;   /* #0A0A0F */
    --foreground:       240 10% 96%;  /* #F4F4F6 */
    --card:             240 22% 10%;  /* #14141F */
    --card-foreground:  240 23% 86%;  /* #D4D4E4 */
    --popover:          240 22% 10%;
    --popover-foreground: 240 10% 96%;
    --primary:          25 95% 53%;   /* CANDIDATE: #F97316 orange, replace once brand color decided */
    --primary-foreground: 0 0% 100%;
    --secondary:        240 20% 20%;  /* #28283C */
    --secondary-foreground: 240 23% 86%;
    --muted:            240 20% 20%;
    --muted-foreground: 240 9% 52%;   /* #7A7A90 */
    --accent:           25 95% 53%;   /* orange accent */
    --accent-foreground: 0 0% 100%;
    --destructive:      0 84% 60%;    /* #EF4444 */
    --border:           240 6% 20%;   /* rgba(255,255,255,.06) approx */
    --input:            240 6% 16%;
    --ring:             25 95% 53%;   /* orange focus ring */
    --radius:           0.5rem;       /* 8px = --sp-radius-lg */
  }
}

# Result: shadcn components automatically use Shopilot colors.
# Edit src/components/ui/button.tsx to change size tokens to sp-* vars.
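
Because shadcn variables are written as space-separated HSL triples while the tokens are defined in hex, it helps to derive the triples mechanically rather than by eye. The following is a minimal sketch of that conversion; `hexToHslTriple` is a hypothetical helper name, not part of shadcn or Style Dictionary.

```typescript
// Convert a 6-digit hex color to the "H S% L%" triple format used by shadcn CSS variables.
// Hypothetical helper for illustration only.
function hexToHslTriple(hex: string): string {
  const m = /^#?([0-9a-f]{6})$/i.exec(hex);
  if (!m) throw new Error('Expected a 6-digit hex color, got "' + hex + '"');
  const n = parseInt(m[1], 16);
  const r = ((n >> 16) & 255) / 255;
  const g = ((n >> 8) & 255) / 255;
  const b = (n & 255) / 255;

  const max = Math.max(r, g, b);
  const min = Math.min(r, g, b);
  const d = max - min;
  const l = (max + min) / 2;

  let h = 0;
  let s = 0;
  if (d !== 0) {
    s = d / (1 - Math.abs(2 * l - 1));
    if (max === r) h = ((g - b) / d) % 6;
    else if (max === g) h = (b - r) / d + 2;
    else h = (r - g) / d + 4;
    h *= 60;
    if (h < 0) h += 360;
  }
  return Math.round(h) + ' ' + Math.round(s * 100) + '% ' + Math.round(l * 100) + '%';
}

const primary = hexToHslTriple('#F97316');     // candidate brand orange
const destructive = hexToHslTriple('#EF4444'); // error red
const background = hexToHslTriple('#0A0A0F');  // app background
```

Running token hex values through a helper like this during review is one way to keep the globals.css triples and tokens.json from drifting apart.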

Which shadcn components to use vs build custom

Component | Source | Why
Button (6 variants) | shadcn base → customize | Radix provides correct focus/disabled states; we override styles
Dialog / Confirmation Card | shadcn Dialog → customize | Radix handles focus trap + aria-modal correctly; style from scratch
Tooltip | shadcn Tooltip → light override | Positioning engine is complex; only needs color/font token override
Select / Dropdown | shadcn → heavy customize | Radix handles keyboard nav; we rebuild visual completely
Tool Accordion | BUILD CUSTOM | Streaming state machine, badge states, JSON viewer: too specific
ReAct Stream | BUILD CUSTOM | Word-by-word animation, thinking pulse: unique to Shopilot
KPI Card | BUILD CUSTOM | JetBrains Mono + delta badge + sparkline: fully custom
Context Window Bar | shadcn Progress → customize | Stacked segments on top of Progress primitive
Data Table | shadcn Table + TanStack Table | TanStack handles sort/filter; shadcn provides base HTML table
Proactive Suggestion Card | BUILD CUSTOM | Animated slide-up, dismiss swipe, max-2-simultaneous logic
Date Picker | react-day-picker (NEVER BUILD) | Calendar UI is complex; use library, override tokens only
Charts (sparkline, gauge) | recharts (NEVER BUILD) | Math-heavy; only override colors and font
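
The "max-2-simultaneous logic" noted for the Proactive Suggestion Card can be sketched as a small queue. This is an assumption about how the rule might be implemented; the `createSuggestionQueue` name and the `Suggestion` shape are hypothetical, and the animation and swipe handling are left out.

```typescript
// Hypothetical sketch: at most `maxVisible` suggestion cards are shown at once;
// extra suggestions wait and are promoted when a visible card is dismissed.
interface Suggestion {
  id: string;
  title: string;
}

function createSuggestionQueue(maxVisible = 2) {
  let visible: Suggestion[] = [];
  let waiting: Suggestion[] = [];

  return {
    add(s: Suggestion): void {
      if (visible.length < maxVisible) visible.push(s);
      else waiting.push(s);
    },
    dismiss(id: string): void {
      visible = visible.filter((s) => s.id !== id);
      waiting = waiting.filter((s) => s.id !== id);
      // Refill the free slots from the waiting list, oldest first.
      while (visible.length < maxVisible && waiting.length > 0) {
        visible.push(waiting.shift() as Suggestion);
      }
    },
    shown(): Suggestion[] {
      return visible.slice();
    },
  };
}

const queue = createSuggestionQueue();
queue.add({ id: 'a', title: 'Competitor undercut you' });
queue.add({ id: 'b', title: 'Low stock on top ASIN' });
queue.add({ id: 'c', title: 'Buy Box share dropped' });
```

Here the third suggestion stays hidden until one of the first two is dismissed, which keeps the sidebar from stacking cards.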

04 · Claude API Streaming Integration: Real Implementation

The complete chain from user input → Claude API → word-by-word UI animation → tool execution display. Every piece has a specific design pattern.

Agent State Machine

idle
user_typing
submitting
thinking ···
streaming ▊
tool_running
awaiting_confirm

done | error | credit_exhausted

thinking ···

CSS: opacity 0.4→1→0.4, 1.2s infinite · NO elapsed time shown · Status bar: animated dot

streaming ▊

Each word: fadeIn 80ms ease-out · Cursor: blinking 0.6s · NO skeleton, NO spinner

awaiting_confirm

Confirmation card slide-up 250ms spring · Input disabled · Backdrop dims 20%
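
One way to keep the UI honest about these states is a guarded transition table. The sketch below uses the state names listed above; the set of legal edges is an assumption made for illustration, since the spec names the states but does not enumerate the transitions between them.

```typescript
// Agent states as listed in the state machine above.
type AgentState =
  | 'idle' | 'user_typing' | 'submitting' | 'thinking' | 'streaming'
  | 'tool_running' | 'awaiting_confirm' | 'done' | 'error' | 'credit_exhausted';

// ASSUMPTION: illustrative edges only; adjust to the real agent loop.
const TRANSITIONS: Record<AgentState, readonly AgentState[]> = {
  idle: ['user_typing'],
  user_typing: ['submitting', 'idle'],
  submitting: ['thinking', 'error', 'credit_exhausted'],
  thinking: ['streaming', 'tool_running', 'error'],
  streaming: ['tool_running', 'done', 'error'],
  tool_running: ['streaming', 'awaiting_confirm', 'done', 'error'],
  awaiting_confirm: ['tool_running', 'done'],
  done: ['idle', 'user_typing'],
  error: ['idle', 'user_typing'],
  credit_exhausted: ['idle'],
};

function canTransition(from: AgentState, to: AgentState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns the next state, or throws so illegal jumps surface during development.
function transition(from: AgentState, to: AgentState): AgentState {
  if (!canTransition(from, to)) {
    throw new Error('Illegal agent transition: ' + from + ' -> ' + to);
  }
  return to;
}

// The chat input is disabled while a confirmation card is showing.
function isInputDisabled(state: AgentState): boolean {
  return state === 'awaiting_confirm';
}
```

A store such as the zustand agent store could call `transition` inside its setter so that a stray event (for example, a late text delta after an error) cannot move the UI into an impossible state.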

▶ useStream.ts: Complete React Hook Implementation
// src/hooks/useStream.ts
import { useState, useCallback, useRef } from 'react';
import Anthropic from '@anthropic-ai/sdk';
import { shopilotTools } from '@/tools/definitions';
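import { useAgentStore } from '@/stores/agentStore';
// Assumed module path: the spec does not say where the system prompt lives.
import { SHOPILOT_SYSTEM_PROMPT } from '@/prompts/system';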

type AgentState =
  | 'idle' | 'thinking' | 'streaming'
  | 'tool_running' | 'awaiting_confirm' | 'done' | 'error';

interface StreamMessage {
  role: 'user' | 'assistant';
  content: string;
}

export function useStream() {
  const [agentState, setAgentState] = useState<AgentState>('idle');
  const [words, setWords] = useState<string[]>([]);
  const [currentToolCall, setCurrentToolCall] = useState<string | null>(null);
  const abortRef = useRef<AbortController | null>(null);
  const { addTool, updateTool } = useAgentStore();

  const stream = useCallback(async (messages: StreamMessage[]) => {
    abortRef.current = new AbortController();
    setWords([]);
    setAgentState('thinking');

    // NOTE: In Electron, Anthropic SDK runs in main process.
    // Renderer sends via IPC → main runs SDK → streams back via IPC.
    // This hook shows the renderer-side pattern.

    try {
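      // Runs in the main process: keep the API key there and never expose it through contextBridge.
      const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

      // Named messageStream so it does not shadow the enclosing `stream` callback.
      // messages.stream() returns the stream object synchronously, so no await is needed.
      const messageStream = client.messages.stream({
        model: 'claude-opus-4-6',
        max_tokens: 8192,
        // Prompt caching: mark the static system prompt as a cacheable prefix.
        // (Assumes an SDK release where cache_control is generally available, with no beta flag.)
        system: [{ type: 'text', text: SHOPILOT_SYSTEM_PROMPT, cache_control: { type: 'ephemeral' } }],
        messages,
        tools: shopilotTools,
      }, { signal: abortRef.current.signal }); // lets abort() cancel the in-flight request

      for await (const event of messageStream) {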
        switch (event.type) {

          case 'content_block_start':
            if (event.content_block.type === 'text') {
              setAgentState('streaming');
            }
            if (event.content_block.type === 'tool_use') {
              setAgentState('tool_running');
              const toolId = event.content_block.id;
              const toolName = event.content_block.name;
              setCurrentToolCall(toolName);
              addTool({ id: toolId, name: toolName, state: 'running', startMs: Date.now() });
            }
            break;

          case 'content_block_delta':
            if (event.delta.type === 'text_delta') {
              // Word-by-word: split on spaces, animate each word
              const newWords = event.delta.text.split(/(?<=\s)/);
              setWords(prev => [...prev, ...newWords]);
            }
            break;

          case 'content_block_stop':
            setCurrentToolCall(null);
            break;

          case 'message_stop':
            setAgentState('done');
            break;
        }
      }
    } catch (err) {
      // The SDK raises its own abort error type, so check the signal rather than err.name.
      if (!abortRef.current?.signal.aborted) {
        setAgentState('error');
      }
    }
  }, [addTool]);

  const abort = useCallback(() => {
    abortRef.current?.abort();
    setAgentState('idle');
    setWords([]);
  }, []);

  return { agentState, words, currentToolCall, stream, abort };
}
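
One subtlety in the word-by-word pattern is that text deltas can end in the middle of a word, so splitting each delta on its own may animate fragments such as "Bo" and "x" separately. A minimal sketch of a buffer that holds back the trailing partial word until whitespace arrives is shown below; `createWordBuffer` is a hypothetical helper, not part of the hook above.

```typescript
// Hypothetical helper: turns arbitrary text deltas into whole words.
// Each emitted word keeps its trailing whitespace so the UI can join them directly.
function createWordBuffer() {
  let pending = '';

  return {
    // Feed one streamed delta; returns the words that are now complete.
    push(delta: string): string[] {
      const parts: string[] = ((pending + delta).match(/\S*\s+|\S+/g) || []).slice();
      const last = parts[parts.length - 1];
      if (last !== undefined && !/\s$/.test(last)) {
        // No trailing whitespace yet, so the last chunk may be an incomplete word.
        pending = parts.pop() as string;
      } else {
        pending = '';
      }
      return parts;
    },
    // Call once the stream ends to release whatever is still buffered.
    flush(): string[] {
      const rest = pending;
      pending = '';
      return rest ? [rest] : [];
    },
  };
}

const buffer = createWordBuffer();
const first = buffer.push('Buy Bo');   // "Bo" is held back
const second = buffer.push('x is ');   // completes "Box "
const third = buffer.push('ours');     // held until flush
const tail = buffer.flush();
```

In the hook, `push` would be called inside the text_delta branch and `flush` at message_stop, so each animated span corresponds to a full word.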
▶ StreamingText.tsx: Word-by-Word Animation Component
// src/components/StreamingText.tsx
import { motion, AnimatePresence } from 'framer-motion';

interface StreamingTextProps {
  words: string[];
  isStreaming: boolean;
}

// Design rule: each word fades in at 80ms.
// Cursor blinks at 0.6s cycle when streaming.
// No skeleton, no placeholder, no loading bar.
export function StreamingText({ words, isStreaming }: StreamingTextProps) {
  return (
    <div className="text-sp-fg-100 text-sm leading-relaxed">
      {words.map((word, i) => (
        <motion.span
          key={i}
          initial={{ opacity: 0 }}
          animate={{ opacity: 1 }}
          transition={{ duration: 0.08, ease: 'easeOut' }}  // 80ms per word
        >
          {word}
        </motion.span>
      ))}
      {/* Blinking cursor — only while streaming */}
      <AnimatePresence>
        {isStreaming && (
          <motion.span
            initial={{ opacity: 1 }}
            animate={{ opacity: [1, 0, 1] }}
            transition={{ duration: 0.6, repeat: Infinity, ease: 'linear' }}
            className="inline-block ml-0.5 font-mono text-sp-orange"
            style={{ fontFamily: 'JetBrains Mono' }}
          >
            ▊
          </motion.span>
        )}
      </AnimatePresence>
    </div>
  );
}

// ThinkingPulse — shown when agent is thinking (no tokens yet)
export function ThinkingPulse() {
  return (
    <motion.span
      animate={{ opacity: [0.4, 1, 0.4] }}
      transition={{ duration: 1.2, repeat: Infinity, ease: 'easeInOut' }}
      className="text-sp-fg-40 font-mono text-sm"
    >
      ···
    </motion.span>
  );
}

★ Prompt Caching: 60-80% Cost Reduction

Mark the static parts of the context (system prompt, marketplace context, seller profile) with cache_control: {type: 'ephemeral'}. The default cache TTL is 5 minutes and is refreshed on every cache hit, so requests that arrive within that window reuse the cached prefix; a request after the TTL lapses pays the cache-write price again. Cache reads are billed at roughly 10% of the base input-token price and 5-minute cache writes at about 125%, which is where the savings come from. Estimate: at 1,000 sellers × 50 requests/day, $4,800/mo drops to about $960/mo (an 80% reduction, the top of the 60-80% range) when nearly all input tokens are served from cache.
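
The savings estimate can be sanity-checked with a few lines of arithmetic. The sketch below is a rough model under stated assumptions: it ignores output tokens, treats the baseline as input spend only, and uses a cache-read multiplier of 0.1 and a cache-write multiplier of 1.25 as defaults. The function name and parameters are hypothetical.

```typescript
// Rough cost model for prompt caching. ASSUMPTIONS: baseline is input-token spend only;
// a cache hit is billed at readMultiplier x base price, a miss at writeMultiplier x base price.
function estimateMonthlyCost(
  baselineUsd: number,
  cacheableShare: number, // fraction of input spend that sits in the cached prefix
  hitRate: number,        // fraction of requests that arrive inside the TTL window
  readMultiplier = 0.1,
  writeMultiplier = 1.25,
): number {
  const uncached = baselineUsd * (1 - cacheableShare);
  const blended = hitRate * readMultiplier + (1 - hitRate) * writeMultiplier;
  const cached = baselineUsd * cacheableShare * blended;
  return uncached + cached;
}

// Best case: 8/9 of input tokens are in the cached prefix and every request is a hit.
const bestCase = estimateMonthlyCost(4800, 8 / 9, 1);

// Worst case: same prefix, but every request misses the cache and pays the write premium.
const worstCase = estimateMonthlyCost(4800, 8 / 9, 0);
```

Under these assumptions the best case lands at about $960/mo, while the worst case is more expensive than not caching at all, which is why session pacing relative to the 5-minute TTL matters.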

05 · Tool Call UI: Visual Patterns for 36 Tools

This was the biggest gap identified in the audit: the spec described the tool accordion but never showed the complete visual spec or component code. Fixed here.

Live Tool Accordion States

get_competitor_prices running · ASIN B08XYZABC
analyze_buy_box ✓ 847ms

Input

{ "asin": "B08XYZABC",
  "marketplace": "amazon_mx" }

Output

{ "buybox_winner": "us",
  "our_share": 0.78,
  "competitors": 3 }
update_product_price ⚠ IRREVERSIBLE
ASIN: B08XYZABC
Current price: $24.99
New price: $22.49
Projected impact: +18% Buy Box probability
sync_inventory ✗ API timeout · retry?
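▶ ToolAccordion.tsx: Complete Component
// src/components/ToolAccordion.tsx
import { motion } from 'framer-motion';
import { Check, X, AlertTriangle, Loader2 } from 'lucide-react';
// Sibling components referenced below; these paths are assumed, not defined in this spec.
import { ConfirmationCard } from './ConfirmationCard';
import { JsonViewer } from './JsonViewer';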

type ToolState = 'queued' | 'running' | 'success' | 'error' | 'awaiting_confirm';
type RiskLevel = 'read_only' | 'reversible' | 'irreversible';

interface ToolAccordionProps {
  id: string;
  name: string;
  state: ToolState;
  riskLevel: RiskLevel;
  durationMs?: number;
  input?: Record<string, unknown>;
  output?: Record<string, unknown>;
  errorMessage?: string;
  onConfirm?: () => void;
  onCancel?: () => void;
}

const stateConfig = {
  queued:           { icon: null, color: '#7A7A90', bg: 'rgba(122,122,144,0.06)', border: 'rgba(122,122,144,0.2)' },
  running:          { icon: 'spin', color: '#3B82F6', bg: 'rgba(59,130,246,0.05)', border: 'rgba(59,130,246,0.2)' },
  success:          { icon: 'check', color: '#22C55E', bg: 'rgba(34,197,94,0.05)', border: 'rgba(34,197,94,0.2)' },
  error:            { icon: 'x', color: '#EF4444', bg: 'rgba(239,68,68,0.05)', border: 'rgba(239,68,68,0.2)' },
  awaiting_confirm: { icon: 'warn', color: '#F59E0B', bg: 'rgba(245,158,11,0.05)', border: 'rgba(245,158,11,0.25)' },
};

export function ToolAccordion({ id, name, state, riskLevel, durationMs, input, output, errorMessage, onConfirm, onCancel }: ToolAccordionProps) {
  const cfg = stateConfig[state];
  const isDestructive = riskLevel === 'irreversible';

  return (
    <motion.div
      layout
      initial={{ opacity: 0, y: 4 }}
      animate={{ opacity: 1, y: 0 }}
      transition={{ duration: 0.2, ease: [0.16, 1, 0.3, 1] }}
      style={{ background: cfg.bg, border: `1px solid ${cfg.border}`, borderRadius: 10 }}
    >
      <details>
        <summary style={{ display: 'flex', alignItems: 'center', gap: 10, padding: '10px 16px', cursor: 'pointer', listStyle: 'none' }}>
          {/* State icon */}
          {state === 'running' && <Loader2 size={14} color={cfg.color} className="animate-spin" />}
          {state === 'success' && <Check size={14} color={cfg.color} strokeWidth={2.5} />}
          {state === 'error'   && <X size={14} color={cfg.color} strokeWidth={2.5} />}
          {state === 'awaiting_confirm' && <AlertTriangle size={14} color={cfg.color} />}

          <span style={{ fontSize: 12, fontWeight: 500, color: '#D4D4E4', flex: 1 }}>{name}</span>

          {/* Right badges */}
          {isDestructive && (
            <span style={{ fontSize: 9, fontWeight: 700, color: '#EF4444', textTransform: 'uppercase', letterSpacing: '0.1em' }}>
              IRREVERSIBLE
            </span>
          )}
          {/* Compare against null so a 0ms duration does not render a stray "0" */}
          {state === 'success' && durationMs != null && (
            <span style={{ fontSize: 10, fontFamily: 'JetBrains Mono', color: cfg.color }}>
              ✓ {durationMs}ms
            </span>
          )}
          {state === 'error' && (
            <span style={{ fontSize: 10, fontFamily: 'JetBrains Mono', color: cfg.color }}>
              ✗ Error
            </span>
          )}
        </summary>

        {/* Expanded content */}
        <div style={{ padding: '0 16px 12px', borderTop: '1px solid rgba(255,255,255,0.05)' }}>
          {/* Confirmation card for irreversible actions */}
          {state === 'awaiting_confirm' && (
            <ConfirmationCard input={input} riskLevel={riskLevel} onConfirm={onConfirm} onCancel={onCancel} />
          )}
          {/* JSON viewer for success/error */}
          {(state === 'success' || state === 'error') && (
            <JsonViewer input={input} output={output} error={errorMessage} />
          )}
        </div>
      </details>
    </motion.div>
  );
}

06 · Previously Undocumented Patterns: Now Complete

Empty States — 8 Variants

No ASINs Yet

First-run. CTA: "Add your first product"

No Search Results

Show query, suggest correction

All Caught Up

No pending actions. Positive reinforcement.

Sync Pending

Data loading from marketplace. Progress bar.

Not Connected

OAuth not done. CTA: "Connect marketplace"

No History

Audit log empty. "Actions will appear here"

Credits Zero

Agent paused. Upgrade CTA dominant.

No Reports

Pro feature gate. "Available in Pro plan"

Empty State Rules:

  • ① Icon: 32px, colored by context (orange=action, blue=info, green=success, red=error)
  • ② Title: max 4 words, sentence case, no period
  • ③ Description: 1 line, explains why + what to do next
  • ④ CTA: only if there's a direct action. Never show CTA on "All Caught Up"
  • ⑤ Never show empty state while loading — show progress instead
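
These rules are mechanical enough to lint. The sketch below shows one possible check; the `EmptyState` shape and `validateEmptyState` name are hypothetical, and the sentence-case and icon-color rules are left out because they need design context.

```typescript
// Hypothetical lint for the empty state rules above.
interface EmptyState {
  title: string;
  description: string;
  cta?: string;
  loading?: boolean;
}

function validateEmptyState(s: EmptyState): string[] {
  const problems: string[] = [];
  const words = s.title.trim().split(/\s+/).filter((w) => w.length > 0);

  if (words.length > 4) problems.push('title exceeds 4 words');
  if (/\.$/.test(s.title.trim())) problems.push('title ends with a period');
  if (s.description.indexOf('\n') !== -1) problems.push('description must be one line');
  if (s.title === 'All Caught Up' && s.cta) problems.push('"All Caught Up" must not show a CTA');
  if (s.loading) problems.push('show progress instead of an empty state while loading');

  return problems;
}

const firstRun = validateEmptyState({
  title: 'No ASINs Yet',
  description: 'Add a product to start tracking prices.',
  cta: 'Add your first product',
});

const badCaughtUp = validateEmptyState({
  title: 'All Caught Up',
  description: 'No pending actions.',
  cta: 'Do something',
});
```

A check like this could run in a unit test over the 8 variants so that copy changes do not silently break the rules.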

Error State Taxonomy — 3 Categories

Category A · Recoverable · User can fix

API timeout, validation error, missing field. Amber border + icon. Show a specific message + retry button. Auto-retry after 3s with countdown.

Amazon API timeout · Retrying in 3s
Category B · Unrecoverable · Needs human intervention

Auth revoked, account suspended, critical DB error. Red banner. Explain what happened and what the user must do. No auto-retry. Support link if relevant.

MeLi credentials expired · Re-authentication required
Category C · Informational Block · Not an error, but blocked

Rate limit, credit exhausted, feature not in plan. Blue info banner. Calm tone. Clear path forward (upgrade, wait, etc). Agent pauses gracefully.

Credits exhausted · Coach paused · Resets on March 6
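
The taxonomy above maps naturally onto a lookup from error kind to presentation policy. The following is a sketch under assumptions: the `ErrorKind` identifiers are invented for illustration, and only the category, banner color, and retry behavior come from the text.

```typescript
type ErrorCategory = 'A' | 'B' | 'C';

// ASSUMPTION: these identifiers are illustrative; real error codes will differ.
type ErrorKind =
  | 'api_timeout' | 'validation_error' | 'missing_field'
  | 'auth_revoked' | 'account_suspended' | 'critical_db_error'
  | 'rate_limit' | 'credit_exhausted' | 'feature_not_in_plan';

interface ErrorPolicy {
  category: ErrorCategory;
  banner: 'amber' | 'red' | 'blue';
  autoRetryMs: number | null; // null means never retry automatically
  pausesAgent: boolean;
}

const POLICIES: Record<ErrorCategory, ErrorPolicy> = {
  A: { category: 'A', banner: 'amber', autoRetryMs: 3000, pausesAgent: false },
  B: { category: 'B', banner: 'red', autoRetryMs: null, pausesAgent: true },
  C: { category: 'C', banner: 'blue', autoRetryMs: null, pausesAgent: true },
};

const CATEGORY_OF: Record<ErrorKind, ErrorCategory> = {
  api_timeout: 'A',
  validation_error: 'A',
  missing_field: 'A',
  auth_revoked: 'B',
  account_suspended: 'B',
  critical_db_error: 'B',
  rate_limit: 'C',
  credit_exhausted: 'C',
  feature_not_in_plan: 'C',
};

function policyFor(kind: ErrorKind): ErrorPolicy {
  return POLICIES[CATEGORY_OF[kind]];
}

const timeoutPolicy = policyFor('api_timeout');
const creditsPolicy = policyFor('credit_exhausted');
```

Keeping the mapping in one table means a new error kind forces an explicit category decision at compile time, since `CATEGORY_OF` must cover every member of `ErrorKind`.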

Accessibility — WCAG AA Contrast Ratios

Text | Background | Ratio | WCAG AA | Use
#F4F4F6 | #0A0A0F | 18.0:1 | PASS AAA | Primary text on bg
#A4A4B8 | #0A0A0F | 8.1:1 | PASS AA | Secondary text
#F97316 | #0A0A0F | 7.0:1 | PASS AA | Orange on bg
#FFFFFF | #F97316 | 2.8:1 | FAIL (below the 3:1 large-text minimum) | White on orange btn
#7A7A90 | #0A0A0F | 4.7:1 | PASS AA | Tertiary text
#54546A | #0A0A0F | 2.7:1 | FAIL, captions only | Placeholder, metadata (decorative)
#22C55E | #0A0A0F | 8.7:1 | PASS AA | Success text
#EF4444 | #0A0A0F | 5.2:1 | PASS AA | Error text

⚠ #54546A fails WCAG AA: use it only for decorative metadata (timestamps, IDs) where context is clear, never for interactive or status-critical text. White on #F97316 also falls short of AA, even at large-text size, so the orange button needs a darker fill or darker label text to pass.
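
Contrast ratios are easy to get wrong by hand, so it is worth computing them from the token values. The sketch below implements the WCAG 2.x relative-luminance and contrast-ratio formulas; the function names are hypothetical, and the thresholds used are the AA minimums of 4.5:1 for normal text and 3:1 for large text.

```typescript
// sRGB channel (0-255) to linear light, per the WCAG 2.x relative luminance definition.
function linearize(channel8bit: number): number {
  const s = channel8bit / 255;
  return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
}

function relativeLuminance(hex: string): number {
  const m = /^#?([0-9a-f]{6})$/i.exec(hex);
  if (!m) throw new Error('Expected a 6-digit hex color, got "' + hex + '"');
  const n = parseInt(m[1], 16);
  const r = linearize((n >> 16) & 255);
  const g = linearize((n >> 8) & 255);
  const b = linearize(n & 255);
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio is symmetric: the lighter color always goes in the numerator.
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

function meetsAA(ratio: number, largeText = false): boolean {
  return ratio >= (largeText ? 3 : 4.5);
}

const primaryText = contrastRatio('#F4F4F6', '#0A0A0F');
const whiteOnOrange = contrastRatio('#FFFFFF', '#F97316');
```

Wiring a check like this into the Layer 6 quality gates would let the token file itself fail CI when a color pair drops below AA, rather than relying on a hand-maintained table.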

08 · Development Methodology: How the 4-Person Team Ships

Phase 1 — Weeks 1-3

Design-in-Code · Ship Tokens + Atoms

  • → Mateo: tokens.json + Style Dictionary setup
  • → Sergio: Electron shell + React sidebar skeleton
  • → Sergio: shadcn/ui init + Button + Input + Badge
  • → Andrés: Anthropic SDK + IPC bridge + tool router
  • → Pablo: this spec + design review on each PR

Deliverable: Electron window opens · sidebar renders · "Hello Claude" works

Phase 2 — Weeks 4-6

AI Agent Loop · Core Organisms

  • → Sergio: StreamingText + ThinkingPulse + ToolAccordion
  • → Sergio: ConfirmationCard + RollbackPanel
  • → Andrés: 10 core tools (price read, competitor, buy box)
  • → Mateo: Figma MCP integration + token pipeline setup
  • → Pablo: design review of all organisms against Figma specs

Deliverable: Full coach loop working · tool calls visible · confirm/cancel works

Phase 3 — Weeks 7-10

Data + Quality Gates

  • → Andrés: DataTable + KPI cards + Context Window Bar
  • → Andrés: Audit log + proactive suggestion cards
  • → Mateo: axe-core a11y audit + Figma ↔ Code consistency gates
  • → Sergio: Empty states + error states all variants
  • → Pablo: Beta onboarding + first seller feedback

Deliverable: Beta-ready · full data views · quality gates passing

Design Review Checklist: Every UI PR

All numbers use JetBrains Mono (fontFamily: font-mono)
All colors use sp-* CSS vars or Tailwind sp-* classes
Interactive elements have focus ring (shadow-focus)
Bilingual: EN + ES spans on all user-visible text
No hardcoded colors — only var(--sp-*) or Tailwind tokens
Animations use sp-dur-* + sp-ease-* tokens
IRREVERSIBLE actions have red badge + 2-step confirm
Component matches Figma spec (#18 Design System)

Design System Maturity Score — Current State

Tokens defined: 100%
Atoms built: 0% (v1)
Molecules built: 0% (v1)
Claude integration: 0% (v1)
Spec coverage: 98%
Brand defined: 100%
2026 trends: 100%

Next: Sergio starts Week 1 with shadcn init + the Button component. Mateo creates the tokens.json file and sets up Style Dictionary. The spec is ready. Now we build.

§14 · WORLD-CLASS

World-Class Design Strategy

The gap between "good design" and "world-class" is not more components; it is precision at the product level: how screens compose, how the competition fails, what makes sellers trust the UI instantly, and the 20 invisible decisions that separate tier-1 products.

01 · B2B Product UX References: Product Patterns, Not Brand Books

These 8 products are referenced for their UX patterns: specific interaction and layout decisions Shopilot should adopt. This differs from Section 15, which analyzed brand identity.

Stripe Dashboard

Gold standard for B2B SaaS data presentation

dashboard.stripe.com ↗

Metric Card Pattern

Big mono number (42px) → small label above → percentage delta below with color · No chart inside the card (chart is separate) · Hover reveals tooltip with exact timestamp

→ Shopilot: KPI cards follow this exact hierarchy

Activity Timeline

Every action logged with: type icon + description + amount + timestamp · Clickable row reveals full detail · Infinite scroll (no pagination) · Timeline = trust

→ Shopilot: Audit Log follows this pattern exactly

Developer-first but accessible

API keys visible in UI · Raw JSON expandable · But non-technical users see clean summaries · Same data, two perspectives on same screen

→ Shopilot: Tool accordion shows summary + expandable JSON

Linear

Keyboard-first B2B product · Speed as primary UX feature

linear.app ↗

Speed as Marketing

Linear measured and published their p50/p95 load times. "Built for speed" is a design statement. Every interaction under 100ms feels intentional. This is a UX strategy, not just engineering.

→ Shopilot: Measure + display model response time. Make it a feature.

Status as Color Only

No status text ("In Progress", "Done") on lists — just colored dots. Experts read the color map in <0.5s. Power users trained to read color grids in one glance.

→ Shopilot: Buy Box status = orange dot, not "You have buy box: YES"

Kbd Shortcut Badges Everywhere

Every action in dropdown shows keyboard shortcut. This teaches users → makes them faster → makes them dependent → reduces churn. Shortcut visibility = retention feature.

→ Shopilot: All dropdowns show Cmd+K, Cmd+1, Esc shortcuts inline

Figma

Complex tool with zero cognitive friction · Panel architecture

3-Panel Information Architecture

Left: navigation/layers · Center: work surface · Right: contextual properties. This is the master pattern for complex tools. The content always has max space. Panels are tools, not content.

→ Shopilot: Marketplace=center, Sidebar=right panel. Left nav deferred to v2.

Context-Sensitive Right Panel

The right panel changes based on what's selected. Select a component → see its properties. Click away → see general settings. Sidebar in Shopilot should adapt to the marketplace page being viewed.

→ Shopilot Phase 2: sidebar context = active ASIN on marketplace page

Multiplayer Visual Cues

Other users visible as colored cursors. Multi-tab shows who's looking at what. In Shopilot context: the AI "cursor" — the coach's attention indicator (which tool it's running, what data it's looking at right now).

→ Shopilot: "Coach is analyzing ASIN B08XYZ" status in sidebar header

Datadog

The benchmark for monitoring dashboards · Density without chaos

Time as Primary Axis

Every metric in Datadog is a time series. The X axis is always time. This trains users to think in trends, not point-in-time snapshots. For Shopilot: Buy Box % over 30d is more actionable than Buy Box % right now.

→ Shopilot: All KPIs have 7d/30d sparklines. Point values only for current.

Alert Integration in Charts

Threshold lines appear ON charts, not in separate alerts. When a metric crosses a line, the chart background changes color. Alert IS the chart. No separate notification panel for threshold breaches.

→ Shopilot: Price threshold line on competitor chart. Red zone when below margin.

Faceted Filtering

Left sidebar has real-time faceted filters that update counts as you click. Tags/dimensions are first-class citizens. For sellers: filter by marketplace + category + status simultaneously. Update counts in real-time.

→ Shopilot Phase 2: ASIN table with faceted filters (marketplace, category, status)

Arc Browser

Not an Electron app (Chromium engine with a native Swift UI on macOS), but the strongest reference for browser-shell UX: breaks every browser convention and wins

Sidebar IS the App Chrome

Arc moved ALL chrome (tabs, bookmarks, history) to the left sidebar. The content area is 100% undecorated. This is the insight: in Electron, the sidebar is where your app lives. The WebContentsView is sacred space.

→ Shopilot: sidebar has zero visual decoration except the chat + tool calls + status

Custom Title Bar Done Right

Arc's frameless window with custom controls that feel MORE native than native. The traffic light buttons are in their correct position, drag region is the entire top bar, full-screen transitions are perfect.

→ Shopilot: frameless + native traffic lights + 32px drag region + tab bar after

Command Bar as Primary Navigation

Arc's Cmd+T opens a search-everything command bar. This is the #1 power user feature. Arc trained millions of users to navigate entirely by keyboard. Once users find the command bar, they never use menus again.

→ Shopilot: Cmd+K opens command palette: "analyze B08XYZ", "reprice all", "show alerts"

Bloomberg Terminal

The extreme end of data density done right · Reference for seller data density

Density as Expertise Signal

Bloomberg is deliberately dense. It signals: "this is for professionals." The density IS the marketing — it makes users feel expert just by using it. Shopilot sellers are professionals. They can handle density. Don't dumb it down.

→ Shopilot: Don't simplify competitor tables. Show all 8 columns. Professionals want data.

Color = Directionality Only

Bloomberg uses green/red ONLY for up/down price movement. No other meaning. Nothing else is green or red. This absolute discipline means users process market data at a glance without thinking about color meaning.

→ Shopilot: #22C55E = price up / won Buy Box. #EF4444 = price down / lost Buy Box. Nothing else.

Monospace as Alignment

All Bloomberg data is monospace because financial data must align vertically. The $1,234.56 must be perfectly below $98.76 and $12,300.00. Misalignment breaks scanning. Monospace is structural, not decorative.

→ Shopilot: JetBrains Mono for all numbers is Bloomberg discipline applied to e-commerce.

Notion

Progressive disclosure master · Slash commands as interaction metaphor

Slash Command = AI Interaction

Notion's "/" opens inline commands. Claude Code uses the same pattern. This is now the universal AI interaction metaphor. Shopilot's chat input should support "/" for quick actions: "/reprice", "/analyze", "/report".

→ Shopilot: "/" in chat input opens quick-action palette with 36 tools
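
A minimal sketch of the "/" quick-action parsing follows. Only the three commands named above are included; the function names and the `ParsedCommand` shape are hypothetical, and the full palette would draw its entries from the tool definitions.

```typescript
// The three quick actions named in the text; the real palette would list all tools.
const QUICK_ACTIONS = ['reprice', 'analyze', 'report'] as const;
type QuickAction = (typeof QUICK_ACTIONS)[number];

interface ParsedCommand {
  action: QuickAction;
  args: string[];
}

// Returns null for plain chat messages and for unknown commands.
function parseSlashCommand(input: string): ParsedCommand | null {
  const trimmed = input.trim();
  if (trimmed.charAt(0) !== '/') return null;
  const tokens = trimmed.slice(1).split(/\s+/).filter((t) => t.length > 0);
  const name = (tokens[0] || '').toLowerCase();
  for (const action of QUICK_ACTIONS) {
    if (action === name) return { action, args: tokens.slice(1) };
  }
  return null;
}

// Prefix match used to populate the palette while the user is still typing.
function suggestActions(partial: string): QuickAction[] {
  const prefix = partial.replace(/^\//, '').toLowerCase();
  return QUICK_ACTIONS.filter((a) => a.indexOf(prefix) === 0);
}

const parsed = parseSlashCommand('/analyze B08XYZABC');
const suggestions = suggestActions('/re');
```

Treating unknown commands as null lets the input fall back to a normal chat message instead of failing, which keeps "/" safe to type by accident.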

Properties Reveal on Hover

Notion rows show only essential data by default. Hover reveals additional properties. This keeps lists clean while preserving data access. For Shopilot: ASIN rows show Name + Price + Buy Box. Hover reveals: SKU, inventory, last sync.

→ Shopilot: ASIN row hover reveals secondary metrics (expandable hover card)

Everything is a Block

Notion's single abstraction ("a block") unifies all content types. For Shopilot: every item in the sidebar is "a message" — user message, assistant message, tool call, confirmation card, proactive suggestion. Same base type, different renders.

→ Shopilot: MessageBlock type with discriminated union: text | tool | confirm | proactive
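One way the MessageBlock union could look in TypeScript. Only the four variant names (text, tool, confirm, proactive) come from the spec; the `kind` discriminant and every other field name here are assumptions made for illustration.

```typescript
// Same base concept ("a message"), different renders per variant.
type MessageBlock =
  | { kind: "text"; role: "user" | "assistant"; text: string }
  | { kind: "tool"; toolName: string; durationMs: number; ok: boolean }
  | { kind: "confirm"; summary: string; reversible: boolean }
  | { kind: "proactive"; headline: string; actionLabel: string };

// A plain-text renderer standing in for the React components.
// The `never` assignment makes the compiler flag any variant added later
// without a matching case.
function describeBlock(block: MessageBlock): string {
  switch (block.kind) {
    case "text":
      return `${block.role}: ${block.text}`;
    case "tool":
      return `${block.toolName} ${block.ok ? "✓" : "✗"} ${block.durationMs}ms`;
    case "confirm":
      return `${block.reversible ? "REVERSIBLE" : "IRREVERSIBLE"} · ${block.summary}`;
    case "proactive":
      return `${block.headline} → ${block.actionLabel}`;
    default: {
      const unreachable: never = block;
      return unreachable;
    }
  }
}
```

The discriminated union keeps the sidebar list homogeneous (one array of MessageBlock) while letting each variant carry only the fields its render needs.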

Intercom

Fin AI + human handoff · The original AI product with trust signals

AI vs Human Indicator

Intercom shows whether Fin AI or a human is responding. The AI has a bot icon; human has a photo. For Shopilot: the coach always shows "Powered by Claude Opus 4.6" + current model. Users trust labeled AI more than unlabeled AI.

→ Shopilot: sidebar header shows model name + version. Always visible, never hidden.

Proactive + Reactive in Same UI

Intercom shows proactive campaigns AND reactive inbox in same interface. Two modes: outbound (AI initiates) and inbound (user initiates). For Shopilot: the coach can initiate conversations ("I noticed X") and respond to queries.

→ Shopilot: proactive suggestion cards (coach-initiated) + chat input (user-initiated) in same sidebar

Context Panel Always Visible

Intercom inbox shows customer context alongside every conversation — purchase history, previous tickets, plan level. The agent never has to "look it up." For Shopilot: the coach always has seller profile + marketplace data visible in context.

→ Shopilot: context bar top of sidebar shows active marketplace + seller plan + top ASIN count

02 · Screen Compositions: What Each Main Screen Actually Looks Like

Before this section, the biggest gap in the spec was that components were defined but screens were not. These CSS mockups show exact proportions, component placement, and information hierarchy.

Screen 01 · Coach View — Main Application Screen (70/30)

Tab bar: Amazon · MeLi · Shopify
URL bar: sellercentral.amazon.com/inventory

Marketplace pane, KPI row:
Today's Sales: $3,847 · Buy Box %: 78% · Active ASINs: 47 · Alerts: 3

Marketplace pane, ASIN table:
ASIN / Product | Price | Buy Box | BSR | Stock
Wireless Headphones Pro (B08XYZABC1) | $24.99 | (status dot) | #1,234 | 23
USB-C Cable 6ft (B07ABCDEF2) | $12.49 | (status dot) | #4,891 | 147
Phone Stand Adjustable (B09GHIJKL3) | $18.99 | (status dot) | #2,102 | 312

Sidebar:
Header: Shopilot Coach · claude-opus-4-6 · Context window
You: Why did I lose the buy box on B07ABCDEF2?
Tool call: get_competitor_prices ✓ 312ms
Shopilot: You lost the buy box because TechDeals_MX repriced to $11.49, $1.00 below your price. They have 4.8★ vs your 4.6★.
Suggested action: Reprice to $11.29 → projected Buy Box recovery: 73%
Input: Ask your coach... ⌘K

Status bar: Amazon connected · 247 credits

Title bar

32px · frameless · traffic lights · tab bar after buttons · drag region

Marketplace 70%

WebContentsView · URL bar 28px · content scrolls natively · no interference

Sidebar 30%

React · header 36px · context bar · chat scroll · input sticky · status 20px

Status bar

20px · left: marketplace status · right: credit balance (JetBrains Mono)

Screen 02 · Dashboard View — Sidebar in "Overview" Mode

Header: Shopilot Coach · 03/05 · 14:32
KPI cards: Revenue 7d $24.8K (▲ +12%) · Buy Box 78% (▼ -4%) · Alerts 3 (review now)

Top Opportunities
  • USB-C Cable 6ft lost Buy Box: -$1.00 vs TechDeals_MX
  • Phone Stand inventory critical: 23 units · ~4 days
  • 3 ASINs underpriced vs market: +$180/mo potential

Input: Ask your coach anything... ⌘K

Dashboard mode: sidebar replaces chat history with KPI summary + opportunity list when agent is idle. Chat input always present. Click any opportunity → coach activates and analyzes it.
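The mode switch described here can be sketched as a pure function plus a sort. The agent state names, the `Opportunity` shape, and the `monthlyImpactUsd` field are assumptions; the spec only says that the overview appears when the agent is idle and that opportunities are ranked by dollar impact.

```typescript
type AgentState = "idle" | "thinking" | "streaming";
type SidebarMode = "overview" | "chat";

interface Opportunity {
  title: string;
  monthlyImpactUsd: number; // hypothetical field used for ranking
}

// Overview (KPIs + opportunities) only while the agent is idle;
// any agent activity switches the sidebar back to the chat history.
function sidebarMode(agent: AgentState): SidebarMode {
  return agent === "idle" ? "overview" : "chat";
}

// Highest dollar impact first, without mutating the input array.
function rankOpportunities(items: Opportunity[]): Opportunity[] {
  return [...items].sort((a, b) => b.monthlyImpactUsd - a.monthlyImpactUsd);
}
```

Clicking an opportunity would move the agent out of idle, which flips the sidebar to chat without any extra state.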

03 · Competitive Design Matrix: Why Shopilot Looks Different

The existing seller tools (Helium 10, SellerBoard, Jungle Scout, Repricer.com) were designed in 2012-2018. They solve the right problems with completely wrong design language for 2026. This is Shopilot's visual competitive moat.

Dimension | Helium 10 | SellerBoard | Jungle Scout | Repricer.com | Shopilot ★
Design Era | 2018 · SaaS purple | 2015 · Excel aesthetic | 2017 · Consumer green | 2013 · Corporate blue | 2026 · AI-native dark
Primary BG | #6B4FBB purple | #FFF white | #1D6F42 green | #1B4F8A navy | #0A0A0F near-black
AI Integration | Bolt-on chatbot (2024) | None | AI keywords only | Rule-based only | AI-first · agent loop · 36 tools
Number Display | Default browser font | Arial/Helvetica | Proxima Nova regular | System serif | JetBrains Mono always
Dark Mode | ✗ Light only | ✗ Light only | ⚠ Toggle (half done) | ✗ Light only | ✓ Dark-first · identity
Desktop App | ✗ Web only | ✗ Web only | ✗ Web only | ✗ Web only | ✓ Electron · native feel
Reversibility | ✗ Not labeled | ✗ Not labeled | ✗ Not labeled | ⚠ Confirm dialog only | ✓ REVERSIBLE/IRREVERSIBLE · rollback tokens
Typography system | 1-2 fonts, no scale | System fonts | Proxima Nova only | System fonts | Inter Display + JetBrains Mono · full scale
Context awareness | ✗ Manual switch | ✗ Manual switch | ✗ Manual switch | ✗ Manual switch | ✓ Coach sees active marketplace page
Perceived quality | Tool (functional) | Spreadsheet | Consumer app | Legacy SaaS | Precision instrument · Bloomberg meets Claude

★ The Core Design Insight

Every competitor was designed by engineers for engineers. Shopilot is designed by a seller who has used all of these tools and knows exactly what they get wrong. The dark + professional + monospace + AI-native aesthetic isn't a trend; it's the natural design language of a serious professional tool for 2026. This is the same design evolution that happened in finance (Bloomberg → Robinhood), in code (Eclipse → VS Code → Cursor), and in project management (JIRA → Linear).

04 · Emotional Design Map: From First Install to Power User

0s · First Impression

"This looks serious"

Dark canvas opens. Orange accent. Shopilot logo. No splash screen, no loading animation. App IS the window.

Design: near-black bg · frameless · logo mark visible · zero clutter

30s · Onboarding

"This is fast"

5-step wizard. Step 1: value prop. Step 2: OAuth in 30s. Step 3: language/category. Skip from step 3.

Design: progress dots · one action per step · CTA dominant · NO form fields until step 3

2min · First Tool Call

"The AI knows my data"

Coach runs first analysis unprompted. Tool accordion shows real API calls to their real store. This is the trust moment.

Design: tool accordion opens · real ASIN names · JetBrains Mono numbers · "From Amazon API"

5min · Aha Moment

"I didn't know this"

Coach surfaces an insight the seller didn't have: "You lost Buy Box on 8 ASINs in the last 24h. Here's why." This is the aha moment.

Design: proactive card slides up · specific numbers · one-click action · orange CTA

Day 1 · First Win

"It actually worked"

Price was changed. Buy Box % goes up. Confirmation with actual before/after. The coach says "Buy Box recovered to 91%."

Design: success state · green + orange celebrate · audit log entry · rollback still visible

Week 1 · Habit

"I check this every morning"

Dashboard view shows overnight changes. 3 opportunities queued. Seller opens app and acts on them before coffee is done.

Design: dashboard mode · opportunities sorted by $$$ impact · 1-click actions · <60s daily ritual

Month 1 · Expert

"I can't operate without this"

Power user. Knows Cmd+K, "/" commands. Audit log is their source of truth. Coach Memory has learned their preferences.

Design: keyboard shortcuts visible · command palette muscle memory · history as data

Designed Delight Moments: The Details That Stick

First Buy Box Win Celebration

When buy box goes from ✗ to ✓, the status dot pulses green 3x with scale(1.4). Subtle. Not a confetti explosion. Professional delight.

Typing Indicator Before Coach Responds

The ··· thinking pulse with "Shopilot is analyzing your store" appears immediately when user sends message. Never a blank moment.

Rollback Success State

When rollback completes, the audit log entry shows "↩ Reversed · 2.3s ago" in green. The system communicates "you're safe, it worked."

Coach Memory Acknowledgment

When coach uses seller's stored preference, it says "(using your saved preference: always protect margins >30%)". Shows it's paying attention.

Competitor Detected Alert

When a new seller lists on one of your ASINs, the proactive card appears with their name, price, and rating. Feels like having eyes everywhere.

Credit Milestone

When seller uses their 100th credit, a discreet banner: "100 actions taken · Avg response: 1.2s · $847 in revenue impacts attributed." Numbers build pride.

05 · E-Commerce Domain Visual Patterns: What No Other Design System Has

Generic design systems cover buttons and inputs. Shopilot needs patterns specific to e-commerce seller intelligence. These are the domain-specific visual components that make the product feel built BY a seller.

Buy Box Indicator — 4 States

78% You own it
0% Lost · Fix now
34% Contested
No data yet

Rule: Buy Box % is ALWAYS JetBrains Mono. Color = status only. No text labels on list view (dot only). Labels on detail view.

Price Delta Display — Competitor Comparison

$24.99 You
+$2.50 ▲
$22.49 TechDeals_MX
★ Winner
$25.99 ElectroMX
-$1.00 ▼

You row = orange bar. Winner row = highlighted. Relative bar shows price position visually. Delta shown as absolute + direction. Never percentage-only.
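A small sketch of the "absolute + direction" delta format. It assumes the delta is the seller's price minus the competitor's price, which is the reading that reproduces both figures in the mockup above; `formatPriceDelta` is a hypothetical helper name.

```typescript
// Formats the gap between your price and a competitor's as an absolute
// dollar amount plus a direction arrow, never as a percentage.
function formatPriceDelta(yourPrice: number, competitorPrice: number): string {
  // Work in whole cents so floating-point noise cannot leak into the label.
  const cents = Math.round((yourPrice - competitorPrice) * 100);
  if (cents === 0) return "$0.00";
  const sign = cents > 0 ? "+" : "-";
  const arrow = cents > 0 ? "▲" : "▼";
  return `${sign}$${(Math.abs(cents) / 100).toFixed(2)} ${arrow}`;
}
```

With the mockup's numbers, a $24.99 listing against TechDeals_MX at $22.49 yields "+$2.50 ▲", and against ElectroMX at $25.99 yields "-$1.00 ▼".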

BSR Trend Sparkline — Inline in Table

#1,234
▲ improving
#4,891
▼ declining

BSR: LOWER = BETTER (rank #1 = bestseller). Sparkline: green slope = improving (going toward #1). ALWAYS show direction word, not just number. Color shadow band adds weight without legend.
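Because BSR is inverted (a smaller rank is better), the direction word is easy to get backwards. A minimal sketch, assuming the sparkline data arrives as an array of ranks ordered oldest to newest; the function names are illustrative.

```typescript
type BsrTrend = "improving" | "declining" | "flat";

// ranks: oldest first, newest last. A falling number means the product is
// moving toward #1, so it counts as improving.
function bsrTrend(ranks: number[]): BsrTrend {
  const first = ranks[0];
  const last = ranks[ranks.length - 1];
  if (first === undefined || last === undefined || ranks.length < 2) return "flat";
  if (last < first) return "improving";
  if (last > first) return "declining";
  return "flat";
}

// Always pairs the rank with an arrow and a direction word.
function bsrLabel(ranks: number[]): string {
  const latest = ranks[ranks.length - 1];
  if (latest === undefined) return "No data yet";
  const trend = bsrTrend(ranks);
  const arrow = trend === "improving" ? "▲" : trend === "declining" ? "▼" : "•";
  return `#${latest.toLocaleString("en-US")} ${arrow} ${trend}`;
}
```

Comparing only the first and last points is the simplest possible trend rule; a real sparkline might prefer a slope over the whole window.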

Inventory Health Grid — Portfolio View

312 days · 147 days · 23 days · 4 days · 89 days · 31 days · 201 days · 2 days

Each cell = one ASIN. Color = stock health (green >60d / amber 15-60d / red <15d). Number = days remaining. Glanceable portfolio status. No labels needed — color + number is sufficient.
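The colour rule maps directly to a small classifier. The thresholds are the ones stated above; `stockHealth` is a hypothetical name.

```typescript
type StockHealth = "green" | "amber" | "red";

// green: more than 60 days of cover, amber: 15 to 60 days, red: under 15 days.
function stockHealth(daysRemaining: number): StockHealth {
  if (daysRemaining > 60) return "green";
  if (daysRemaining >= 15) return "amber";
  return "red";
}

// The eight cells from the grid mockup.
const grid = [312, 147, 23, 4, 89, 31, 201, 2].map(stockHealth);
```

For the mockup values this gives green, green, amber, red, green, amber, green, red.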

06 · Color Blindness Safety: Accessible for All Sellers

~8% of men and ~0.5% of women have red-green color blindness. For Shopilot, this means Buy Box won (green) vs lost (red) may be indistinguishable to ~1 in 12 male sellers. The fix: never use color alone for meaning. Always pair with icon, text, or shape.

Deuteranopia (Red-Green Blind)

Most common: green-blind. Reds appear brownish-yellow. Greens appear similar to orange.

[Swatches: Buy Box Won and Buy Box Lost, each shown next to its simulated deuteranopia rendering]

Problem: green and red dots look identical to deuteranopes. Users can't distinguish Buy Box won vs lost by color alone.

Fix: Shape + Color (WCAG 1.4.1)

Never use color alone. Always pair color with shape, icon, or text pattern.

78% Won — checkmark confirms
0% Lost — X confirms
34% Contested — dash confirms

Solution: ✓/✗/— icons work even without color. Color still helps non-colorblind users scan faster.
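A sketch of a Buy Box badge that carries its meaning in an icon and a word, so colour is only a reinforcement. The 50% cut-off between contested and won, and the "?" icon for the no-data state, are assumptions; the spec gives examples (78%, 34%, 0%) but no thresholds.

```typescript
type BuyBoxState = "won" | "contested" | "lost" | "no-data";

// Assumed thresholds: 0% is lost, 50% or more is won, anything between is contested.
function buyBoxState(pct: number | null): BuyBoxState {
  if (pct === null) return "no-data";
  if (pct <= 0) return "lost";
  return pct >= 50 ? "won" : "contested";
}

// Icon + label per state; "\u2014" is the dash icon used for contested.
const BADGES: Record<BuyBoxState, { icon: string; label: string }> = {
  won: { icon: "✓", label: "Won" },
  contested: { icon: "\u2014", label: "Contested" },
  lost: { icon: "✗", label: "Lost" },
  "no-data": { icon: "?", label: "No data yet" },
};

// Text that stays unambiguous in greyscale or under any colour-vision deficiency.
function buyBoxBadge(pct: number | null): string {
  const { icon, label } = BADGES[buyBoxState(pct)];
  return pct === null ? `${icon} ${label}` : `${icon} ${pct}% ${label}`;
}
```

The list view could render only the icon and the detail view the full string, which matches the "dot only on list, labels on detail" rule while staying WCAG 1.4.1 compliant.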

Safe Color Pairs (Accessible)

These color combinations are distinguishable under all common color blindness types:

Blue + Orange — universally distinguishable
Blue + Amber — excellent for status pairs
White + Dark — table row contrast (no color needed)

Testing tools: Figma "Color Blind" plugin · Chrome DevTools accessibility panel · coblis.de online simulator

07 · The 20 Invisible Decisions That Make Products World-Class

Users can't name these details. But they feel them. A user who says "this just feels premium" is responding to some combination of these 20 decisions. None of them take more than a few hours to implement. All of them matter.

① Letter-spacing on headings

-0.03em on h2 makes text look designed, not default. Default tracking = amateur.

② Consistent 4px grid

Every spacing value divisible by 4. Not "16px here, 18px there." Inconsistency is invisible but users sense the chaos as "roughness."

③ Inset shadow on cards

inset 0 1px 0 rgba(255,255,255,.06) adds glass depth. Without it, dark cards look flat and dead.

④ Transition on color changes

transition: background 150ms ease, color 150ms ease on all interactive elements. Instant color changes feel abrupt and cheap.

⑤ Border on focus, not outline

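Browser default outline is ugly. Replace it with box-shadow: 0 0 0 2px rgba(249,115,22,.5) on :focus-visible, and keep outline: 2px solid transparent so the focus ring still appears in forced-colors (high contrast) mode, where box-shadow is dropped. Same a11y benefit, premium look.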

⑥ Disabled ≠ invisible

Disabled elements at 50% opacity tell users "this exists but you can't use it yet." Not display:none. Visibility + opacity = correct pattern.

⑦ Line-height on body text = 1.5

Dense data UIs are tempting to set to 1.2. Don't. AI-generated text needs 1.5 minimum for readability. Chat messages need 1.6.

⑧ Cursor: pointer on interactive divs

If it's clickable, it needs cursor: pointer. Forgetting this on tool accordions or proactive cards breaks the interaction expectation.

⑨ Tabular nums on ALL numbers

font-variant-numeric: tabular-nums makes numbers align in columns. Without it, a table of prices is unreadable.

⑩ Scrollbars styled or hidden

Default scrollbars look terrible on dark UIs. Either hide with ::-webkit-scrollbar or make them thin + dark. Visible ugly scrollbars = unfinished product.

⑪ No horizontal scroll at narrow widths

Electron windows can be resized smaller than expected. overflow-x:hidden on body, overflow-x:auto on tables only.

⑫ Semantic HTML elements

Use <button> not <div onclick>. <time> for timestamps. <output> for live AI output. Semantic = better a11y + better dev experience.

⑬ Will-change on animated elements

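will-change: transform, opacity on sliding cards and streaming text. It hints the browser to promote the element to its own compositor layer, which reduces jank at 60fps. Apply it only while the element is animating; every promoted layer costs GPU memory.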

⑭ Error messages explain what to DO

"Error 403" = terrible. "Your Amazon credentials expired. Click Reconnect to re-authorize in 30 seconds." = world-class. Every error has a next step.

⑮ Timestamps in user timezone

Never show UTC. Use Intl.DateTimeFormat(locale, {timeZone}). "2:34 PM" not "19:34 UTC". Sellers check timestamps constantly.

⑯ Number formatting by locale

MeLi sellers in Mexico: $1,847.50 not 1847,50 MXN. Use Intl.NumberFormat. Wrong number format breaks trust immediately.

⑰ Empty inputs have placeholder text

Chat input: "Ask your coach about any ASIN, competitor, or pricing decision..." Not "Type here" or blank. Placeholder teaches the product's power.

⑱ Correct text cursor in inputs

Input fields: cursor: text. Buttons: cursor: pointer. Disabled: cursor: not-allowed. Every cursor state must be right.

⑲ Data source attribution

Below every KPI: "From Amazon Seller Central API · Synced 4 min ago" in 10px gray. This is the invisible trust builder. Users who see source attribution trust the numbers more.

⑳ Reduce motion for vestibular

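@media (prefers-reduced-motion: reduce) { * { animation-duration: 0.01ms !important; transition-duration: 0.01ms !important; } } respects the OS accessibility setting. The !important is needed because a bare * selector otherwise loses to any class rule. This addresses WCAG 2.3.3 (Animation from Interactions, Level AAA).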
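Decisions ⑭, ⑮ and ⑯ are the most code-shaped of the twenty, so here is a hedged sketch of each. The `Intl` calls are standard; the helper names, the fallback error wording, and the choice of which HTTP statuses count as expired credentials are assumptions.

```typescript
interface ActionableError {
  message: string;
  action: string; // label of the button that resolves the error
}

// ⑭ Every error names a next step instead of a bare status code.
function explainMarketplaceError(status: number, marketplace: string): ActionableError {
  if (status === 401 || status === 403) {
    return {
      message: `Your ${marketplace} credentials expired. Click Reconnect to re-authorize in 30 seconds.`,
      action: "Reconnect",
    };
  }
  // Assumed fallback wording for everything else.
  return {
    message: `${marketplace} did not respond as expected (HTTP ${status}). Click Retry, or contact support if it keeps failing.`,
    action: "Retry",
  };
}

// ⑮ Render an instant in the seller's own time zone, never raw UTC.
function formatLocalTime(epochMs: number, locale: string, timeZone: string): string {
  return new Intl.DateTimeFormat(locale, {
    hour: "numeric",
    minute: "2-digit",
    timeZone,
  }).format(new Date(epochMs));
}

// ⑯ Let the locale decide grouping and decimal separators.
function formatMoney(amount: number, locale: string, currency: string): string {
  return new Intl.NumberFormat(locale, { style: "currency", currency }).format(amount);
}
```

For example, `formatMoney(1847.5, "en-US", "USD")` returns "$1,847.50". In the renderer, the time zone can come from `Intl.DateTimeFormat().resolvedOptions().timeZone` rather than a user setting.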

18

Production Readiness: Critical Gaps

30-point audit results · 14 gaps identified · All HIGH/MEDIUM severity specs

4 HIGH (Missing): Updates · Persistence · GDPR · Observability
6 MEDIUM (Missing): Support · Demo · Multi-acct · Desktop OS
4 PARTIAL (Incomplete): E2E tests · Virtualization · Tray · Menus

This section was generated from a systematic 30-point codebase audit. Each sub-section contains actionable implementation specs. Address HIGH items before public beta. MEDIUM items before v1.0 GA.

HIGH 01 · Update Notification System

MISSING

electron-updater is configured for auto-download but the user-facing update experience is completely unspecified. Silent updates break trust: users need to know when and why the app changed.

Update State Machine

idle → checking → available → downloading → ready → restarting
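The six states can be encoded as a transition table so the renderer never shows an impossible sequence. The state names are from the spec; the fallback edges to idle (no update found, check or download failed) are assumptions, since the spec lists only the happy path.

```typescript
type UpdateState =
  | "idle"
  | "checking"
  | "available"
  | "downloading"
  | "ready"
  | "restarting";

// Allowed next states per state. Edges back to "idle" are assumed.
const NEXT: Record<UpdateState, readonly UpdateState[]> = {
  idle: ["checking"],
  checking: ["available", "idle"], // idle: nothing new, or the check failed
  available: ["downloading"],
  downloading: ["ready", "idle"], // idle: download failed (errors stay silent)
  ready: ["restarting"],
  restarting: [],
};

// Returns the requested state when the move is legal, otherwise stays put.
function transition(current: UpdateState, requested: UpdateState): UpdateState {
  return NEXT[current].includes(requested) ? requested : current;
}
```

Each IPC message from the main process (update:available, update:progress, update:ready) would drive one call to `transition`, and out-of-order messages are ignored.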

Update Available Modal — Live Spec

Shopilot 1.3.0 available

You have version 1.2.4. Download is ready.

What's new

  • Coach: 3x faster tool execution with parallel calls
  • MercadoLibre: new competitor tracking for MX sellers
  • Fixed: Rollback confirmation not dismissing on success
  • Fixed: Credit balance not updating after top-up
Downloading update… 73%
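Implementation: main process
// main/updater.ts
import { autoUpdater } from 'electron-updater';
import { BrowserWindow, ipcMain } from 'electron';

// Call once from main/index.ts after the main window has been created.
export function initUpdater(mainWindow: BrowserWindow): void {
  autoUpdater.autoDownload = true;
  autoUpdater.autoInstallOnAppQuit = true;

  autoUpdater.on('update-available', (info) => {
    mainWindow.webContents.send('update:available', {
      version: info.version,
      releaseNotes: info.releaseNotes,
    });
  });

  autoUpdater.on('download-progress', (progress) => {
    mainWindow.webContents.send('update:progress', {
      percent: Math.round(progress.percent),
      bytesPerSecond: progress.bytesPerSecond,
    });
  });

  autoUpdater.on('update-downloaded', () => {
    mainWindow.webContents.send('update:ready');
  });

  // Update errors stay invisible to the user (see the state table below);
  // log here and forward to Sentry from the main-process error pipeline.
  autoUpdater.on('error', (err) => {
    console.error('[updater]', err);
  });

  // IPC handler: user clicks "Restart & Update"
  ipcMain.handle('update:install', () => {
    autoUpdater.quitAndInstall(false, true); // isSilent=false, isForceRunAfter=true
  });

  // Check on launch and every 4 hours. checkForUpdates() is used instead of
  // checkForUpdatesAndNotify() because the in-app banner and modal replace
  // the native OS notification.
  const check = () =>
    autoUpdater.checkForUpdates().catch((err) => console.error('[updater] check failed', err));
  check();
  setInterval(check, 4 * 60 * 60 * 1000);
}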
State | UI Pattern | Dismissible?
checking | Status bar dot pulses blue (silent) | Auto
available | In-app banner: "New version available. View details" | Yes (persists until restart)
downloading | Modal with changelog + progress bar (auto-shown) | Yes (download continues)
ready | Modal: "Ready to install. Restart now?" with changelog | Yes (installs on quit)
error | Silent (logged to Sentry); do not bother user for update errors | N/A

HIGH 02 · Local Chat Persistence

MISSING

Chat sessions vanish on app restart. No localStorage, no IndexedDB, no Zustand persist spec exists anywhere in the codebase. Sellers who close the app lose all context, a critical trust failure.

Data Model — What to Persist

ChatSession (IndexedDB — shopilot-chat store)

interface ChatSession {
  id: string;           // uuid
  marketplaceId: 'amazon' | 'meli' | 'shopify';
  asin?: string;        // active context when session started
  messages: Message[];  // all messages including tool calls
  createdAt: number;    // unix ms
  updatedAt: number;
  tokenCount: number;   // for context window visualization
  title?: string;       // auto-generated from first user message (truncated 60 chars)
}

Zustand Store — React State Layer

import { create } from 'zustand';
import { persist, createJSONStorage } from 'zustand/middleware';
import { get as idbGet, set as idbSet, del as idbDel } from 'idb-keyval';

// Lightweight index entry; full messages live in IndexedDB.
interface SessionMeta {
  id: string;
  title: string;
  updatedAt: number;
  marketplace: ChatSession['marketplaceId'];
}

interface ChatStore {
  sessions: SessionMeta[];
  activeSessionId: string | null;
  setActiveSession: (id: string) => void;
  addSession: (meta: SessionMeta) => void;
}

// Only the session index is persisted to localStorage (not full messages).
// The curried create<T>()(...) form lets TypeScript infer the middleware types.
export const useChatStore = create<ChatStore>()(
  persist(
    (set) => ({
      sessions: [],
      activeSessionId: null,
      setActiveSession: (id) => set({ activeSessionId: id }),
      addSession: (meta) =>
        set((s) => ({ sessions: [meta, ...s.sessions].slice(0, 100) })), // keep last 100
    }),
    {
      name: 'shopilot-chat-store',
      storage: createJSONStorage(() => localStorage), // session index only
    },
  ),
);

// Full messages: one IndexedDB record per session via idb-keyval
// (structured clone, so no JSON serialization step).
export const loadSession = (id: string) => idbGet<ChatSession>(`session:${id}`);
export const saveSession = (s: ChatSession) => idbSet(`session:${s.id}`, s);
export const deleteSession = (id: string) => idbDel(`session:${id}`);
Storage | What's stored | Retention | Size limit
localStorage | Session index (id, title, timestamp) | 100 sessions | ~20 KB
IndexedDB | Full message arrays with tool calls | 90 days, then pruned | ~50 MB soft cap
safeStorage | API keys, marketplace credentials | Until user logout | Negligible
SQLite (main) | Audit log, price history, snapshots | 180 days | 500 MB max
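Two rules from this spec are easy to get subtly wrong: the 60-character auto title and the 90-day IndexedDB retention. A minimal sketch of both as pure functions; the helper names and the whitespace normalisation are assumptions.

```typescript
const TITLE_MAX_CHARS = 60;
const RETENTION_MS = 90 * 24 * 60 * 60 * 1000; // 90 days

// Auto-generated title: first user message, collapsed to one line,
// truncated to 60 characters.
function sessionTitle(firstUserMessage: string): string {
  const oneLine = firstUserMessage.replace(/\s+/g, " ").trim();
  return oneLine.slice(0, TITLE_MAX_CHARS);
}

// Ids of sessions whose last update is older than the retention window.
// The caller would delete each id from IndexedDB and from the session index.
function expiredSessionIds(
  index: { id: string; updatedAt: number }[],
  nowMs: number,
): string[] {
  return index.filter((s) => nowMs - s.updatedAt > RETENTION_MS).map((s) => s.id);
}
```

Running the prune once at launch is probably enough; it keeps the ~50 MB soft cap from depending on a background timer.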

Session History UI — Sidebar Panel

When chat input is empty: show last 5 sessions as clickable cards below input. Each card: title (auto) + marketplace icon + relative time. Clicking loads the session and resumes context. Pattern adopted from Claude.ai sidebar.

HIGH 03 · GDPR, Data Export & Account Deletion

MISSING

Zero documentation of user data download, account deletion, or data retention. Required by GDPR (EU), LGPD (Brazil, critical for MeLi sellers), and expected by Apple App Store Review. Must exist before any public release.

Personal Data Inventory (PII Map)

Data Type | Where stored | Purpose | Retention | Exportable?
Email address | Supabase auth.users | Account identity | Until deletion | Yes
Marketplace credentials | Electron safeStorage (local) | API access | Until revoke | No (keys)
Chat history | Local IndexedDB | Session continuity | 90 days | Yes (JSON)
Audit log | Local SQLite | Rollback & trust | 180 days | Yes (CSV)
Usage telemetry | PostHog (cloud) | Product analytics | 24 months | On request
Credit transactions | Supabase billing | Billing history | 7 years (legal) | Yes (PDF)
Error/crash reports | Sentry (cloud) | Bug fixing | 90 days | No (aggregate)

Data Export Package — ZIP Structure

shopilot-export-{userId}-{YYYYMMDD}.zip
├── README.txt                   # What's in this export, data policy link
├── account/
│   ├── profile.json             # email, plan, created_at, last_login
│   └── billing_history.csv      # date, amount, credits, description
├── chat_history/
│   ├── sessions_index.json      # session metadata (title, date, marketplace)
│   └── session_{id}.json × N    # full message arrays per session
├── audit_log/
│   └── actions.csv              # timestamp, action, asin, old_value, new_value, reversible
└── telemetry_summary.json       # aggregate usage stats (no PII included)

Account Deletion Flow (GDPR Article 17 — Right to Erasure)

  1. User navigates to Settings → Account → "Delete Account"
  2. Modal: "This will permanently delete your account and all data. Export your data first?" with [Export Data] + [Continue to Delete] buttons
  3. Type "DELETE" in text field to confirm (same pattern as Vercel, Supabase)
  4. Server-side: mark account deleted_at → Supabase Edge Function queues hard delete in 30 days (grace period for disputes)
  5. Local: clear all IndexedDB stores + localStorage + SQLite + safeStorage keys on next launch
  6. Confirmation email: "Your Shopilot account will be permanently deleted on {date+30d}. Cancel: {link}"
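The 30-day grace period in steps 4 and 6 can be sketched as a pair of pure functions that both the Edge Function and the confirmation email could share. The status names and function names are assumptions; only the `deleted_at` marker and the 30-day window come from the flow above.

```typescript
const GRACE_PERIOD_MS = 30 * 24 * 60 * 60 * 1000; // 30 days

type AccountStatus = "active" | "pending_deletion" | "purge_due";

// The {date+30d} value shown in the confirmation email.
function hardDeleteAt(deletedAtMs: number): number {
  return deletedAtMs + GRACE_PERIOD_MS;
}

// deletedAtMs is null for accounts that were never marked for deletion.
function accountStatus(deletedAtMs: number | null, nowMs: number): AccountStatus {
  if (deletedAtMs === null) return "active";
  return nowMs >= hardDeleteAt(deletedAtMs) ? "purge_due" : "pending_deletion";
}

// The cancel link only works while the grace period is still running.
function canCancelDeletion(deletedAtMs: number | null, nowMs: number): boolean {
  return accountStatus(deletedAtMs, nowMs) === "pending_deletion";
}
```

A deletion requested on 2026-03-10 (UTC) becomes due for hard delete on 2026-04-09.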

HIGH 04 · Observability & Error Tracking (Sentry)

PARTIAL

Sentry is mentioned in the stack but sampling rates, PII filtering, event taxonomy, and performance monitoring thresholds are not specified. Under-instrumented apps have silent failures in production.

Sentry Configuration Spec
// renderer/main.tsx: Sentry init
import * as Sentry from '@sentry/electron/renderer';

Sentry.init({
  // Assumes the renderer is bundled with Vite (as the VITE_ prefix suggests):
  // Vite exposes these variables on import.meta.env, not process.env.
  dsn: import.meta.env.VITE_SENTRY_DSN,
  environment: import.meta.env.MODE,
  // release: set it in the main-process Sentry.init; app.getVersion() is a
  // main-process API and is not available in the renderer.

  // Sampling: aggressive in dev, conservative in prod
  tracesSampleRate: import.meta.env.PROD ? 0.1 : 1.0,
  profilesSampleRate: 0.05, // CPU profiling: 5% of transactions

  // PII scrubbing: NEVER send user content to Sentry
  beforeSend(event) {
    // Strip message content (chat messages may contain business data)
    if (event.extra?.messages) delete event.extra.messages;
    if (event.extra?.prompt) delete event.extra.prompt;
    // Strip marketplace credentials from breadcrumbs.
    // event.breadcrumbs is a plain array, so iterate it directly.
    event.breadcrumbs?.forEach((crumb) => {
      if (crumb.data?.token) crumb.data.token = '[Filtered]';
      if (crumb.data?.apiKey) crumb.data.apiKey = '[Filtered]';
    });
    return event;
  },

  // Integrations
  integrations: [
    Sentry.browserTracingIntegration(),
    Sentry.replayIntegration({
      maskAllText: true,       // block all text from session replay
      blockAllMedia: true,
    }),
  ],
});

Custom Event Taxonomy

Event Name | Trigger | Severity | Alert?
tool_execution_failed | Tool returns error after 3 retries | Warning | No
irreversible_action_taken | Price change / inventory update confirmed | Info | No
credit_exhausted | Balance hits 0 | Warning | Yes (Slack)
marketplace_auth_expired | API returns 401/403 | Error | Yes (Slack)
claude_api_error | Anthropic API returns 5xx | Error | Yes (PagerDuty)
ipc_bridge_timeout | IPC call > 5s with no response | Critical | Yes (PagerDuty)
rollback_failed | Rollback tool returns error | Critical | Yes (PagerDuty)

Performance Thresholds (alert if exceeded)

  • App cold start: > 3s → warning
  • IPC round-trip: > 500ms → warning
  • Tool execution: > 10s → log
  • First token latency: > 2s → log
  • Chat render FPS: < 30fps → log
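The five thresholds above fit in one lookup table, which keeps the numbers in a single place. A sketch with hypothetical metric keys; note that FPS is the only metric where a lower value is the bad direction.

```typescript
type Level = "warning" | "log";
type Metric =
  | "coldStartMs"
  | "ipcRoundTripMs"
  | "toolExecutionMs"
  | "firstTokenMs"
  | "chatRenderFps";

interface Threshold {
  limit: number;
  level: Level;
  badWhen: "above" | "below";
}

// Values copied from the threshold list; key names are illustrative.
const THRESHOLDS: Record<Metric, Threshold> = {
  coldStartMs: { limit: 3000, level: "warning", badWhen: "above" },
  ipcRoundTripMs: { limit: 500, level: "warning", badWhen: "above" },
  toolExecutionMs: { limit: 10000, level: "log", badWhen: "above" },
  firstTokenMs: { limit: 2000, level: "log", badWhen: "above" },
  chatRenderFps: { limit: 30, level: "log", badWhen: "below" },
};

// Returns the level to report at, or null when the value is within budget.
function checkMetric(metric: Metric, value: number): Level | null {
  const t = THRESHOLDS[metric];
  const exceeded = t.badWhen === "above" ? value > t.limit : value < t.limit;
  return exceeded ? t.level : null;
}
```

The caller decides what each level means, for example a Sentry message for "warning" and a breadcrumb for "log".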

User Context (always attach)

Sentry.setUser({
  id: userId,     // NOT email
  plan: 'pro',
});
Sentry.setContext('marketplace', {
  active: 'amazon',
  region: 'US',
});
// NEVER set: email, apiKey, sellerId

MEDIUM 05 · In-App Support & Help Center

MISSING

No help center, FAQ panel, or support chat specified. B2B desktop apps need accessible support without leaving the app. Pattern: ? button in status bar → slide-over panel with search + articles + live chat.

Support Entry Points

  • Status bar ? — always visible, 24px tall, right corner. Opens help slide-over. Free plan gets async email; Pro gets live chat widget (Crisp or Intercom).
  • Error recovery banners — Type A errors include "Need help?" link that pre-fills support form with error context.
  • Keyboard: Cmd+Shift+? — opens help slide-over from anywhere in the app.
  • First run onboarding — 3-step coach intro with "How does this work?" expandable FAQ inline.

Help Panel Anatomy

Help & Support
🔍 Search articles…

Popular Articles

How does the Buy Box coach work?
Connecting your Amazon Seller account
Understanding credits and billing

Integration Recommendation: Crisp.chat (not Intercom)

Crisp is $25/mo vs Intercom's $74/mo minimum. Crisp has a WebView embed that works in Electron without SDK conflicts. For v1: embed Crisp chatbox in the help slide-over WebContentsView. For v2: evaluate Intercom when MRR > $10K.

MEDIUM 06 · Demo & Trial Mode

MISSING

No sandbox or mock data strategy exists. New users who haven't connected a marketplace account see a blank app. Every B2B tool that converts well has a demo mode that shows the product's value immediately.

Demo Data Strategy

// demo/fixtures.ts
export const DEMO_SELLER = {
  marketplace: 'amazon',
  region: 'US',
  storeName: 'Acme Electronics',
  plan: 'pro',
};

export const DEMO_ASINS = [
  { asin: 'B08N5WRWNW', title: 'Wireless Earbuds Pro',
    price: 49.99, buyBox: 78, bsr: 1247, stock: 342 },
  { asin: 'B09G9FPHY6', title: 'USB-C Hub 7-in-1',
    price: 34.99, buyBox: 0, bsr: 891, stock: 12 },  // stock warning
  { asin: 'B0BDJ179PH', title: 'Phone Stand Aluminum',
    price: 19.99, buyBox: 34, bsr: 3401, stock: 98 },
];

export const DEMO_COMPETITORS = {
  'B08N5WRWNW': [
    { seller: 'TechDirect', price: 47.99, bbPercent: 22 },
    { seller: 'ElectroHub', price: 51.99, bbPercent: 0 },
  ],
};

// Demo coach responses — scripted for max "aha moment"
export const DEMO_CHAT_SCRIPT = [
  {
    trigger: 'first_message',
    response: 'I can see your Buy Box win rate dropped 23% this week on B08N5WRWNW. Your main competitor TechDirect lowered their price to $47.99 two days ago. Want me to analyze if repricing to $46.49 would recover the Buy Box while maintaining your margin?',
  },
];

Demo Banner — Persistent Indicator

🎭

Demo Mode

Simulated data — no real changes will be made

Demo Mode Rules

  • All tool calls return fixture data, never real API
  • Confirmation dialogs work but action is a no-op
  • Credits don't decrement (infinite demo credits)
  • Audit log shows demo actions with 🎭 prefix
  • "Connect Account" CTA always visible in sidebar
  • Demo mode auto-activates if no marketplace connected
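The rules above suggest a single choke point where every tool call is routed either to fixtures or to the real API. A minimal sketch under that assumption; `makeToolRunner`, the `ToolResult` shape, and the fixture map keyed by tool name are all hypothetical.

```typescript
type ToolArgs = Record<string, unknown>;
type ToolFn = (args: ToolArgs) => Promise<unknown>;

interface ToolResult {
  source: "fixture" | "live";
  data: unknown;
}

// Demo mode auto-activates when no marketplace account is connected.
function isDemoMode(connectedAccounts: number): boolean {
  return connectedAccounts === 0;
}

// Audit log entries for demo actions carry the 🎭 prefix.
function auditLabel(action: string, demo: boolean): string {
  return demo ? `🎭 ${action}` : action;
}

// In demo mode the live tool map is never consulted, so a demo session
// cannot reach a real marketplace API even by accident.
function makeToolRunner(opts: {
  demoMode: boolean;
  fixtures: Record<string, unknown>;
  live: Record<string, ToolFn>;
}) {
  return async function runTool(name: string, args: ToolArgs = {}): Promise<ToolResult> {
    if (opts.demoMode) {
      if (!(name in opts.fixtures)) throw new Error(`No demo fixture for tool "${name}"`);
      return { source: "fixture", data: opts.fixtures[name] };
    }
    const fn = opts.live[name];
    if (!fn) throw new Error(`Unknown tool "${name}"`);
    return { source: "live", data: await fn(args) };
  };
}
```

Credit accounting could key off `result.source`, decrementing only for "live" results, which gives the infinite demo credits rule for free.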

MEDIUM 07 · Multi-Account Management

MISSING

Power sellers operate 2-5 marketplace accounts (Amazon US + MX, MeLi MX + CO). No account switching UI is specified. This is a v1 blocker for agency users and will be requested in the first week of beta.

Account Data Model

interface MarketplaceAccount {
  id: string;          // uuid
  marketplace: 'amazon' | 'meli' | 'shopify';
  region: string;      // 'US' | 'MX' | 'CO' | etc.
  displayName: string; // "Acme US Store"
  avatarInitials: string; // "AC"
  avatarColor: string; // auto-assigned from palette
  lastSynced: number;
  isDefault: boolean;
  credentialKey: string; // safeStorage key reference
}

// Max accounts per plan:
// Free:  2 accounts
// Pro:   10 accounts
// (encourages Pro upsell for agencies)

Account Switcher UI

Switch Account

AC

Acme US

Amazon US · ✓ active

AM

Acme MX

Amazon MX

+

Add account

Account Switch Behavior

  • Context isolation: chat history, ASIN lists, and audit logs are scoped per account — switching loads the other account's data
  • Keyboard shortcut: Cmd+Shift+A opens account switcher dropdown
  • Status bar: shows active account name truncated to 20 chars + marketplace icon
  • Switch is instant: no reload, React state swap — chat input clears, context bar updates, tab bar highlights appropriate marketplace
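
The switch behavior above can be sketched as a pure state swap. Everything named here (`AccountContext`, `WorkspaceState`, `switchAccount`, `statusBarLabel`) is hypothetical, and truncating with a trailing ellipsis inside the 20-character budget is an assumption; the spec only says "truncated to 20 chars".

```typescript
// Hypothetical per-account context; field names are illustrative.
interface AccountContext { chatHistory: string[]; asinList: string[]; auditLog: string[]; }

interface WorkspaceState {
  activeAccountId: string;
  contexts: Record<string, AccountContext>; // keyed by MarketplaceAccount.id
  chatInput: string;
}

function emptyContext(): AccountContext {
  return { chatHistory: [], asinList: [], auditLog: [] };
}

// No reload: returns a new state whose active context is the target
// account's, with the chat input cleared. The input state is not mutated.
function switchAccount(current: WorkspaceState, targetId: string): WorkspaceState {
  if (targetId === current.activeAccountId) return current;
  return {
    activeAccountId: targetId,
    contexts: { ...current.contexts, [targetId]: current.contexts[targetId] ?? emptyContext() },
    chatInput: '',
  };
}

// Status bar label: account name limited to 20 characters.
function statusBarLabel(displayName: string, maxChars = 20): string {
  if (displayName.length <= maxChars) return displayName;
  return displayName.slice(0, maxChars - 1) + '…';
}
```

Because each account keeps its own entry in `contexts`, switching back restores the earlier chat history without refetching.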

MEDIUM 08 · Desktop OS Integration: Missing Specs

PARTIAL

Several Electron desktop OS integration points are specified at a high level but lack implementation detail: single-instance lock, deep link protocol, right-click context menus, tray badge counts, and drag-and-drop.

Single-Instance Lock (prevents duplicate windows)
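// main/index.ts
import { app } from 'electron';

// mainWindow is the BrowserWindow created at startup (declared elsewhere in this file)
const gotTheLock = app.requestSingleInstanceLock();

if (!gotTheLock) {
  app.quit(); // Second instance: quit immediately
} else {
  // First instance: focus the existing window when a second launch is attempted
  app.on('second-instance', (_event, commandLine) => {
    if (mainWindow) {
      if (mainWindow.isMinimized()) mainWindow.restore();
      mainWindow.focus();
      // Windows/Linux pass the deep link (e.g., shopilot://auth/callback?code=...) in argv
      const deepLink = commandLine.find(arg => arg.startsWith('shopilot://'));
      if (deepLink) handleDeepLink(deepLink);
    }
  });
  // macOS delivers deep links through the 'open-url' event rather than argv
  app.on('open-url', (event, url) => {
    event.preventDefault();
    handleDeepLink(url);
  });
}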
Deep Link Protocol — shopilot://
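// main/index.ts: protocol registration (requires `import path from 'node:path'`)
if (process.defaultApp) {
  // Dev mode (launched through the electron binary): pass the script path explicitly
  if (process.argv.length >= 2) {
    app.setAsDefaultProtocolClient('shopilot', process.execPath, [path.resolve(process.argv[1])]);
  }
} else {
  // Packaged app
  app.setAsDefaultProtocolClient('shopilot');
}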

// Supported deep link routes:
// shopilot://auth/callback?code=&state=    → OAuth2 callback (Amazon/MeLi)
// shopilot://asin/{asin}                   → Focus chat on specific ASIN
// shopilot://alert/{alertId}              → Open specific fraud/price alert
// shopilot://billing/upgrade              → Jump to billing settings

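function handleDeepLink(url: string) {
  const parsed = new URL(url);
  // With a custom scheme, the URL parser treats the first segment as the host:
  // shopilot://auth/callback → hostname "auth", pathname "/callback".
  // Rebuild the route from both parts instead of matching on pathname alone.
  const [section, param] = [parsed.hostname, ...parsed.pathname.split('/').filter(Boolean)];
  switch (section) {
    case 'auth':
      if (param === 'callback') {
        mainWindow.webContents.send('auth:callback', {
          code: parsed.searchParams.get('code'),
          state: parsed.searchParams.get('state'),
        });
      }
      break;
    case 'asin':
      if (param) mainWindow.webContents.send('navigate:asin', param);
      break;
    // The alert and billing routes listed above are not handled yet
  }
}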
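
The route list in the comments above can also be parsed by a pure function, which makes the routing testable without launching Electron. This is a sketch, not the spec's implementation: `DeepLinkRoute` and `parseDeepLink` are hypothetical names. It relies on standard URL parsing, where `shopilot://auth/callback` yields hostname `auth` and pathname `/callback`.

```typescript
// Hypothetical typed result for the four supported routes.
type DeepLinkRoute =
  | { kind: 'auth-callback'; code: string | null; state: string | null }
  | { kind: 'asin'; asin: string }
  | { kind: 'alert'; alertId: string }
  | { kind: 'billing-upgrade' };

function parseDeepLink(raw: string): DeepLinkRoute | null {
  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    return null; // not a parseable URL
  }
  if (parsed.protocol !== 'shopilot:') return null;

  // First segment arrives as the hostname, the rest as the pathname.
  const [section, param] = [parsed.hostname, ...parsed.pathname.split('/').filter(Boolean)];

  if (section === 'auth' && param === 'callback') {
    return {
      kind: 'auth-callback',
      code: parsed.searchParams.get('code'),
      state: parsed.searchParams.get('state'),
    };
  }
  if (section === 'asin' && param) return { kind: 'asin', asin: param };
  if (section === 'alert' && param) return { kind: 'alert', alertId: param };
  if (section === 'billing' && param === 'upgrade') return { kind: 'billing-upgrade' };
  return null; // unknown route: ignore rather than guess
}
```

The main process would then map each `kind` to an IPC message, and unknown or malformed links fall through to `null` instead of reaching the renderer.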
Right-Click Context Menus
// main/contextMenu.ts
import { BrowserWindow, Menu, MenuItem, clipboard, ipcMain, shell } from 'electron';

ipcMain.on('show-context-menu', (event, context) => {
  const menu = new Menu();

  if (context.type === 'asin') {
    menu.append(new MenuItem({
      label: `Analyze ${context.asin}`,
      click: () => event.sender.send('coach:analyze', context.asin),
    }));
    menu.append(new MenuItem({
      label: 'View on Amazon',
      click: () => shell.openExternal(`https://amazon.com/dp/${context.asin}`),
    }));
    menu.append(new MenuItem({ type: 'separator' }));
    menu.append(new MenuItem({
      label: 'Copy ASIN',
      click: () => clipboard.writeText(context.asin),
    }));
  }

  if (context.type === 'price') {
    menu.append(new MenuItem({ label: 'Copy price', click: () => clipboard.writeText(context.value) }));
    menu.append(new MenuItem({ label: 'Ask coach about this price', click: () => event.sender.send('coach:ask', `Why is this price ${context.value}?`) }));
  }

  menu.popup({ window: BrowserWindow.fromWebContents(event.sender)! });
});

Tray Menu + Badge Counts

// Update tray badge when alerts arrive
function updateTrayBadge(count: number) {
  if (process.platform === 'darwin') {
    app.dock.setBadge(count > 0 ? String(count) : '');
  }
  tray.setToolTip(`Shopilot — ${count > 0 ? `${count} alerts` : 'All clear'}`);
}

// Tray context menu
const trayMenu = Menu.buildFromTemplate([
  { label: 'Open Shopilot', click: () => mainWindow.show() },
  { label: 'Pause Coach', type: 'checkbox', checked: false,
    click: (item) => mainWindow.webContents.send('coach:pause', item.checked) },
  { type: 'separator' },
  { label: 'Check for Updates', click: () => autoUpdater.checkForUpdatesAndNotify() },
  { label: 'Quit', click: () => app.quit() },
]);

PARTIAL 09 · E2E Testing Framework

INCOMPLETE SPEC

Unit tests and component tests are implied, but no E2E testing framework is explicitly specified. For an Electron app that makes real API calls and marketplace mutations, E2E tests are non-negotiable before beta.

Testing Pyramid for Shopilot

E2E Tests (Playwright, via its _electron API) · 10% of tests · Happy paths + critical mutations
Integration Tests (Vitest + MSW) · 30% of tests · API mocking, IPC handlers
Unit Tests (Vitest) · 60% of tests · Tools, reducers, utils, formatters
Critical E2E Test Cases (must pass before beta)
Test Case | Why critical | Mode
App launches, shows demo mode, chat accepts input | Smoke test, must always pass | Demo
Connect Amazon account via OAuth → tokens stored in safeStorage | Auth is the first real action | Sandbox
Send message → tool executes → confirmation appears → user approves → audit log written | Core happy path | Mock API
Approve irreversible action → confirm with typed text → action recorded → rollback available | Trust-critical flow | Mock API
Credits hit 0 → coach blocks → credit exhausted modal shows → upgrade flow opens | Revenue-critical guard | Mock API
App restart → chat history loads from IndexedDB → last session visible | Persistence correctness | Mock API
Update available → modal shows → user clicks Restart → app re-opens at same state | Update UX must not lose work | Mocked updater
Playwright + Electron Setup
// e2e/setup.ts
import { _electron as electron } from 'playwright';
import type { ElectronApplication } from 'playwright';
import { test, expect } from '@playwright/test';

let electronApp: ElectronApplication;

test.beforeAll(async () => {
  electronApp = await electron.launch({
    args: ['dist/main/index.js'],
    env: {
      ...process.env,
      NODE_ENV: 'test',
      SHOPILOT_DEMO_MODE: 'true', // use fixture data
    },
  });
});

test.afterAll(async () => {
  await electronApp.close();
});

// Example test: coach chat flow
test('coach responds to ASIN query', async () => {
  const window = await electronApp.firstWindow();
  await window.fill('[data-testid="chat-input"]', 'What is happening with B08N5WRWNW?');
  await window.press('[data-testid="chat-input"]', 'Enter');
  await expect(window.locator('[data-testid="coach-response"]')).toBeVisible({ timeout: 10000 });
  await expect(window.locator('[data-testid="tool-accordion"]')).toBeVisible();
});

10 · Production Readiness Checklist

GATE CRITERIA

These gates must pass before each release milestone. No gate may be overridden without written sign-off from both the CEO and the CTO.
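
The gate rule can be expressed as a small check that a release script might run. This is only a sketch of the policy as stated; `GateRequirement`, `GateSignOff`, and `gateStatus` are hypothetical names, and how sign-off is recorded is not specified in the blueprint.

```typescript
// Hypothetical record of one checklist row.
interface GateRequirement { description: string; owner: string; passed: boolean; }

// Hypothetical record of written sign-off; both flags are required for an override.
interface GateSignOff { ceo: boolean; cto: boolean; }

function gateStatus(
  requirements: GateRequirement[],
  signOff?: GateSignOff,
): 'pass' | 'overridden' | 'blocked' {
  if (requirements.every(r => r.passed)) return 'pass';
  // A failing gate can only proceed with sign-off from CEO and CTO together.
  if (signOff?.ceo && signOff?.cto) return 'overridden';
  return 'blocked';
}
```

Returning `overridden` as a distinct status keeps manual overrides visible in release records instead of looking like a normal pass.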

GATE 1 — Private Beta (before any external user)

Requirement | Owner
All 7 E2E test cases pass on macOS 14 + macOS 15 | Sergio
Code signing + notarization working (Apple Developer cert) | Mateo
Sentry DSN configured, PII filter verified, test event sent | Andrés
Chat persistence: sessions survive app restart | Sergio
Single-instance lock prevents duplicate window | Mateo
Demo mode works without any marketplace credentials | Sergio
Update notification modal tested with mock version bump | Mateo
Privacy policy published at shopilot.ai/privacy | Pablo

GATE 2 — Public Beta (before paid users)

Requirement | Owner
GDPR data export (ZIP) working for all users | Andrés
Account deletion flow tested end-to-end | Andrés
In-app support (Crisp) embedded and tested | Sergio
Multi-account: 2+ accounts with correct context isolation | Sergio
Deep link protocol (shopilot://) working for OAuth callback | Mateo
Tray menu + badge count for unread alerts | Sergio
Right-click context menus on ASIN rows and prices | Sergio
Terms of Service published + accepted on first launch | Pablo
Stripe webhooks tested for subscription lifecycle | Andrés

GATE 3 — v1.0 GA

Requirement | Owner
Figma Atomic Design library complete (atoms + molecules + organisms) | External Design Team
Figma MCP integration working (Claude reads components directly) | Mateo
WCAG AA audit passing (axe-playwright on all screens) | Sergio
Performance: cold start < 3s on 2019 MBP (8GB RAM) | Mateo
Windows 11 build passing (secondary target) | Mateo
SOC 2 Type I audit initiated (required for enterprise) | Pablo

The 80/20 Rule for Production Readiness

As a rule of thumb, 80% of production incidents come from 20% of neglected areas: auth edge cases, update failures, data loss on crash, and silent API errors. This section addresses all four. Pass Gate 1 within the first 3 weeks of development, Gate 2 before any paid user, and Gate 3 before any press mention.

spec: v7 · Mar 10, 2026, 04:33 PM CST