Build Dispatcher Design
This document describes the dispatcher-driven build execution model used by Image Factory, including queue representation, claiming behavior, and reliability goals.
1) Goals
- Scalable: Support high concurrency, multi-tenant fairness, and node/pool capacity.
- Reliable: Idempotent dispatch, retry semantics, and resilient to worker crashes.
- Observable: Clear audit trail across
buildsandbuild_executionswith metrics and events. - Extensible: Works with local executors today and Tekton pipelines tomorrow.
2) Non-Goals
- Dedicated queue tables.
- Complex job graph orchestration (DAG builds) is out of scope for the current implementation.
3) Core Data Model (No Queue Tables)
Existing tables used
builds(authoritative status)build_configs(one build = one method config)build_executions(execution instance + logs + artifacts)
Queue representation
builds.status = 'queued'is the queue.
Key statuses
- Build:
pending -> queued -> running -> completed/failed/cancelled - Execution:
queued -> running -> succeeded/failed/cancelled
4) High-Level Flow
User submits build
-> Build + BuildConfig stored
-> Build status = queued
Dispatcher polls queued builds
-> Select next eligible build (tenant fairness, limits)
-> Create build_executions row
-> Execute (local or Tekton)
-> Stream logs/status
-> Update build status
5) Dispatcher Responsibilities
- Polling: Find queued builds and claim one atomically.
- Eligibility: Enforce concurrency limits and tool availability.
- Selection: Tenant fairness + priority + resource fit.
- Execution: Create execution record and trigger executor.
- Recovery: Requeue orphaned builds if a dispatcher crashes.
6) Robust Dispatch Mechanics (SQL-first)
6.1 Atomic claim (single build)
Use SELECT ... FOR UPDATE SKIP LOCKED to avoid double dispatch.
WITH next_build AS (
SELECT id
FROM builds
WHERE status = 'queued'
ORDER BY created_at ASC
FOR UPDATE SKIP LOCKED
LIMIT 1
)
UPDATE builds
SET status = 'running', started_at = NOW()
WHERE id IN (SELECT id FROM next_build)
RETURNING id;
6.2 Idempotent execution record
Create exactly one running execution per build.
INSERT INTO build_executions (id, build_id, config_id, status, created_at, updated_at)
VALUES ($1, $2, $3, 'running', NOW(), NOW())
ON CONFLICT (build_id, status)
DO NOTHING;
6.3 Orphan recovery
Detect running builds with no active execution heartbeat and requeue.
UPDATE builds
SET status = 'queued'
WHERE status = 'running'
AND id IN (
SELECT build_id
FROM build_executions
WHERE status = 'running'
AND updated_at < NOW() - INTERVAL '10 minutes'
);
7) Scheduler Policy (Robust)
7.1 Tenant fairness
- Round-robin by tenant.
- Cap max running per tenant.
7.2 Priority (optional, metadata-driven)
- Add optional
priorityintobuilds.metadataorbuild_configs.metadata.
7.3 Resource fit
- Match requested CPU/memory to available capacity.
- Use infra selector to choose local vs Tekton.
8) Interfaces (Go)
8.1 Dispatcher
// Dispatcher watches queued builds and dispatches them.
type Dispatcher interface {
Run(ctx context.Context) error
Stop() error
}
// DispatcherConfig controls polling and limits.
type DispatcherConfig struct {
PollInterval time.Duration
MaxConcurrent int
MaxPerTenant int
}
8.2 Scheduler
// Scheduler selects which build to run next.
type Scheduler interface {
Next(ctx context.Context) (*build.Build, error)
}
8.3 Execution Orchestrator
// ExecutionOrchestrator creates executions and triggers executors.
type ExecutionOrchestrator interface {
StartExecution(ctx context.Context, b *build.Build) (*build.BuildExecution, error)
CancelExecution(ctx context.Context, buildID uuid.UUID) error
}
8.4 Build Executor
// BuildExecutor runs build logic (local or Tekton).
type BuildExecutor interface {
Execute(ctx context.Context, build *build.Build) (*build.BuildResult, error)
Cancel(ctx context.Context, buildID uuid.UUID) error
}
8.5 Execution Repository
// BuildExecutionRepository persists execution state.
type BuildExecutionRepository interface {
SaveExecution(ctx context.Context, execution *build.BuildExecution) error
UpdateExecution(ctx context.Context, execution *build.BuildExecution) error
GetRunningExecutionForBuild(ctx context.Context, buildID uuid.UUID) (*build.BuildExecution, error)
}
9) Tekton Integration Plan
9.1 Executor implementation
TektonExecutorimplementsBuildExecutor- Creates
PipelineRun - Streams logs from Tekton
- Updates
build_executionsstatus
9.2 Mapping method -> pipeline
- Method maps to pipeline template name
- Template rendering uses
build_configs
10) Observability
- Metrics: queue depth, dispatch latency, running builds, per-tenant concurrency
- Logs: dispatcher claims, execution lifecycle, errors
- Events:
build.execution.started,build.execution.completed,build.execution.failed
11) Security & Multi-Tenancy
- Dispatcher never cross-tenant data except scheduling.
- Execution uses tenant-scoped namespaces or isolated infra.
- Build config secrets must be resolved at execution time.
12) Migration Strategy
- Dispatcher reads
builds.status = queued, uses available executors. - Execution records + log streaming are wired for visibility.
- Tekton executor support is enabled when Kubernetes infrastructure is selected.
13) Risks + Mitigations
- Double dispatch: use
FOR UPDATE SKIP LOCKED+ execution uniqueness. - Stuck builds: orphan detection + requeue.
- Queue starvation: enforce per-tenant round-robin.
14) Open Questions
- Where to store priority (new column vs metadata)?
- Should we allow manual retry from UI that spawns a new execution?
- When should build status flip to
queued(immediately on create vs explicit start)?