Workflow Engine Architecture
This document describes a proposed workflow engine model for approvals, gatekeeping, and policy-driven transitions such as onboarding, role changes, promotions, and vulnerability review.
It is design-oriented reference material, not a statement that every workflow described here is already implemented in the published repository.
Why This Fits Image Factory
Workflow orchestration is needed for:
- Onboarding approvals
- Role additions
- Image promotion with tag gates
- CVE validation and security policy gates
This design builds on existing patterns:
- Dispatcher + status-based queue
- Event bus for async coordination
- PostgreSQL as source of truth
Design Goals
- Scalable: stateless workers, SKIP LOCKED claiming
- Auditable: immutable event log for each workflow
- Idempotent: safe retries and step replays
- Pluggable: step handlers for approvals, validation, build, promotion
- Integrates cleanly with current build/dispatcher architecture
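The auditability and idempotency goals above can be illustrated with a small in-memory sketch. It is not the engine's implementation: `Instance`, `Event`, and `CompleteStep` are hypothetical names, and a map plus a slice stand in for the proposed tables. The point is only that a replayed step completion changes neither the state nor the append-only log.

```go
package main

import "fmt"

// Event is one immutable audit-log entry (hypothetical shape).
type Event struct {
	StepKey string
	Type    string
}

// Instance is an in-memory stand-in for one workflow's step state and log.
type Instance struct {
	status map[string]string
	log    []Event
}

func NewInstance() *Instance {
	return &Instance{status: map[string]string{}}
}

// CompleteStep applies a success exactly once and reports whether it changed
// anything. A replayed delivery leaves both state and log untouched.
func (in *Instance) CompleteStep(key string) bool {
	if in.status[key] == "succeeded" {
		return false
	}
	in.status[key] = "succeeded"
	in.log = append(in.log, Event{StepKey: key, Type: "step_succeeded"})
	return true
}

// Events returns a copy so callers cannot rewrite history.
func (in *Instance) Events() []Event {
	return append([]Event(nil), in.log...)
}

func main() {
	in := NewInstance()
	fmt.Println(in.CompleteStep("queue_build")) // first delivery: true
	fmt.Println(in.CompleteStep("queue_build")) // retried delivery: false
	fmt.Println(len(in.Events()))               // 1
}
```

In a database-backed version the same guard would typically be a conditional update on the step row, with the event insert in the same transaction.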
Core Data Model
Tables (Proposed)
CREATE TABLE workflow_definitions (
    id UUID PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    version INT NOT NULL,
    definition JSONB NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (name, version)
);
CREATE TABLE workflow_instances (
    id UUID PRIMARY KEY,
    definition_id UUID NOT NULL REFERENCES workflow_definitions(id),
    tenant_id UUID,
    subject_type VARCHAR(50) NOT NULL,
    subject_id UUID NOT NULL,
    status VARCHAR(20) NOT NULL, -- running, blocked, failed, completed
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE TABLE workflow_steps (
    id UUID PRIMARY KEY,
    instance_id UUID NOT NULL REFERENCES workflow_instances(id),
    step_key VARCHAR(100) NOT NULL,
    status VARCHAR(20) NOT NULL, -- pending, running, succeeded, failed, blocked
    attempts INT NOT NULL DEFAULT 0,
    last_error TEXT,
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ
);
CREATE TABLE workflow_events (
    id UUID PRIMARY KEY,
    instance_id UUID NOT NULL REFERENCES workflow_instances(id),
    step_id UUID,
    type VARCHAR(50) NOT NULL,
    payload JSONB,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_workflow_instances_status ON workflow_instances(status);
CREATE INDEX idx_workflow_steps_status ON workflow_steps(status);
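A Go-side view of the `workflow_steps` row might look like the sketch below. The field names follow the proposed columns, but representing UUIDs as strings is a simplification, and the `allowed` transition table is an assumption: the design lists the step statuses without specifying which transitions are legal.

```go
package main

import (
	"fmt"
	"time"
)

// StepStatus enumerates the values listed in the workflow_steps.status comment.
type StepStatus string

const (
	StepPending   StepStatus = "pending"
	StepRunning   StepStatus = "running"
	StepSucceeded StepStatus = "succeeded"
	StepFailed    StepStatus = "failed"
	StepBlocked   StepStatus = "blocked"
)

// WorkflowStep mirrors the proposed workflow_steps table.
// UUIDs are plain strings here (assumption, to stay stdlib-only).
type WorkflowStep struct {
	ID          string
	InstanceID  string
	StepKey     string
	Status      StepStatus
	Attempts    int
	LastError   *string    // nil when last_error is NULL
	StartedAt   *time.Time // nil until the step is claimed
	CompletedAt *time.Time // nil until the step finishes
}

// allowed is an assumed transition table; the design does not define one.
var allowed = map[StepStatus][]StepStatus{
	StepPending: {StepRunning},
	StepRunning: {StepSucceeded, StepFailed, StepBlocked},
	StepBlocked: {StepPending}, // unblocked by an external event
	StepFailed:  {StepPending}, // retry
}

// CanTransition reports whether moving from one status to another is allowed.
func CanTransition(from, to StepStatus) bool {
	for _, s := range allowed[from] {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	step := WorkflowStep{ID: "s1", InstanceID: "i1", StepKey: "queue_build", Status: StepPending}
	fmt.Println(CanTransition(step.Status, StepRunning))   // true
	fmt.Println(CanTransition(StepSucceeded, StepRunning)) // false
}
```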
Execution Model
- Workflow Orchestrator polls for runnable steps using FOR UPDATE SKIP LOCKED.
- Each step is executed by a handler (approval, validation, build, promotion).
- State transitions are written atomically.
- Workflow emits events on each transition.
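The loop described above (claim, dispatch to a handler, persist the new status, emit events) can be sketched in memory as follows. `Orchestrator`, `Handler`, and `RunOnce` are illustrative names, not the proposed API; the slice scan in `claim` stands in for the SQL claim, and treating a missing handler as a failure is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Step is a reduced, in-memory version of a workflow_steps row.
type Step struct {
	Key      string
	Status   string
	Attempts int
}

// Handler returns the step's next status (succeeded, failed, blocked).
type Handler func(s *Step) (string, error)

// Orchestrator is a single-process stand-in for the polling worker.
type Orchestrator struct {
	steps    []*Step
	handlers map[string]Handler
	events   []string // stand-in for workflow_events
}

// claim marks the first pending step as running.
func (o *Orchestrator) claim() *Step {
	for _, s := range o.steps {
		if s.Status == "pending" {
			s.Status = "running"
			s.Attempts++
			return s
		}
	}
	return nil
}

// RunOnce processes at most one step and reports whether it found work.
func (o *Orchestrator) RunOnce() bool {
	s := o.claim()
	if s == nil {
		return false
	}
	o.events = append(o.events, s.Key+":running")
	h, ok := o.handlers[s.Key]
	if !ok {
		s.Status = "failed" // assumption: unknown step keys fail fast
	} else if status, err := h(s); err != nil {
		s.Status = "failed"
	} else {
		s.Status = status
	}
	o.events = append(o.events, s.Key+":"+s.Status)
	return true
}

func main() {
	o := &Orchestrator{
		steps: []*Step{
			{Key: "queue_build", Status: "pending"},
			{Key: "cve_check", Status: "pending"},
		},
		handlers: map[string]Handler{
			"queue_build": func(*Step) (string, error) { return "succeeded", nil },
			"cve_check":   func(*Step) (string, error) { return "", errors.New("scanner unavailable") },
		},
	}
	for o.RunOnce() {
	}
	fmt.Println(o.events)
}
```

A real worker would wrap the status update and the event insert in one transaction so that the log never disagrees with the step row.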
Step Claim (Example)
WITH next_step AS (
    SELECT ws.id
    FROM workflow_steps ws
    JOIN workflow_instances wi ON wi.id = ws.instance_id
    WHERE ws.status = 'pending'
      AND wi.status = 'running'
    ORDER BY wi.created_at ASC
    LIMIT 1
    -- Lock only the step row; a bare FOR UPDATE would also lock the joined instance row.
    FOR UPDATE OF ws SKIP LOCKED
)
UPDATE workflow_steps
SET status = 'running', started_at = now()
WHERE id IN (SELECT id FROM next_step)
RETURNING *;
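The property this claim query is meant to provide is that concurrent workers never receive the same step and never wait on each other. The sketch below is an in-memory analogue only, not database code: each `Step` carries its own mutex as a stand-in for a row lock, and `TryLock` (Go 1.18+) plays the role of SKIP LOCKED. `Claim` and `Drain` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"sync"
)

// Step is an in-memory row; mu stands in for the row-level lock.
type Step struct {
	mu     sync.Mutex
	ID     string
	Status string
}

// Claim scans steps in order. Rows locked by another worker are skipped
// rather than waited on, which mirrors SKIP LOCKED.
func Claim(steps []*Step) *Step {
	for _, s := range steps {
		if !s.mu.TryLock() {
			continue // held by another worker: skip
		}
		if s.Status == "pending" {
			s.Status = "running"
			s.mu.Unlock() // "commit" releases the lock
			return s
		}
		s.mu.Unlock()
	}
	return nil
}

// Drain runs several workers until no claimable step is left and returns
// how many times each step ID was claimed.
func Drain(steps []*Step, workers int) map[string]int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	counts := map[string]int{}
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				s := Claim(steps)
				if s == nil {
					return
				}
				mu.Lock()
				counts[s.ID]++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return counts
}

func main() {
	steps := []*Step{
		{ID: "s1", Status: "pending"},
		{ID: "s2", Status: "pending"},
		{ID: "s3", Status: "pending"},
	}
	counts := Drain(steps, 4)
	fmt.Println(len(counts), counts["s1"], counts["s2"], counts["s3"]) // 3 1 1 1
}
```

Because the status only changes while the lock is held, a worker that skips a locked pending step can rely on the lock holder to claim it.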
Interfaces (Go)
type WorkflowOrchestrator interface {
	Run(ctx context.Context) error
}
type WorkflowRepository interface {
	ClaimNextStep(ctx context.Context) (*WorkflowStep, error)
	UpdateStep(ctx context.Context, step *WorkflowStep) error
	UpdateInstance(ctx context.Context, inst *WorkflowInstance) error
	AppendEvent(ctx context.Context, evt *WorkflowEvent) error
}
type StepHandler interface {
	Key() string
	Execute(ctx context.Context, step *WorkflowStep) (StepResult, error)
}
type StepResult struct {
	Status string // succeeded, failed, blocked
	Data   map[string]any
}
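One way the orchestrator could route a claimed step to its handler is a registry keyed by `Key()`. The sketch below repeats `StepHandler` and `StepResult` from above so it compiles on its own; `Registry`, `Register`, `Dispatch`, and `noopHandler` are hypothetical additions, and `WorkflowStep` is reduced to the two fields the example needs.

```go
package main

import (
	"context"
	"fmt"
)

// WorkflowStep is trimmed to the fields this example uses.
type WorkflowStep struct {
	ID      string
	StepKey string
}

type StepResult struct {
	Status string // succeeded, failed, blocked
	Data   map[string]any
}

type StepHandler interface {
	Key() string
	Execute(ctx context.Context, step *WorkflowStep) (StepResult, error)
}

// Registry maps step keys to handlers (hypothetical helper).
type Registry struct {
	handlers map[string]StepHandler
}

func NewRegistry() *Registry {
	return &Registry{handlers: map[string]StepHandler{}}
}

// Register rejects duplicate keys so two handlers cannot compete for a step.
func (r *Registry) Register(h StepHandler) error {
	if _, dup := r.handlers[h.Key()]; dup {
		return fmt.Errorf("handler %q already registered", h.Key())
	}
	r.handlers[h.Key()] = h
	return nil
}

// Dispatch runs the handler whose key matches the step's step_key.
func (r *Registry) Dispatch(ctx context.Context, step *WorkflowStep) (StepResult, error) {
	h, ok := r.handlers[step.StepKey]
	if !ok {
		return StepResult{Status: "failed"}, fmt.Errorf("no handler for step %q", step.StepKey)
	}
	return h.Execute(ctx, step)
}

// noopHandler is a trivial handler used only for the demo.
type noopHandler struct{ key string }

func (n noopHandler) Key() string { return n.key }

func (n noopHandler) Execute(context.Context, *WorkflowStep) (StepResult, error) {
	return StepResult{Status: "succeeded"}, nil
}

func main() {
	reg := NewRegistry()
	if err := reg.Register(noopHandler{key: "queue_build"}); err != nil {
		fmt.Println("register:", err)
		return
	}
	res, err := reg.Dispatch(context.Background(), &WorkflowStep{ID: "s1", StepKey: "queue_build"})
	fmt.Println(res.Status, err)
}
```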
Integration with Current System
Build Flow
- Step: queue_build
  - Calls existing build service to create build with status = queued
  - Dispatcher continues execution as today
- Step: await_build_completion
  - Waits on build completion events (from event bus)
Approvals
- Step: approval_gate
  - Creates approval_requests
  - Blocks until approval event arrives
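An approval gate has to be safe to re-execute: the first run creates the request, and later runs must not create duplicates. The sketch below uses an in-memory map in place of the approval_requests table. `ApprovalStore`, `Gate`, and `Decide` are hypothetical names, and mapping a rejection to a failed step is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// ApprovalStore stands in for the approval_requests table.
type ApprovalStore struct {
	requests map[string]string // step ID -> pending | approved | rejected
}

func NewApprovalStore() *ApprovalStore {
	return &ApprovalStore{requests: map[string]string{}}
}

// Gate is the step handler body. It creates the request at most once and
// reports the step status implied by the current decision.
func (s *ApprovalStore) Gate(stepID string) string {
	switch s.requests[stepID] {
	case "":
		s.requests[stepID] = "pending" // first execution: create the request
		return "blocked"
	case "approved":
		return "succeeded"
	case "rejected":
		return "failed" // assumption: rejection fails the step
	default:
		return "blocked" // still waiting for a decision
	}
}

// Decide records the approval event for a pending request.
func (s *ApprovalStore) Decide(stepID string, approved bool) error {
	if s.requests[stepID] != "pending" {
		return errors.New("no pending approval request for step " + stepID)
	}
	if approved {
		s.requests[stepID] = "approved"
	} else {
		s.requests[stepID] = "rejected"
	}
	return nil
}

func main() {
	store := NewApprovalStore()
	fmt.Println(store.Gate("step-1"))         // blocked (request created)
	fmt.Println(store.Gate("step-1"))         // blocked (no duplicate request)
	fmt.Println(store.Decide("step-1", true)) // <nil>
	fmt.Println(store.Gate("step-1"))         // succeeded
}
```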
CVE Validation
- Step: cve_check
  - Calls vulnerability scan service
  - Fails or blocks based on policy thresholds
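How "fails or blocks based on policy thresholds" turns into a step status could look like the sketch below. The `Policy` fields and the rule itself (too many criticals fails, too many highs blocks for review) are assumptions made for illustration; the actual thresholds and the scan service's result format are not defined in this document.

```go
package main

import "fmt"

// ScanResult is a hypothetical summary of a vulnerability scan.
type ScanResult struct {
	Critical int
	High     int
}

// Policy holds assumed thresholds.
type Policy struct {
	MaxCritical int // more criticals than this fails the step
	MaxHigh     int // more highs than this blocks for manual review
}

// Evaluate maps a scan result to a step status under the given policy.
func Evaluate(p Policy, r ScanResult) string {
	if r.Critical > p.MaxCritical {
		return "failed"
	}
	if r.High > p.MaxHigh {
		return "blocked"
	}
	return "succeeded"
}

func main() {
	policy := Policy{MaxCritical: 0, MaxHigh: 5}
	fmt.Println(Evaluate(policy, ScanResult{Critical: 1, High: 0})) // failed
	fmt.Println(Evaluate(policy, ScanResult{Critical: 0, High: 9})) // blocked
	fmt.Println(Evaluate(policy, ScanResult{Critical: 0, High: 2})) // succeeded
}
```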
Promotion
- Step: promote_image
  - Updates tag in registry and images metadata
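Because promotion writes to two places, it helps if the step can be retried without side effects. The sketch below models both writes as plain "set" operations on in-memory maps, so repeating the step leaves the same final state. `Registry`, `Catalog`, and `Promote` are hypothetical stand-ins, not the real registry client or the images table access code.

```go
package main

import (
	"errors"
	"fmt"
)

// Registry is an in-memory stand-in for the image registry.
type Registry struct {
	Tags map[string]string // "repo:tag" -> digest
	Down bool              // simulates an outage
}

// Catalog is an in-memory stand-in for the images metadata.
type Catalog struct {
	Promoted map[string]string // image ID -> promoted tag
}

// Promote points the tag at the digest, then records it in the metadata.
// Both writes are overwrites, so a retry after a partial failure is safe.
func Promote(reg *Registry, cat *Catalog, imageID, repo, tag, digest string) (string, error) {
	if reg.Down {
		return "failed", errors.New("registry unavailable")
	}
	reg.Tags[repo+":"+tag] = digest
	cat.Promoted[imageID] = tag
	return "succeeded", nil
}

func main() {
	reg := &Registry{Tags: map[string]string{}}
	cat := &Catalog{Promoted: map[string]string{}}
	status, err := Promote(reg, cat, "img-1", "app", "prod", "sha256:abc")
	fmt.Println(status, err)
	status, err = Promote(reg, cat, "img-1", "app", "prod", "sha256:abc") // retry
	fmt.Println(status, err, len(reg.Tags), len(cat.Promoted))
}
```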
Suggested First Milestone
- Add schema tables.
- Implement orchestrator with:
  - queue_build
  - await_build_completion
- Add approval step wired to existing approval tables.
- Emit events on transitions for observability.
Why This Scales
- Stateless workers and DB claiming
- Idempotent steps with retries
- Event-driven blocking/unblocking
- Clear separation between orchestration and execution