Tekton Installer Rollback Guide
Purpose
This guide defines the manual rollback path for failed Tekton upgrade operations triggered from Image Factory.
Scope
- Provider-scoped rollback (one infrastructure provider at a time).
- Covers both install modes: gitops and image_factory_installer.
Preconditions
- Identify provider ID and intended rollback version.
- Confirm cluster access for the provider.
- Confirm no installer job is currently pending or running for that provider.
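The last precondition can be checked mechanically once the provider's installer jobs are in hand. The following is a minimal sketch, assuming each job is available as a dictionary with a `state` field; that shape and the `active_installer_jobs` helper are hypothetical, since this guide does not define the status payload.

```python
# States that block a rollback, per the precondition above.
ACTIVE_STATES = frozenset({"pending", "running"})


def active_installer_jobs(jobs):
    """Return the jobs whose state is pending or running.

    `jobs` is assumed to be a list of dicts with a "state" key
    (hypothetical shape; adapt to the real status payload).
    """
    return [job for job in jobs if str(job.get("state", "")).lower() in ACTIVE_STATES]


jobs = [
    {"id": "job-1", "state": "failed"},
    {"id": "job-2", "state": "Running"},
]
blocking = active_installer_jobs(jobs)
if blocking:
    print("Do not start rollback; active jobs:", [job["id"] for job in blocking])
```

An empty result means the rollback can proceed for that provider.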
1) Assess Failure
- Fetch installer status from Image Factory:
GET /api/v1/admin/infrastructure/providers/{id}/tekton/status
- Record:
- Failed job ID
- Error message
- Last successful asset version (from prior successful job, Git history, or release notes)
- Validate cluster reachability and readiness:
GET /api/v1/admin/infrastructure/providers/{id}/readiness
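The two GET calls in this step can be prepared with the Python standard library. This is a sketch only: the base URL, provider ID, and bearer-token authentication are assumptions, and the helper names `provider_url` and `build_get` are hypothetical. Only the endpoint paths come from this guide.

```python
from urllib.parse import quote
from urllib.request import Request

# Path prefix taken from the endpoints listed in this guide.
PROVIDER_PREFIX = "/api/v1/admin/infrastructure/providers/"


def provider_url(base_url: str, provider_id: str, suffix: str) -> str:
    """Join the base URL, the URL-encoded provider ID, and an endpoint suffix."""
    return base_url.rstrip("/") + PROVIDER_PREFIX + quote(provider_id, safe="") + suffix


def build_get(base_url: str, provider_id: str, suffix: str, token: str) -> Request:
    """Prepare (but do not send) an authenticated GET request."""
    headers = {
        "Accept": "application/json",
        # Assumption: bearer-token auth; substitute your deployment's scheme.
        "Authorization": f"Bearer {token}",
    }
    return Request(provider_url(base_url, provider_id, suffix), headers=headers, method="GET")


# Placeholder host, provider ID, and token for illustration.
BASE = "https://image-factory.example.com"
status_req = build_get(BASE, "provider-123", "/tekton/status", "TOKEN")
readiness_req = build_get(BASE, "provider-123", "/readiness", "TOKEN")
print(status_req.full_url)
print(readiness_req.full_url)
```

Either request can then be sent with `urllib.request.urlopen(status_req, timeout=30)`. Record the failed job ID and error message from whatever fields the response actually exposes.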
2) Rollback for gitops Mode
- Revert the GitOps source to the last known-good Tekton asset version.
- Trigger your GitOps sync (ArgoCD/Flux) and wait for reconciliation.
- Verify resources in target namespace:
- Pipelines: image-factory-build-v1-*
- Required tasks: git-clone, docker-build, buildx, kaniko, packer
- Re-run the readiness check endpoint (GET /api/v1/admin/infrastructure/providers/{id}/readiness).
- Run a small validation build in Image Factory.
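The resource verification in this section amounts to a set comparison. The sketch below assumes the pipeline and task names in the target namespace have already been collected (for example from your cluster tooling) and reports what is missing. The `missing_resources` helper is hypothetical; the pipeline pattern and task list are the ones named above.

```python
from fnmatch import fnmatchcase

# Expected resources as listed in this guide.
PIPELINE_PATTERN = "image-factory-build-v1-*"
REQUIRED_TASKS = ("git-clone", "docker-build", "buildx", "kaniko", "packer")


def missing_resources(pipeline_names, task_names):
    """Return a list describing every expected resource that was not found."""
    missing = []
    # At least one pipeline must match the versioned name pattern.
    if not any(fnmatchcase(name, PIPELINE_PATTERN) for name in pipeline_names):
        missing.append(f"pipeline matching {PIPELINE_PATTERN}")
    # Every required task must be present by exact name.
    present = set(task_names)
    missing.extend(task for task in REQUIRED_TASKS if task not in present)
    return missing


found_pipelines = ["image-factory-build-v1-docker"]
found_tasks = ["git-clone", "docker-build", "kaniko"]
print(missing_resources(found_pipelines, found_tasks))
```

An empty list means the namespace contains the expected pipelines and tasks; anything else should be resolved before running the validation build.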
3) Rollback for image_factory_installer Mode
- Check out or prepare the previous known-good backend/tekton assets.
- Apply the known-good manifests to the provider cluster namespace:
- Use the same namespace configured by tekton_target_namespace (or tenant namespace fallback).
- Re-run installer validation:
POST /api/v1/admin/infrastructure/providers/{id}/tekton/validate
- Re-run the readiness check endpoint (GET /api/v1/admin/infrastructure/providers/{id}/readiness).
- Execute a smoke build to confirm successful scheduling and execution.
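Applying manifests to the wrong namespace is an easy mistake during a manual rollback, so the namespace rule in this section is worth making explicit. The sketch below assumes the provider configuration is available as a dictionary; that shape and the `resolve_target_namespace` helper are hypothetical. Only the tekton_target_namespace key and the tenant fallback come from this guide.

```python
def resolve_target_namespace(provider_config, tenant_namespace):
    """Pick the namespace to apply known-good manifests to.

    Prefers tekton_target_namespace from the provider configuration and
    falls back to the tenant namespace when it is unset or blank.
    """
    configured = (provider_config.get("tekton_target_namespace") or "").strip()
    if configured:
        return configured
    fallback = (tenant_namespace or "").strip()
    if not fallback:
        # Refuse to guess: applying to an unintended namespace is worse than stopping.
        raise ValueError("no tekton_target_namespace and no tenant namespace available")
    return fallback


print(resolve_target_namespace({"tekton_target_namespace": "tekton-builds"}, "tenant-a"))
print(resolve_target_namespace({}, "tenant-a"))
```

Raising an error when neither value is set is a deliberate choice here, so that the operator stops and confirms the namespace instead of applying manifests to a default.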
4) Post-Rollback Validation
- Confirm no active installer job remains.
- Confirm readiness reports no missing Tekton pipelines/tasks for the expected profile.
- Confirm at least one build completes successfully.
- Capture incident notes:
- Failed version
- Restored version
- Root cause and follow-up action
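The first three checks in this section can be combined into a single pass/fail gate before the incident is closed. This is a minimal sketch under stated assumptions: the inputs are gathered by the operator from the status endpoint, the readiness endpoint, and the build history, and the `post_rollback_failures` helper and its parameters are hypothetical.

```python
def post_rollback_failures(active_job_ids, missing_resources, successful_builds):
    """Return human-readable reasons the rollback is not yet validated."""
    failures = []
    if active_job_ids:
        failures.append("active installer jobs remain: " + ", ".join(active_job_ids))
    if missing_resources:
        failures.append("readiness reports missing resources: " + ", ".join(missing_resources))
    if successful_builds < 1:
        failures.append("no build has completed successfully since the rollback")
    return failures


problems = post_rollback_failures(
    active_job_ids=[],
    missing_resources=["kaniko"],
    successful_builds=0,
)
for problem in problems:
    print("NOT VALIDATED:", problem)
```

An empty result indicates all three conditions hold; the incident notes still need to be captured by hand.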
5) Operational Notes
- Keep provider upgrades serialized; avoid concurrent install/upgrade operations.
- Prefer rolling back to the last known-good version before attempting a forward fix in production.
- If rollback fails repeatedly, disable provider selection for affected tenants until resolved.