Infrastructure Security & Reliability Audit
Findings & Remediation Plan
Client: Northwind Exchange (fictional)
Engagement: 2 weeks · remote · read-only
Scope: CI/CD, Kubernetes, node infra
Prepared by: Alejandro Andreu · SecOps Lab
Classification: Confidential (specimen)
Version: 1.0 — example
01 Executive summary
This audit assessed the security and reliability posture of Northwind's production infrastructure
ahead of an enterprise customer's security review. The platform is fundamentally sound and the team
is capable; the issues below are the predictable result of fast growth, not negligence. Most are
low-effort, high-leverage fixes.
We identified 3 critical and 4 high-severity issues that should be
resolved before the customer review, plus a set of medium/low items for the following quarter. The
single most important finding is an unsigned, unscanned CI/CD release path that would
let a compromised dependency reach production nodes unnoticed.
For the budget conversation. If you need one line for
leadership: "An external audit found three critical supply-chain, access and availability gaps that put
production at risk; remediation is scoped at roughly two engineer-weeks and clears our path through the
customer security review."
02 Scope & method
In scope: CI/CD pipelines and release path, Kubernetes cluster configuration (hardening),
container image supply chain, secrets & access paths, and blockchain-node deployment reliability.
Out of scope: application source code review, on-chain/smart-contract logic, physical security.
Method: read-only, least-privilege access over the engagement window. The approach combined
STRIDE threat modelling of the build/release path, a CIS-benchmark review of the Kubernetes and host layers,
a vulnerability/CVE sweep of images and dependencies, and a reliability review of single points of failure
and recovery paths. No active or destructive testing was performed.
03 Findings at a glance
Ranked by severity and blast radius. Each ID links to its detail and remediation in the full report.
| ID | Finding | Severity | Effort |
| --- | --- | --- | --- |
| SEC-01 | Unsigned, unscanned images promoted straight to prod | Critical | M |
| SEC-02 | CI runner holds standing cloud admin credentials | Critical | S |
| REL-01 | All RPC nodes share one availability zone | Critical | M |
| SEC-03 | Kubernetes API server reachable from office VPN range | High | S |
| SEC-04 | Containers run as root; no seccomp / read-only FS | High | M |
| SEC-05 | Long-lived, unrotated secrets in CI variables | High | S |
| REL-02 | No tested restore path for node snapshot backups | High | M |
| SEC-06 | 12 base-image CVEs (4 fixable by version bump) | Medium | S |
| SEC-07 | No branch protection / required review on infra repo | Medium | S |
| REL-03 | Alerting covers liveness but not chain-sync lag | Medium | M |
Effort key: S = under a day · M = 1–3 days · L = more than 3 days. Five low/info items omitted from this excerpt.
04 Selected findings in detail
SEC-01 · Critical
Unsigned, unscanned images promoted straight to production
What we found
Images are built in CI and pushed to the registry, then deployed to production nodes with no signature
verification and no vulnerability gate. A maintainer of any third-party dependency — or anyone who can
write to the registry — can land code on production without detection.
Why it matters
This is the classic software-supply-chain attack path. For a venue holding customer funds it is both an
operational and a regulatory exposure, and it is exactly what an enterprise security questionnaire probes.
Recommendation
Sign images at build time (e.g. Sigstore/cosign) and verify signatures at admission; add a blocking CVE
scan in the pipeline with an agreed severity threshold; restrict registry write access to CI only.
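The blocking CVE scan can be illustrated with a minimal sketch. It assumes the scanner's output has already been parsed into a list of records; the field names (`id`, `severity`, `package`), the severity labels and the `HIGH` default threshold are illustrative assumptions, not Northwind's agreed values or any particular scanner's format. Unknown severities are treated as blocking so the gate fails closed.

```python
# Hypothetical CVE gate: returns a non-zero code when any finding is at or
# above the agreed severity threshold. Field names are assumptions.

# Ordered from least to most severe (assumed labels).
SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]


def severity_rank(severity):
    """Map a severity label to its rank; unknown labels rank above CRITICAL."""
    try:
        return SEVERITY_ORDER.index(severity.upper())
    except ValueError:
        return len(SEVERITY_ORDER)  # fail closed on unrecognised labels


def blocking_findings(findings, threshold="HIGH"):
    """Return the findings whose severity is at or above the threshold."""
    limit = SEVERITY_ORDER.index(threshold.upper())
    return [f for f in findings if severity_rank(f["severity"]) >= limit]


def gate(findings, threshold="HIGH"):
    """Print each blocking finding and return 1 (block) or 0 (allow)."""
    blocked = blocking_findings(findings, threshold)
    for f in blocked:
        print(f"BLOCKED {f['id']} ({f['severity']}) in {f['package']}")
    return 1 if blocked else 0


# Made-up sample records for illustration only.
sample = [
    {"id": "EXAMPLE-0001", "severity": "MEDIUM", "package": "libexample"},
    {"id": "EXAMPLE-0002", "severity": "CRITICAL", "package": "libdemo"},
]
exit_code = gate(sample)
```

In a pipeline, the return value would be passed to the process exit code so that a blocking finding stops promotion. Signature verification at admission is a separate control and is not covered by this sketch.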
SEC-02 · Critical
CI runner holds standing cloud-admin credentials
What we found
The shared CI runner is configured with a long-lived credential carrying broad cloud-admin permissions,
available to every pipeline including those triggered by external contributor branches.
Why it matters
Any pipeline compromise escalates immediately to full infrastructure control. Blast radius is the entire
environment, not one service.
Recommendation
Replace the static credential with short-lived, federated identity (OIDC), scoped per pipeline and granted
least-privilege permissions; gate privileged pipelines behind protected branches and manual approval.
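As a rough illustration of per-pipeline scoping, the sketch below decides whether a pipeline may assume a role based on claims from its OIDC token. It assumes the token's signature, issuer and expiry have already been verified elsewhere. The role names, the repository names and the claim keys (`repository`, `ref`) are hypothetical; the claims actually available depend on the CI provider.

```python
from fnmatch import fnmatchcase

# Hypothetical policy: role name -> glob patterns that token claims must match.
ROLE_POLICY = {
    # Privileged role: only the infra repo's protected main branch.
    "deploy-prod": {"repository": "northwind/infra", "ref": "refs/heads/main"},
    # Unprivileged role: any branch of any repo in the organisation.
    "build-only": {"repository": "northwind/*", "ref": "refs/heads/*"},
}


def may_assume(role, claims):
    """Return True only if every constrained claim is present and matches."""
    policy = ROLE_POLICY.get(role)
    if policy is None:
        return False  # unknown roles are denied by default
    return all(
        key in claims and fnmatchcase(str(claims[key]), pattern)
        for key, pattern in policy.items()
    )


# A feature branch cannot obtain the production role.
feature_claims = {"repository": "northwind/infra", "ref": "refs/heads/feature-x"}
allowed = may_assume("deploy-prod", feature_claims)
```

The design choice is deny by default: a missing claim, an unknown role or a branch outside the pattern all result in no credential being issued, so pipelines triggered from external contributor branches never reach the privileged role.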
REL-01 · Critical
All RPC nodes share a single availability zone
What we found
The full set of customer-facing RPC nodes runs in one availability zone behind one load balancer.
A zone outage takes the entire RPC surface offline; there is no tested failover.
Why it matters
A single cloud-provider incident becomes a full customer-facing outage — the kind that triggers SLA
penalties and erodes trust at exactly the wrong moment.
Recommendation
Spread nodes across at least two zones, add health checks at the load balancer, and rehearse a zone-loss
failover. Document the recovery runbook and target RTO.
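A placement check along these lines could run before and after the change. The sketch below assumes an inventory mapping node names to zones is available from the cloud provider or cluster API; the node names, the zone names and the `min_healthy` capacity floor are illustrative assumptions.

```python
from collections import Counter


def zone_loss_report(node_zones, min_healthy):
    """Return the zones whose loss would leave fewer than min_healthy nodes."""
    per_zone = Counter(node_zones.values())
    total = sum(per_zone.values())
    return sorted(
        zone for zone, count in per_zone.items() if total - count < min_healthy
    )


def survives_any_zone_loss(node_zones, min_healthy=1):
    """True if nodes span two or more zones and any single zone can be lost."""
    spans_two_zones = len(set(node_zones.values())) >= 2
    return spans_two_zones and not zone_loss_report(node_zones, min_healthy)


# Placement resembling the finding: every RPC node in one zone.
current = {"rpc-1": "zone-a", "rpc-2": "zone-a", "rpc-3": "zone-a"}
# One possible target placement across two zones.
target = {"rpc-1": "zone-a", "rpc-2": "zone-a", "rpc-3": "zone-b"}

current_ok = survives_any_zone_loss(current)
target_ok = survives_any_zone_loss(target)
```

Raising `min_healthy` to the number of nodes needed to carry peak load turns this from a pure spread check into a capacity check; with `min_healthy=2` the target placement above would still flag `zone-a`. A check like this complements a rehearsed failover but does not replace it.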
05 Prioritised remediation roadmap
Sequenced by risk-reduction per unit of effort, so the team gets the biggest safety gains first.
Before review: SEC-02, SEC-05. Remove standing credentials; rotate and externalise secrets. Each takes under a day and removes the worst escalation paths.
Week 1: SEC-01. Add signing and a blocking CVE scan to the release path. Closes the headline supply-chain finding.
Week 1–2: REL-01, SEC-03. Spread the RPC fleet across availability zones; restrict API-server exposure. Removes the critical single point of failure and narrows access to the cluster control plane.
Quarter: SEC-04, REL-02, SEC-06/07, REL-03. Non-root and seccomp hardening, tested restores, CVE hygiene, branch protection, sync-lag alerting.
What "good" looks like after remediation: a signed, scanned release path with no standing
admin credentials; a multi-AZ node fleet with a rehearsed failover; and a clean, defensible answer to the
customer's security questionnaire. All achievable within the quarter.