secopslab.eu

Infrastructure Security & Reliability Audit

Findings & Remediation Plan

Client: Northwind Exchange (fictional)
Engagement: 2 weeks · remote · read-only
Scope: CI/CD, Kubernetes, node infra
Prepared by: Alejandro Andreu · SecOps Lab
Classification: Confidential (specimen)
Version: 1.0 (example)

01 Executive summary

This audit assessed the security and reliability posture of Northwind's production infrastructure ahead of an enterprise customer's security review. The platform is fundamentally sound and the team is capable; the issues below are the predictable result of fast growth, not negligence. Most are low-effort, high-leverage fixes.

We identified 3 critical and 4 high-severity issues that should be resolved before the customer review, plus a set of medium/low items for the following quarter. The single most important finding is an unsigned, unscanned CI/CD release path that would let a compromised dependency reach production nodes unnoticed.

Severity breakdown: Critical 3 · High 4 · Medium 3 · Low / info 5

For the budget conversation. If you need one line for leadership: "An external audit found three critical supply-chain and access gaps that could expose production; remediation is scoped at roughly two engineer-weeks and clears our path through the customer security review."

02 Scope & method

In scope: CI/CD pipelines and release path, Kubernetes cluster configuration (hardening), container image supply chain, secrets & access paths, and blockchain-node deployment reliability. Out of scope: application source code review, on-chain/smart-contract logic, physical security.

Method: read-only, least-privilege access over the engagement window. The approach combined STRIDE threat modelling of the build/release path, a CIS-benchmark review of the Kubernetes and host layer, a vulnerability/CVE sweep of images and dependencies, and a reliability review of single points of failure and recovery paths. No active or destructive testing was performed.

03 Findings at a glance

Ranked by severity and blast radius. Each ID links to its detail and remediation in the full report.

ID	Finding	Severity	Effort
SEC-01	Unsigned, unscanned images promoted straight to prod	Critical	M
SEC-02	CI runner holds standing cloud-admin credentials	Critical	S
REL-01	All RPC nodes share one availability zone	Critical	M
SEC-03	Kubernetes API server reachable from office VPN range	High	S
SEC-04	Containers run as root; no seccomp / read-only FS	High	M
SEC-05	Long-lived, unrotated secrets in CI variables	High	S
REL-02	No tested restore path for node snapshot backups	High	M
SEC-06	12 base-image CVEs (4 fixable by version bump)	Medium	S
SEC-07	No branch protection / required review on infra repo	Medium	S
REL-03	Alerting covers liveness but not chain-sync lag	Medium	M

Effort key: S = under a day · M = 1–3 days · L = more than 3 days. Five low/info items omitted from this excerpt.

04 Selected findings in detail

SEC-01 · Critical

Unsigned, unscanned images promoted straight to production

What we found: Images are built in CI and pushed to the registry, then deployed to production nodes with no signature verification and no vulnerability gate. A malicious or compromised maintainer of any third-party dependency, or anyone who can write to the registry, can land code in production without detection.

Why it matters: This is the classic software-supply-chain attack path. For a venue holding customer funds, it is both an operational and a regulatory exposure, and it is exactly what an enterprise security questionnaire probes.

Recommendation: Sign images at build time (e.g. Sigstore/cosign) and verify signatures at admission; add a blocking CVE scan to the pipeline with an agreed severity threshold; restrict registry write access to CI only.
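The blocking CVE scan can be as small as a script that reads the scanner's report and fails the job when anything meets the agreed threshold. A minimal sketch, assuming a Trivy-style JSON report (`Results[].Vulnerabilities[]` entries carrying `VulnerabilityID` and `Severity`); the `HIGH` default threshold and the `cve_gate.py` name are placeholders, not an agreed policy. Image signing and admission-time verification are separate steps and are not shown.

```python
"""Blocking CVE gate for the release pipeline (sketch).

Assumes a Trivy-style JSON report:
  {"Results": [{"Target": ..., "Vulnerabilities": [{"VulnerabilityID": ..., "Severity": ...}]}]}
The default threshold is a placeholder; use the severity level agreed with the team.
"""
import json
import sys

# Severity ladder, lowest to highest; unrecognised labels rank as UNKNOWN.
SEVERITY_ORDER = ["UNKNOWN", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


def severity_rank(label) -> int:
    label = (label or "UNKNOWN").upper()
    return SEVERITY_ORDER.index(label) if label in SEVERITY_ORDER else 0


def blocking_findings(report: dict, threshold: str = "HIGH") -> list[str]:
    """Return the IDs of all vulnerabilities at or above `threshold`."""
    if threshold.upper() not in SEVERITY_ORDER:
        raise ValueError(f"unknown severity threshold: {threshold!r}")
    floor = severity_rank(threshold)
    blocked = []
    for result in report.get("Results") or []:
        # Targets with no findings may omit "Vulnerabilities" or set it to null.
        for vuln in result.get("Vulnerabilities") or []:
            if severity_rank(vuln.get("Severity")) >= floor:
                blocked.append(vuln.get("VulnerabilityID", "<no id>"))
    return blocked


def main(report_path: str, threshold: str = "HIGH") -> int:
    with open(report_path, encoding="utf-8") as fh:
        report = json.load(fh)
    blocked = blocking_findings(report, threshold)
    for vuln_id in blocked:
        print(f"BLOCKING: {vuln_id}", file=sys.stderr)
    # A non-zero exit code fails the pipeline step, so the image is not promoted.
    return 1 if blocked else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python cve_gate.py report.json [THRESHOLD]
    sys.exit(main(*sys.argv[1:3]))
```

Placing the gate right after the scan step and before the push/sign step means nothing above the threshold ever gets a signature, so admission-time verification also rejects it.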

SEC-02 · Critical

CI runner holds standing cloud-admin credentials

What we found: The shared CI runner is configured with a long-lived credential carrying broad cloud-admin permissions, available to every pipeline, including those triggered by external contributor branches.

Why it matters: Any pipeline compromise escalates immediately to full infrastructure control. The blast radius is the entire environment, not one service.

Recommendation: Replace the static credential with short-lived, federated identity (OIDC), scoped per pipeline and to least privilege; gate privileged pipelines behind protected branches and manual approval.
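The report does not name the cloud or CI provider. Assuming, purely for illustration, AWS IAM with GitHub Actions as the OIDC issuer, per-pipeline scoping comes down to the role's trust policy: only tokens minted for one repository's protected branch may assume the role. The account ID and the `northwind/infra` repository are hypothetical placeholders, and the role's permissions policy (what it may actually do once assumed) is a separate document that should be scoped to the single job.

```python
"""Build an IAM trust policy for OIDC-federated CI (sketch; AWS + GitHub Actions assumed)."""
import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def ci_trust_policy(account_id: str, repo: str, branch: str = "main") -> dict:
    """Allow only workflows on `repo`'s `branch` to assume the role via OIDC."""
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # The token must be issued for AWS STS...
                        f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com",
                        # ...and only for runs on the protected branch.
                        f"{GITHUB_OIDC_HOST}:sub": f"repo:{repo}:ref:refs/heads/{branch}",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and repository, for illustration only.
    print(json.dumps(ci_trust_policy("123456789012", "northwind/infra"), indent=2))
```

Tokens from pull-request runs carry a different `sub` claim (`repo:ORG/REPO:pull_request`), so contributor branches cannot assume this role. If the privileged job runs in a GitHub deployment environment with required reviewers (one way to implement the manual approval), the claim takes the form `repo:ORG/REPO:environment:NAME` instead, and the condition must match that.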

REL-01 · Critical

All RPC nodes share a single availability zone

What we found: The full set of customer-facing RPC nodes runs in one availability zone behind one load balancer. A zone outage takes the entire RPC surface offline; there is no tested failover.

Why it matters: A single cloud-provider incident becomes a full customer-facing outage, the kind that triggers SLA penalties and erodes trust at exactly the wrong moment.

Recommendation: Spread nodes across at least two zones, health-check them at the load balancer, and rehearse a zone-loss failover. Document the recovery runbook and the target RTO.
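The "at least two zones" requirement is easy to check against an inventory. A minimal sketch, assuming the inventory is a hypothetical mapping of node name to zone and that `min_healthy` is the number of nodes needed to carry peak RPC load (a figure to take from real load data): the fleet passes only if it spans two or more zones and still meets `min_healthy` after losing its largest zone.

```python
"""Check that an RPC fleet survives the loss of any one availability zone (sketch)."""
from collections import Counter


def survives_zone_loss(node_zones: dict[str, str], min_healthy: int) -> bool:
    """True if nodes span at least two zones and losing the largest zone
    still leaves `min_healthy` nodes serving."""
    per_zone = Counter(node_zones.values())
    if len(per_zone) < 2:
        return False  # single-zone fleet: one zone outage takes everything offline
    # Worst case is losing whichever zone holds the most nodes.
    remaining = sum(per_zone.values()) - max(per_zone.values())
    return remaining >= min_healthy


if __name__ == "__main__":
    # Hypothetical inventories: node name -> zone.
    current = {"rpc-1": "zone-a", "rpc-2": "zone-a", "rpc-3": "zone-a"}
    target = {"rpc-1": "zone-a", "rpc-2": "zone-a", "rpc-3": "zone-b", "rpc-4": "zone-b"}
    print(survives_zone_loss(current, min_healthy=2))  # False
    print(survives_zone_loss(target, min_healthy=2))   # True
```

On Kubernetes, the spread itself can be enforced with `topologySpreadConstraints` keyed on the well-known `topology.kubernetes.io/zone` node label; a check like this verifies the outcome, while the rehearsed failover verifies that traffic actually moves.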

05 Prioritised remediation roadmap

Sequenced by risk-reduction per unit of effort, so the team gets the biggest safety gains first.

Before review · SEC-02, SEC-05: kill standing credentials; rotate and externalise secrets. Each takes under a day and removes the worst escalation paths.

Week 1 · SEC-01: add signing and a blocking CVE scan to the release path. This closes the headline supply-chain finding.

Weeks 1–2 · REL-01, SEC-03: spread the RPC fleet across availability zones; restrict API-server exposure. This removes the critical single point of failure and takes the API server off the office VPN range.

Quarter · SEC-04, REL-02, SEC-06/07, REL-03: non-root and seccomp hardening, tested restores, CVE hygiene and branch protection, sync-lag alerting.
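For SEC-04, a quick audit of the current manifests helps size the hardening work. A minimal sketch, assuming pod specs have already been parsed into dicts (for example each item's `spec` from `kubectl get pods -o json`); it flags containers that do not enforce non-root, lack a seccomp profile, or have a writable root filesystem, and it honours the rule that container-level `securityContext` fields override pod-level ones. The example pod is hypothetical.

```python
"""Audit pod specs for the SEC-04 hardening gaps (sketch)."""

ALLOWED_SECCOMP = {"RuntimeDefault", "Localhost"}


def hardening_gaps(pod_spec: dict) -> list[str]:
    """List the SEC-04 gaps found in one pod spec."""
    pod_ctx = pod_spec.get("securityContext") or {}
    containers = (pod_spec.get("initContainers") or []) + (pod_spec.get("containers") or [])
    gaps = []
    for container in containers:
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext") or {}
        # Container-level settings take precedence over pod-level ones.
        non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False))
        seccomp = ctx.get("seccompProfile") or pod_ctx.get("seccompProfile") or {}
        if not non_root:
            gaps.append(f"{name}: runAsNonRoot not enforced")
        if seccomp.get("type") not in ALLOWED_SECCOMP:
            gaps.append(f"{name}: no seccomp profile")
        # readOnlyRootFilesystem exists only at container level.
        if not ctx.get("readOnlyRootFilesystem", False):
            gaps.append(f"{name}: writable root filesystem")
    return gaps


if __name__ == "__main__":
    default_pod = {"containers": [{"name": "rpc", "image": "rpc-node:latest"}]}
    for gap in hardening_gaps(default_pod):
        print(gap)
    # rpc: runAsNonRoot not enforced
    # rpc: no seccomp profile
    # rpc: writable root filesystem
```

This is an audit aid, not enforcement. For enforcement, Kubernetes' built-in Pod Security Admission `restricted` profile (namespace label `pod-security.kubernetes.io/enforce: restricted`) requires non-root and a seccomp profile, though it does not require a read-only root filesystem.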

What "good" looks like after remediation: a signed, scanned release path with no standing admin credentials; a multi-AZ node fleet with a rehearsed failover; and a clean, defensible answer to the customer's security questionnaire. All achievable within the quarter.

SecOps Lab · secopslab.eu · Specimen / illustrative: synthetic data. Real engagement reports are confidential and never published.