diff --git a/skills/README.md b/skills/README.md index 249fa2128..585397d6c 100644 --- a/skills/README.md +++ b/skills/README.md @@ -15,7 +15,6 @@ compatible agents. | [`network-rca`](network-rca/) | Network Root Cause Analysis. Retrospective traffic analysis via snapshots, with two investigation routes: PCAP (for Wireshark/compliance) and Dissection (for AI-driven API-level investigation). | | [`kfl`](kfl/) | KFL2 (Kubeshark Filter Language) expert. Complete reference for writing, debugging, and optimizing CEL-based traffic filters across all supported protocols. | | [`security-audit`](security-audit/) | Network Security Audit. Systematic 8-phase threat detection across MITRE ATT&CK tactics — C2, exfiltration, lateral movement, credential theft, cryptomining, protocol abuse — using snapshot-based traffic analysis. | -| [`install`](install/) | Installation & Deployment. Guides CLI and Helm installation, builds custom values files, handles platform-specific config (EKS/GKE/AKS/OpenShift/KinD), auth, ingress, cloud storage, and troubleshooting. | ## Prerequisites diff --git a/skills/security-audit/SKILL.md b/skills/security-audit/SKILL.md new file mode 100644 index 000000000..de82721a9 --- /dev/null +++ b/skills/security-audit/SKILL.md @@ -0,0 +1,724 @@ +--- +name: security-audit +description: > + Kubernetes network security audit skill powered by Kubeshark MCP. Use this skill + whenever the user wants to audit a cluster for security threats, detect compromised + workloads, find malicious traffic patterns, hunt for indicators of compromise (IOCs), + check for data exfiltration, identify C2 (command and control) communication, + detect cryptomining, find lateral movement, discover credential theft attempts, + assess network security posture, or perform threat hunting in Kubernetes. 
+ Also trigger when the user mentions security audit, threat detection, compromise + assessment, vulnerability scan, "is my cluster compromised", "find malicious traffic", + "check for threats", DNS exfiltration, DNS tunneling, port scanning, IMDS access, + reverse shell, crypto miner, MITRE ATT&CK, IOC detection, anomaly detection, + suspicious traffic, rogue workloads, unauthorized access, or any request to + evaluate cluster security through network traffic analysis. +--- + +# Kubernetes Network Security Audit with Kubeshark MCP + +You are a Kubernetes network security specialist. Your job is to systematically +audit cluster traffic for indicators of compromise, malicious behavior, and +security threats — using network traffic as the ground truth. + +Network traffic cannot lie. Logs can be tampered with, metrics can be spoofed, +but packets on the wire reveal what workloads actually do — what they connect to, +what protocols they speak, what data they send. Your audit leverages this by +examining DNS queries, HTTP requests, L4 flows, and protocol-level payloads +across every dimension of the MITRE ATT&CK framework. + +## Prerequisites + +Before starting any audit, verify the environment is ready. + +**Tool**: `check_kubeshark_status` + +Confirm Kubeshark is deployed and tools are available. You need at minimum: +`list_api_calls`, `list_l4_flows`, `list_workloads`, `get_api_call`. + +**KFL requirement**: This skill uses KFL filters for all queries. Before +constructing any filter, load the KFL skill (`skills/kfl/`). KFL is statically +typed — incorrect field names will fail silently. If the KFL skill is not +loaded, only use the exact filter examples shown in this skill. + +**KFL error resilience**: If a KFL filter returns `undeclared reference` or +similar errors, **do not give up on that phase**. Fall back to: +1. Port-based filtering: `dst.port == 5432` instead of protocol flags +2. 
Name-based filtering: `dst.name.contains("db")` or `src.name.contains("pod-name")` +3. Browsing entries with `get_api_call` on IDs from `list_l4_flows` +A KFL error means the filter syntax is wrong, not that the data doesn't exist. + +## Audit Methodology + +A security audit is NOT an incident investigation. You are not responding to +a known event — you are proactively searching for threats that may be hiding +in normal traffic. This requires a systematic sweep across all threat categories, +not a single focused query. + +The audit has **two sections** that run in sequence: + +``` +SECTION A: Real-Time Analysis → Instant, uses live dissected traffic +SECTION B: Snapshot Deep Dive → Immutable evidence, protocol-level inspection +``` + +### Why Two Sections? + +Kubeshark has two modes of data access: + +1. **Real-time dissection** — traffic is dissected as it flows through the + cluster. Provides instant access to L7 data (DNS, HTTP, etc.) that is + already captured and indexed. However, real-time dissection is resource- + intensive and may not be enabled, or may have gaps in coverage. + +2. **Snapshots** — immutable captures of raw traffic within a time window. + Must be created explicitly, then dissected separately. Guarantees complete + coverage of all packets in the window, but takes time to create and index. + +Section A uses whatever is already available — fast, immediate, but possibly +incomplete. Section B creates snapshots for thorough, evidence-grade analysis. 
+ +### Severity Classification + +Classify every finding using this framework: + +| Severity | Criteria | Examples | +|----------|----------|---------| +| **CRITICAL** | Active data exfiltration, credential theft in progress, confirmed C2 | DNS tunneling, IMDS credential harvest, mining pool connections | +| **HIGH** | Reconnaissance with cluster-wide scope, confirmed unauthorized access | K8s API secret enumeration, port scanning, cluster-admin abuse | +| **MEDIUM** | Suspicious patterns requiring investigation, limited-scope recon | Cross-namespace probes, outdated User-Agents, unusual external connections | +| **LOW** | Anomalies that may be benign, single-instance events | Unknown workloads, new external destinations, noisy but not malicious | + +### Timezone + +Kubeshark returns timestamps in UTC. Always convert to local time before +presenting to the user. Detect the local timezone at the start (e.g., +`date +%Z`). Present local time as primary, with UTC in parentheses: +`15:03:22 IST (12:03:22 UTC)`. + +**Conversion**: Kubeshark timestamps are Unix milliseconds. To convert: +`ms / 1000` → Unix seconds → datetime → format with timezone offset. +Example: `1778534735974` → `2026-05-11 14:25:35 PDT (21:25:35 UTC)`. + +--- + +## SECTION A: Real-Time Analysis + +**Goal**: Fast initial sweep using live data that's already available. No +waiting for snapshot creation or dissection. + +### Step 1: Check What's Available + +**Tool**: `check_kubeshark_status` + +Confirm Kubeshark is running and which tools are available. + +**Tool**: `get_data_boundaries` + +Check how far back raw capture data exists. You need this to plan snapshot +creation in Step 3 — call it now so the data is ready when you need it. + +**Tool**: `list_workloads` (no snapshot_id — queries live state) + +Get the current workload inventory for the target namespace. This returns +pod names, namespaces, and IP addresses. Save the IPs — you'll need them +throughout the audit. 
+ +**Note**: `list_workloads` without a `snapshot_id` may fail with some +Kubeshark versions (`snapshot_id is required for filtered listing`). If +this happens, use individual lookups with `name` + `namespace` parameters, +or skip to Step 3 and get the workload inventory from the first snapshot. + +### Step 2: Query Live Traffic + +In parallel, query the real-time dissected traffic across key dimensions. +Use `list_api_calls` and `list_l4_flows` **without** a `snapshot_id` to +hit the live data. + +Run these queries simultaneously: + +| Query | KFL Filter | What You're Looking For | +|-------|-----------|------------------------| +| DNS traffic | `dns` | Mining domains, high-entropy subdomains, external resolution, NXDOMAIN flood | +| HTTP traffic | `http` | C2 beaconing, suspicious URLs, external destinations, anomalous headers | +| L4 flows | (via `list_l4_flows`) | External IPs, suspicious ports (3333, 4444), IMDS (169.254.169.254), fan-out patterns | +| PostgreSQL | `postgresql` | SQL injection patterns, sensitive table access | +| Redis | `redis` | Dangerous commands (CONFIG, KEYS, CLIENT LIST) | + +Filter by namespace if the user specified one (e.g., `dns && src.pod.namespace == "k8s-mule"`). + +**Important**: Real-time dissection may have incomplete data — traffic that +arrived before dissection was enabled, or during gaps in coverage, won't +appear. Treat Section A findings as a fast first pass, not the final word. + +### Step 3: Create Snapshots (Sequential — One at a Time) + +While analyzing real-time data, begin creating snapshots for Section B. + +**CRITICAL: Create snapshots ONE AT A TIME, sequentially.** Kubeshark only +supports one concurrent snapshot download. Parallel creation will cause +failures and data loss. The pattern is: + +1. Create snapshot → wait for completion → start dissection → move to next +2. Snapshot creation is fast (seconds). Dissection is slow (minutes). +3. 
You do NOT need to wait for dissection before creating the next snapshot. + Create the next snapshot while the previous one dissects. + +Use the data boundaries from Step 1 (`get_data_boundaries`) to calculate +how many snapshots are needed: + +``` +total_range_ms = newest_timestamp - oldest_timestamp +window_ms = 240000 # 4 minutes +num_snapshots = ceil(total_range_ms / window_ms) +``` + +Then create snapshots in **4-minute increments**, starting from the most +recent: + +``` +Step 1: create_snapshot (now - 4min → now) + → poll get_snapshot until status == "completed" + → start_snapshot_dissection +Step 2: create_snapshot (now - 8min → now - 4min) + → poll get_snapshot until status == "completed" + → start_snapshot_dissection +Step 3: create_snapshot (now - 12min → now - 8min) + → poll get_snapshot until status == "completed" + → start_snapshot_dissection +``` + +**Polling pattern**: After `create_snapshot`, call `get_snapshot` with the +returned snapshot ID to check status. Repeat until `status == "completed"`. +After `start_snapshot_dissection`, call `get_snapshot_dissection_status` +and check until `progress == 100`. + +4-minute windows balance snapshot size (fast to create and dissect) against +coverage (captures threats with sleep cycles up to ~3 minutes). Most attack +patterns in the wild repeat within 30-120 seconds. + +**Do not skip this step.** A single short snapshot will miss threats with +longer sleep cycles. The 4-minute windows ensure full coverage. + +**Note**: Small snapshots (under ~15 minutes of traffic) often dissect in +seconds rather than minutes. If dissection completes quickly, you can +collapse the phased approach (immediate data first, L7 after) into a +single pass through all phases. 
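
The window arithmetic from this step can be sketched in Python. This is a minimal illustration under stated assumptions, not part of Kubeshark: `snapshot_windows` is a hypothetical helper, and it assumes the oldest and newest timestamps reported by `get_data_boundaries` are Unix milliseconds. It returns windows most recent first, and clamps the oldest window to the start of the available data.

```python
import math

WINDOW_MS = 240_000  # 4 minutes, matching window_ms in this step


def snapshot_windows(oldest_ms, newest_ms, window_ms=WINDOW_MS):
    """Return (start_ms, end_ms) snapshot windows, most recent first.

    Hypothetical helper: inputs are assumed to be Unix milliseconds.
    """
    total_range_ms = newest_ms - oldest_ms
    if total_range_ms <= 0:
        return []
    num_snapshots = math.ceil(total_range_ms / window_ms)
    windows = []
    end = newest_ms
    for _ in range(num_snapshots):
        # Clamp so the oldest window never reaches before the captured data.
        start = max(end - window_ms, oldest_ms)
        windows.append((start, end))
        end = start
    return windows
```

Each returned pair would then be passed to `create_snapshot` one at a time, waiting for completion before creating the next, as required above.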
+ +### Step 4: Present Intermediate Results + +Present Section A findings to the user as **intermediate results** — clearly +labeled as preliminary: + +``` +## Intermediate Results (Real-Time Analysis) + +⚠️ These findings are based on live dissected traffic, which may have +gaps in coverage. Snapshot analysis is in progress and will provide +the complete, evidence-grade audit. + +[findings table and details] + +Snapshots are being created and dissected. Full report to follow. +``` + +This gives the user immediate value while snapshots process. But be explicit: +**the audit is not complete until Section B finishes.** + +--- + +## SECTION B: Snapshot Deep Dive + +**Goal**: Systematic, thorough analysis against immutable snapshot data. +This is the evidence-grade section — complete coverage, reproducible results. + +**The audit is NOT done until this section completes.** Snapshots must be +created, dissected, and analyzed at L7 before the final report is generated. +Section A may miss traffic that wasn't being dissected in real-time — Section B +captures everything in the raw PCAP buffer, including traffic that real-time +dissection dropped or never saw. Do not skip this section or treat Section A +results as the final word. 
+ +### What a Snapshot Gives You + +A completed snapshot provides **four independent data sources**. Do not +wait for dissection to use the first three: + +| Source | Available | Tool | What It Provides | +|--------|-----------|------|-----------------| +| **Workloads & IPs** | Immediately | `list_workloads` with `snapshot_id` | Pod names, namespaces, IPs at capture time | +| **L4 Flows** | Immediately | `list_l4_flows` with `snapshot_id` | TCP/UDP connections: src/dst IPs, ports, bytes, duration | +| **PCAP Export** | Immediately | `export_snapshot_pcap` | Raw packets filtered by BPF expression | +| **L7 Dissection** | After indexing | `list_api_calls`, `get_api_call`, `get_api_stats` | DNS queries, HTTP requests, SQL statements, Redis commands, gRPC methods | + +### Audit Flow Per Snapshot + +For each 4-minute snapshot, run the full 7-phase sweep. Start with immediate +data while dissection completes: + +``` +Snapshot ready + ├── Start dissection (background) + ├── Phase 1: list_workloads (immediate) — workload inventory + IPs + │ export_snapshot_pcap (immediate) — raw packet evidence + ├── Phase 3: list_l4_flows (immediate) — external flows, port scanning + ├── Phase 4: list_l4_flows (immediate) — lateral movement, fan-out + │ + ├── [dissection completes] + │ + ├── Phase 2: list_api_calls — DNS threat analysis + ├── Phase 5: list_api_calls — protocol abuse (PG, Redis, gRPC) + ├── Phase 6: list_api_calls — credential access (IMDS, cloud APIs) + └── Phase 7: correlate all findings +``` + +Process snapshots in reverse chronological order (most recent first). If the +first snapshot reveals enough threats, you may not need to analyze all of them. + +### PCAP for Deep Inspection + +PCAP export happens in Phase 1b (immediately after snapshot creation). 
In +later phases, if a new finding needs deeper packet-level analysis beyond +what `list_api_calls` provides, export additional PCAPs using the workload +IPs collected in Phase 1a: + +``` +export_snapshot_pcap(snapshot_id, bpf_filter="host ") +``` + +### Merging Findings Across Snapshots + +Threats that appear in multiple snapshots are confirmed persistent. One-time +events in a single snapshot may be transient. Note which findings repeat +across snapshots — persistence is a strong signal of real compromise vs. +a single anomalous event. + +--- + +## Phase 1: Workload Inventory & PCAP Evidence + +**Goal**: Identify all active workloads, collect their IPs, and export raw +PCAP evidence — all before dissection completes. +**Data source**: Immediate (no dissection needed). + +### 1a: Workload Inventory + +**Tool**: `list_workloads` with `snapshot_id` + +Query with the target namespace (or all namespaces). The response includes +pod names, namespaces, and **IP addresses at capture time** — these IPs are +critical for building BPF filters in later phases and for correlating L4 +flows to workload identities. + +For each workload, note: +- Pod name and namespace +- IP address (save these — you'll need them for PCAP export and L4 analysis) +- Whether it's expected (matches known deployments) + +**What to flag**: +- Workloads not matching any known Deployment/DaemonSet/StatefulSet +- Pods with names that mimic system components (e.g., `kube-proxy-debug`) +- Unexpected number of replicas or pods in the namespace + +### 1b: PCAP Export (Immediate — No Dissection Needed) + +**Tool**: `export_snapshot_pcap` with `snapshot_id` + +PCAP export is available immediately after snapshot creation — it reads raw +packets, not dissected data. Use it now to preserve evidence and get raw +packet-level visibility before L7 dissection completes. + +**Export PCAP for every CRITICAL finding** from Section A's real-time analysis. 
+Use the workload IPs from 1a to build BPF filters: + +``` +export_snapshot_pcap(snapshot_id, bpf_filter="host ") +``` + +This is especially useful for: +- Verifying encrypted C2 (TLS ClientHello SNI inspection) +- Confirming Stratum mining protocol content +- Extracting DNS tunnel payloads at packet level +- Preserving forensic evidence before cluster changes + +If Section A identified no CRITICAL findings yet, export a broad PCAP for +the most suspicious workloads based on L4 flow analysis (Phase 3). + +--- + +## Phase 2: DNS Threat Analysis + +**Goal**: DNS is the single most reliable indicator of compromise. Every attack +that communicates externally needs DNS resolution. Sweep DNS traffic for all +known threat patterns. + +### 2a: External DNS (Non-Cluster Queries) + +**Tool**: `list_api_calls` with KFL: `dns` + +Examine all DNS queries. Flag anything that is NOT `*.cluster.local` or +`*.svc.cluster.local` — these are external resolutions that reveal what +workloads are reaching out to. 
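
Once the DNS entries are retrieved, the cluster-internal vs. external split can be done locally. The sketch below is illustrative only: `is_external`, `shannon_entropy`, and `triage` are hypothetical helpers, and the input is assumed to be a plain list of queried names already extracted from the results. The 30-character threshold follows the high-entropy subdomain row in the table that follows.

```python
import math
from collections import Counter

CLUSTER_SUFFIX = ".cluster.local"


def is_external(name):
    """True if a queried name is not under cluster.local."""
    normalized = name.rstrip(".").lower()
    return not (normalized == "cluster.local" or normalized.endswith(CLUSTER_SUFFIX))


def shannon_entropy(label):
    """Bits per character of a DNS label; 0.0 for an empty label."""
    if not label:
        return 0.0
    counts = Counter(label)
    total = len(label)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def triage(names, min_label_len=30):
    """Return (external names, external names whose first label is long)."""
    external = [n for n in names if is_external(n)]
    long_labels = [n for n in external if len(n.split(".")[0]) > min_label_len]
    return external, long_labels
```

Short names produced by search-path expansion (for example a bare service name) will be classified as external by this check, so review them before flagging. `shannon_entropy` is provided for manually ranking the long labels; no entropy cutoff is assumed here.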
+ +**What to flag**: + +| Pattern | Threat | KFL Filter | +|---------|--------|------------| +| Mining pool domains (minexmr, nanopool, mining-pool) | Cryptojacking | `dns && dns_questions.exists(q, q.contains("minexmr"))` | +| High-entropy subdomains (base64-like, >30 chars) | DNS tunneling / exfiltration | `dns` — then inspect subdomain length and entropy | +| DGA patterns (random .com/.net with NXDOMAIN) | C2 beaconing | `dns && dns_response && size(dns_answers) == 0` | +| DoH resolver domains (cloudflare-dns.com, dns.google) | DNS bypass / C2 channel | `dns && dns_questions.exists(q, q.contains("cloudflare-dns"))` | +| Cloud API domains (sts.amazonaws.com, s3.amazonaws.com) | Stolen credential usage | `dns && dns_questions.exists(q, q.contains("amazonaws.com"))` | +| C2/attacker domains (attacker, c2, darknet, exfil) | Command & Control | `dns && dns_questions.exists(q, q.contains("c2"))` | + +### 2b: DNS Query Volume and Types + +High query volume from a single pod is suspicious. Also check for unusual +record types: + +- **TXT queries** to external domains → data exfiltration +- **NULL queries** → DNS tunneling (iodine, dnscat2) +- **AXFR queries** → zone transfer attempts (reconnaissance) +- **SRV queries** to many namespaces → service enumeration + +### 2c: NXDOMAIN Ratio + +A high NXDOMAIN ratio (>20% of queries) from a single source suggests DGA +beaconing — the malware tries many generated domains, most of which don't exist. + +**Tool**: `list_api_calls` with KFL: `dns && dns_response && size(dns_answers) == 0` + +Compare the count of failed queries to total queries per source pod. + +--- + +## Phase 3: External Communication + +**Goal**: Identify all traffic leaving the cluster. Any pod connecting to +external IPs or domains needs justification. +**Data source**: Immediate (no dissection needed). Use L4 flows first, +then enrich with L7 data from dissection when available. 
+ +### 3a: L4 External Flows + +**Tool**: `list_l4_flows` with `snapshot_id` + +This is available immediately — do not wait for dissection. Use the workload +IPs from Phase 1 to map flows to pod identities. + +Look for flows where the destination is NOT a cluster-internal IP (not RFC 1918: +10.x.x.x, 172.16-31.x.x, 192.168.x.x). Every external flow is a potential +exfiltration or C2 channel. + +**What to flag**: + +| Pattern | Threat | Severity | +|---------|--------|----------| +| Destination 169.254.169.254 | IMDS metadata credential theft | CRITICAL | +| Destination port 3333, 14433, 45700 | Stratum mining protocol | CRITICAL | +| Destination port 4444, 1337 | Reverse shell / backdoor | CRITICAL | +| Persistent connections to single external IP | C2 beaconing | HIGH | +| Large outbound data volume (>1MB) to external | Data exfiltration | HIGH | +| Connections to cloud API endpoints (port 443) | Stolen credential usage | MEDIUM | + +### 3b: HTTP External Requests + +**Tool**: `list_api_calls` with KFL: `http && !dst.pod.namespace.startsWith("kube")` + +Inspect outbound HTTP requests for: + +- **Beaconing patterns**: Regular-interval requests to the same external URL +- **Suspicious User-Agents**: `Mozilla/4.0`, `curl/`, empty, or malware-like +- **Suspicious paths**: `/check?s=`, `/beacon`, `/heartbeat`, `/proxy?coin=` +- **Base64 in headers**: Oversized Cookie or custom X-* headers with encoded data +- **gRPC to external**: `Content-Type: application/grpc` to non-cluster destinations +- **WebSocket upgrades**: `Upgrade: websocket` to external hosts (potential mining) + +--- + +## Phase 4: Lateral Movement + +**Goal**: Identify pods communicating with services they shouldn't — crossing +namespace boundaries, probing infrastructure, or scanning the network. +**Data source**: L4 flows (immediate) for port scanning detection. L7 +dissection (after indexing) for cross-namespace HTTP and API server analysis. 
+ +### 4a: Cross-Namespace Traffic + +**Tool**: `list_api_calls` with KFL: `src.pod.namespace != dst.pod.namespace` + +Most pods should only talk within their namespace (and to kube-system services). +Cross-namespace traffic to unexpected destinations is a lateral movement indicator. + +### 4b: Kubernetes API Server Access + +**Tool**: `list_api_calls` with KFL: `http && dst.port == 443 && path.startsWith("/api")` + +Check what pods are querying the K8s API server and what they're requesting: + +| API Path | Threat | Severity | +|----------|--------|----------| +| `/api/v1/secrets` | Secret enumeration | CRITICAL | +| `/api/v1/pods` | Workload discovery | HIGH | +| `/apis/rbac.authorization.k8s.io` | RBAC reconnaissance | HIGH | +| `/api/v1/configmaps` | Config enumeration | MEDIUM | +| `/api/v1/namespaces` | Namespace discovery | MEDIUM | + +A pod hitting **multiple** of these paths is performing systematic enumeration, +not legitimate API access. Legitimate workloads typically access 1-2 specific +resources, not sweep across resource types. + +### 4c: Port Scanning Detection + +**Tool**: `list_l4_flows` with `snapshot_id` (immediate — no dissection needed) + +Use the workload IPs from Phase 1 to identify the source pod. +Look for a single source IP with connections to: +- Many distinct destination IPs (>10) +- Many distinct destination ports (>5) +- High connection failure rate (RST/timeout) + +This is a textbook port scan pattern. + +### 4d: Service Fingerprinting + +**Tool**: `list_api_calls` with KFL: `http && (path == "/.env" || path == "/actuator/info" || path == "/server-info" || path == "/version")` + +These paths are used for service fingerprinting — mapping what software is +running on internal endpoints. A pod probing multiple services with these +paths is performing reconnaissance. 
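
The "probing multiple services" check in 4d can be sketched as a small aggregation over the returned entries. This is a hedged illustration: the record keys `src`, `dst`, and `path` are hypothetical and must be mapped from whatever fields the `list_api_calls` results actually contain, and the default of two distinct targets is an assumed threshold for "multiple".

```python
from collections import defaultdict

# Paths taken from the 4d filter above.
FINGERPRINT_PATHS = {"/.env", "/actuator/info", "/server-info", "/version"}


def fingerprinting_sources(calls, min_targets=2):
    """Map each source to the destinations it probed with fingerprint paths.

    Only sources that probed at least min_targets distinct destinations
    are returned. Record keys are hypothetical.
    """
    probed = defaultdict(set)
    for call in calls:
        if call["path"] in FINGERPRINT_PATHS:
            probed[call["src"]].add(call["dst"])
    return {src: dsts for src, dsts in probed.items() if len(dsts) >= min_targets}
```

A source that hits a single `/version` endpoint is left out, which keeps ordinary health or version checks from being reported as reconnaissance.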
+ +### 4e: Service Account Permission Audit via Traffic + +Cross-reference Phase 4b findings (K8s API traffic) with the source pod's +actual service account to determine if permissions are excessive. + +For each pod making API server calls: + +1. **Identify the service account**: From the workload inventory or via + `kubectl get pod -n -o jsonpath='{.spec.serviceAccountName}'` +2. **Check what it accessed**: The API paths from Phase 4b reveal what the + pod actually queried (secrets, pods, RBAC, configmaps) +3. **Compare against expected access**: A `frontend` pod should never hit + `/api/v1/secrets`. A `batch-processor` has no reason to query + `/apis/rbac.authorization.k8s.io/v1/clusterrolebindings`. + +**What to flag**: + +| Pattern | Threat | Severity | +|---------|--------|----------| +| Pod queries secrets but its SA only needs pod read | Over-privileged SA or stolen token | HIGH | +| Pod hits cluster-wide endpoints (`--all-namespaces` style queries) | Cluster-admin binding | CRITICAL | +| Pod's SA is `default` but makes authenticated API calls | Token mounted unnecessarily | MEDIUM | +| Multiple pods share the same over-privileged SA | Lateral blast radius | HIGH | + +This converts a network finding (API traffic volume) into an actionable RBAC +recommendation — telling the user exactly which ClusterRoleBinding to revoke. + +### 4f: Cross-Namespace Threat Correlation + +When port scanning or lateral movement targets IPs outside the audited +namespace (e.g., IPs in the pod CIDR `10.244.x.x` that don't belong to +any workload in the target namespace), resolve them to identify the +cross-namespace blast radius: + +1. Use `list_workloads` (all namespaces) to map destination IPs to pods +2. Identify which namespaces are being probed +3. Flag the scope: "port scan from `k8s-mule/network-diagnostics` is + targeting pods in `default`, `monitoring`, and `kube-system`" + +This turns a single-namespace finding into a cluster-wide risk assessment. 
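
The IP-to-namespace resolution in 4f can be sketched as follows. This is a minimal example under assumptions: `blast_radius` is a hypothetical helper, and each workload record is assumed to be a dict with `name`, `namespace`, and `ip` keys built from the `list_workloads` output. IPs that match no known workload are grouped under `None` so they are not silently dropped.

```python
from collections import defaultdict


def blast_radius(workloads, target_ips):
    """Group probed destination IPs by the namespace of the owning pod.

    Returns {namespace: [pod names]}; unresolved IPs are listed under None.
    Record keys are hypothetical.
    """
    by_ip = {w["ip"]: w for w in workloads}
    radius = defaultdict(list)
    for ip in target_ips:
        workload = by_ip.get(ip)
        if workload is None:
            radius[None].append(ip)
        else:
            radius[workload["namespace"]].append(workload["name"])
    return dict(radius)
```

The keys of the result give the list of probed namespaces for the scope statement in step 3 of 4f.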
+ +--- + +## Phase 5: Protocol Abuse + +**Goal**: Inspect L7 payload content for attack patterns within supported +protocols. This is the phase most often skipped — and where subtle threats hide. + +### 5a: PostgreSQL Wire Protocol + +**Tool**: `list_api_calls` with KFL: `postgresql` + +The `postgresql_query` variable contains the full SQL text. Use it to detect: + +| KFL Filter | Threat | Severity | +|------------|--------|----------| +| `postgresql && postgresql_query.contains("UNION SELECT")` | SQL injection | HIGH | +| `postgresql && postgresql_query.contains("pg_shadow")` | Password hash theft | CRITICAL | +| `postgresql && postgresql_query.contains("information_schema")` | Schema enumeration | MEDIUM | +| `postgresql && postgresql_query.contains("TRUNCATE")` | Data destruction | CRITICAL | +| `postgresql && postgresql_query.contains("DROP TABLE")` | Data destruction | CRITICAL | +| `postgresql && !postgresql_success` | Failed queries (may indicate probing) | MEDIUM | + +Use `get_api_call` to inspect the full SQL content. Also check `postgresql_user` +— queries from unexpected users are suspicious. 
+ +### 5b: Redis Protocol + +**Tool**: `list_api_calls` with KFL: `redis` + +Use `redis_type` (command verb) and `redis_command` (full command line) to detect: + +| KFL Filter | Threat | Severity | +|------------|--------|----------| +| `redis && redis_type == "CONFIG"` | Server config dump/write | HIGH | +| `redis && redis_type == "KEYS"` | Full key enumeration | HIGH | +| `redis && redis_type == "CLIENT"` | Connection enumeration | MEDIUM | +| `redis && redis_type == "DEBUG"` | Debug access | MEDIUM | +| `redis && redis_command.contains("CONFIG SET dir")` | Arbitrary file write (RCE) | CRITICAL | +| `redis && redis_type == "FLUSHALL"` | Data destruction | CRITICAL | + +### 5c: gRPC Endpoints + +**Tool**: `list_api_calls` with KFL: `grpc` + +Use `grpc_method` to inspect method names: + +| KFL Filter | Threat | Severity | +|------------|--------|----------| +| `grpc && grpc_method.contains("Reflection")` | API surface enumeration | MEDIUM | +| `grpc && dst.name.contains("attacker")` | Data exfiltration | HIGH | +| `grpc && grpc_status != 0` | Failed gRPC calls (may indicate probing) | LOW | + +### 5d: HTTP Request Anomalies + +**Tool**: `list_api_calls` with KFL: `http` + +Check for: +- **WebSocket upgrades to external hosts**: `Upgrade: websocket` header — potential + mining proxy or persistent C2 channel +- **DNS-over-HTTPS requests**: `accept: application/dns-json` header — DNS bypass +- **AWS Signature headers**: `Authorization: AWS4-HMAC-SHA256` — stolen cloud creds +- **IMDS-specific headers**: `X-aws-ec2-metadata-token-ttl-seconds` — token request + +--- + +## Phase 6: Credential Access + +**Goal**: Detect active credential theft — IMDS access, service account abuse, +cloud API exploitation. + +### 6a: Instance Metadata Service (IMDS) + +**Tool**: `list_api_calls` with KFL: `dst.ip == "169.254.169.254"` + +Or use `list_l4_flows` to find connections to 169.254.169.254. + +Any pod connecting to this IP is attempting to steal the node's cloud credentials. 
+Check the HTTP paths: + +| Path | What's Being Stolen | +|------|-------------------| +| `/latest/meta-data/iam/security-credentials/` | IAM role name | +| `/latest/meta-data/iam/security-credentials/` | Actual AWS credentials | +| `/latest/dynamic/instance-identity/document` | Instance identity (account ID, region) | +| `/latest/user-data` | Instance bootstrap scripts (may contain secrets) | +| `/latest/api/token` (PUT) | IMDSv2 session token | + +### 6b: Service Account Token Exfiltration + +Look for HTTP requests where the body or headers contain JWT tokens +(strings starting with `eyJ`). These may be service account tokens being +sent to external endpoints. + +--- + +## Phase 7: Attack Chain Correlation + +**Goal**: Connect individual findings into a coherent attack narrative. + +After completing phases 1-6, synthesize findings into an attack chain. Real +attacks follow a progression: + +``` +1. INITIAL ACCESS → How did the attacker get in? +2. RECONNAISSANCE → Port scanning, DNS enumeration, API discovery +3. CREDENTIAL ACCESS → IMDS theft, secret enumeration, token exfil +4. LATERAL MOVEMENT → Cross-namespace probing, SSRF, service scanning +5. EXFILTRATION → DNS tunneling, HTTP exfil, gRPC streaming +6. PERSISTENCE → C2 beaconing, cryptomining (monetization) +``` + +Map each finding to a stage. If you see findings across multiple stages from +the same namespace or related workloads, you've found a coordinated attack. + +### Output Format + +Present the audit results as: + +1. **Workload inventory** — table of all observed workloads with threat level +2. **Detailed findings** — one section per finding, ordered by severity +3. **Attack chain summary** — if findings correlate, map the kill chain +4. **Immediate actions** — prioritized remediation steps + +--- + +## Audit Report — Two-Stage Delivery + +The audit produces **two outputs** — an intermediate report during Section A, +and a final PDF report after Section B completes. 
+ +### Stage 1: Intermediate Report (after Section A) + +Present findings from real-time analysis directly in the conversation. Clearly +label as preliminary. This gives the user immediate value while snapshots +are being created and dissected. + +### Stage 2: Final PDF Report (after Section B) + +This is the primary deliverable. It is generated **only after all snapshots +have been dissected and analyzed at L7**. Do not generate the final report +based on Section A alone — that would miss protocol-level threats (SQL +injection, Redis abuse, gRPC exfil) that only appear after dissection. + +1. **Write** the report as markdown: `security-audit--.md` + Follow the template in `references/report-template.md` — it defines + the full structure: executive summary, threat table, detailed findings + with evidence, attack chain analysis, detection coverage, and remediation. + +2. **Convert to PDF** (in preference order): + ```bash + npx md-to-pdf security-audit--.md # Best quality + pandoc security-audit--.md -o security-audit--.pdf + ``` + If neither tool is available, leave the markdown as the deliverable. + +3. **The final report must include findings from both sections** — Section A + (real-time) and Section B (snapshot dissection). Findings confirmed by + both sections are marked with higher confidence. Findings only in + Section B (missed by real-time) should be noted — this reveals gaps + in real-time dissection coverage. + +### Key Report Requirements + +- **Quote raw evidence** — actual DNS queries, HTTP URLs, SQL statements, + Redis commands. The reader must be able to verify without re-running. +- **Timestamp every finding** — snapshot ID + local time (UTC in parentheses). +- **Specific recommendations** — not "fix RBAC" but "revoke ClusterRoleBinding + `mule-recon-cluster-admin`". +- **Include MITRE ATT&CK IDs** for each finding. +- **Evidence preservation** — list snapshot IDs, recommend cloud storage upload. 
+ +--- + +## What Network Auditing Cannot Detect + +Be transparent about blind spots. Network traffic analysis **cannot** detect: + +- **Configuration vulnerabilities**: Privileged containers, missing resource + limits, permissive RBAC, hostPath mounts — these are YAML-level issues with + no traffic signature +- **Secrets in environment variables**: Hardcoded credentials don't generate + network traffic until used +- **Image vulnerabilities**: CVEs in container images are not visible on the wire +- **Idle threats**: A malicious pod that hasn't started communicating yet + +Recommend `kubectl`-based configuration auditing for these gaps. Network +auditing is the complement, not the replacement, for config-level security +scanning. + +## Threat Intelligence Reference + +For detailed descriptions of all 22 network-observable threat scenarios with +MITRE ATT&CK mappings and detection guidance, see `references/threat-catalog.md`. diff --git a/skills/security-audit/references/kfl-security-filters.md b/skills/security-audit/references/kfl-security-filters.md new file mode 100644 index 000000000..c5973e806 --- /dev/null +++ b/skills/security-audit/references/kfl-security-filters.md @@ -0,0 +1,64 @@ +# KFL Quick Reference: Security Audit Filters + +## DNS Threat Hunting +``` +dns // All DNS traffic +dns && dns_response && size(dns_answers) == 0 // Failed lookups (NXDOMAIN — no answers) +dns && dns_questions.exists(q, q.contains("minexmr")) // Mining pool DNS +dns && dns_questions.exists(q, q.contains("nanopool")) // Mining pool DNS +dns && dns_questions.exists(q, q.contains("amazonaws")) // Cloud API resolution +dns && dns_questions.exists(q, q.contains("cloudflare-dns")) // DoH bypass +dns && dns_questions.exists(q, q.contains("dns.google")) // DoH bypass +``` + +## External Communication +``` +http && dst.name.contains("attacker") // Known-bad destinations +http && map_get(request.headers, "user-agent", "").contains("Mozilla/4.0") // Suspicious UA +http && 
map_get(request.headers, "accept", "").contains("dns-json") // DoH requests +http && map_get(request.headers, "upgrade", "") == "websocket" // WebSocket (potential mining) +``` + +## Lateral Movement +``` +src.pod.namespace != dst.pod.namespace // Cross-namespace traffic +http && path.startsWith("/api/v1/secrets") // Secret enumeration +http && path == "/.env" // Service fingerprinting +http && path == "/actuator/info" // Spring Boot fingerprinting +http && path == "/version" // Version fingerprinting +``` + +## Protocol Inspection +``` +postgresql // PostgreSQL wire protocol +postgresql && postgresql_query.contains("UNION SELECT") // SQL injection patterns +postgresql && !postgresql_success // Failed PostgreSQL queries +redis // Redis protocol +grpc // gRPC calls (native detection) +grpc && grpc_method.contains("Reflection") // gRPC reflection enumeration +``` + +## Credential Theft +``` +dst.ip == "169.254.169.254" // IMDS access +http && path.contains("/meta-data/iam") // IAM credential paths +http && map_get(request.headers, "authorization", "").startsWith("AWS4-HMAC-SHA256") // Stolen AWS creds +http && "x-aws-ec2-metadata-token-ttl-seconds" in request.headers // IMDSv2 token request +``` + +## Resource Hijacking +``` +dst.port == 3333 // Stratum mining (standard) +dst.port == 14433 // Stratum mining (alt) +dst.port == 45700 // Stratum mining (alt) +dst.port == 4444 // Reverse shell / backdoor +``` + +## Per-Namespace Scoping + +Add namespace filters to any query above: +``` +dns && src.pod.namespace == "k8s-mule" // DNS from specific namespace +http && src.pod.namespace == "k8s-mule" // HTTP from specific namespace +redis && src.pod.namespace == "k8s-mule" // Redis from specific namespace +``` diff --git a/skills/security-audit/references/report-template.md b/skills/security-audit/references/report-template.md new file mode 100644 index 000000000..a4c3dc120 --- /dev/null +++ b/skills/security-audit/references/report-template.md @@ -0,0 +1,102 @@ +# Security 
Audit Report Template + +Use this template for the markdown report. Fill in all sections, then convert +to PDF. + +```markdown +# Kubernetes Network Security Audit Report + +**Cluster**: +**Namespace**: +**Date**: +**Audit window**: () +**Snapshots analyzed**: +**Audited by**: Claude Code + Kubeshark MCP + +--- + +## Executive Summary + +<2-3 sentence summary: how many threats found, highest severity, +whether an active attack chain was identified, top recommendation> + +## Threat Summary + +| # | Severity | Workload | Threat | MITRE ATT&CK | +|---|----------|----------|--------|---------------| +| 1 | CRITICAL | log-shipper | DNS Tunneling | T1048.003 | +| 2 | CRITICAL | cloud-health-monitor | IMDS Credential Theft | T1552.005 | +| ... | | | | | + +## Detailed Findings + +### Finding 1: (CRITICAL) + +**Workload**: <pod name> +**MITRE ATT&CK**: <technique ID and name> +**Snapshot**: <snapshot ID> +**Detection method**: <which phase and tool detected this> + +**Evidence**: +<Specific traffic data — DNS queries, HTTP requests, L4 flows, +protocol payloads. Include timestamps, source/dest, and relevant +content. Quote actual query names, URLs, SQL statements, or +Redis commands observed.> + +**Impact**: +<What this means — data at risk, credentials exposed, scope of access> + +**Recommendation**: +<Specific remediation — NetworkPolicy, RBAC change, pod deletion, credential rotation> + +--- + +(repeat for each finding) + +## Attack Chain Analysis + +<If findings correlate, map the kill chain: +Initial Access → Reconnaissance → Credential Access → Lateral Movement → +Exfiltration → Persistence. 
Identify which workloads participate in each stage.> + +## Detection Coverage + +| Phase | Checked | Findings | +|-------|---------|----------| +| Workload Inventory | Yes | <count> | +| DNS Threat Analysis | Yes | <count> | +| External Communication | Yes | <count> | +| Lateral Movement | Yes | <count> | +| Protocol Abuse | Yes | <count> | +| Credential Access | Yes | <count> | + +## Limitations + +<What this audit cannot detect — config-level vulnerabilities, +image CVEs, idle threats. Recommend complementary tools.> + +## Immediate Actions + +1. <Highest priority action> +2. <Second priority> +3. ... + +## Evidence Preservation + +<List snapshot IDs created during this audit. Recommend uploading +to cloud storage for long-term retention. Include PCAP export +commands for key findings.> +``` + +## Quality Guidelines + +- **Include raw evidence** — quote actual DNS queries, HTTP URLs, SQL + statements, Redis commands. The reader should be able to verify findings + without re-running the audit. +- **Timestamp everything** — every finding should reference the snapshot ID + and timestamp (local time with UTC in parentheses). +- **Be specific in recommendations** — not "fix RBAC" but "revoke + ClusterRoleBinding `mule-recon-cluster-admin` and replace with a + namespace-scoped Role granting only `get` on `pods`". +- **Include MITRE ATT&CK IDs** — makes the report actionable for security + teams that track coverage against the framework. diff --git a/skills/security-audit/references/threat-catalog.md b/skills/security-audit/references/threat-catalog.md new file mode 100644 index 000000000..074ce05c5 --- /dev/null +++ b/skills/security-audit/references/threat-catalog.md @@ -0,0 +1,190 @@ +# Network Threat Catalog + +22 network-observable threat patterns organized by MITRE ATT&CK tactic. +Each entry describes the attack, what it looks like on the wire, and how +to detect it with Kubeshark. 
+ +## Command & Control (TA0011) + +### DGA Beaconing (T1568.002) +- **What**: Malware generates pseudo-random domain names daily and queries DNS + for each. The C2 operator registers a few; most resolve to NXDOMAIN. +- **Wire signature**: Burst of DNS queries for high-entropy .com/.net domains + with >80% NXDOMAIN response rate. +- **KFL**: `dns && dns_response && size(dns_answers) == 0` — then check for entropy in queried names. +- **Difficulty**: Medium. NXDOMAIN flood is distinctive but low-rate DGA can + blend with legitimate DNS failures. + +### HTTP C2 Beaconing (T1071.001) +- **What**: Implant calls home via HTTP GET at regular intervals, receiving + tasking in the response body. Cobalt Strike, Meterpreter pattern. +- **Wire signature**: Periodic HTTP GET to fixed external URL at suspiciously + regular intervals (30-60s). Outdated User-Agent (Mozilla/4.0). Session + identifiers in URL path. +- **KFL**: `http && dst.name.contains("attacker")` or check for User-Agent anomalies. +- **Difficulty**: Medium. Regularity is the key anomaly. + +### Encrypted C2 (T1573.002) +- **What**: C2 over HTTPS. Content is encrypted but TLS SNI reveals suspicious + domain names. +- **Wire signature**: Outbound TLS to non-standard domains (darknet, cdn-mirror). + DNS queries preceding the connection reveal the target. +- **KFL**: `dns && (dns_questions.exists(q, q.contains("darknet")) || dns_questions.exists(q, q.contains("cdn-mirror")))`. +- **Difficulty**: Hard. Encrypted, uses standard port 443. + +### DNS-over-HTTPS C2 (T1572) +- **What**: Bypasses cluster DNS by sending queries as HTTPS to public DoH + resolvers (cloudflare-dns.com, dns.google). C2 commands embedded in TXT + responses. +- **Wire signature**: HTTP requests to DoH endpoints with `accept: application/dns-json` + header. No corresponding queries on port 53. +- **KFL**: `http && (dst.name.contains("cloudflare-dns") || dst.name.contains("dns.google"))`. +- **Difficulty**: Hard. 
Looks like regular HTTPS to trusted providers. + +## Exfiltration (TA0010) + +### DNS Tunneling (T1048.003) +- **What**: Full bidirectional data channel over DNS using tools like iodine, + dnscat2. Data encoded in long subdomain labels. +- **Wire signature**: High-frequency DNS queries (20+/burst) with subdomain + labels near 63-byte limit. Mix of A, TXT, NULL query types. +- **KFL**: `dns && dns_questions.exists(q, q.contains("data-relay"))` or look for + high query rates per source. +- **Difficulty**: Medium. Volume and long subdomains are distinctive. + +### HTTP Header Exfiltration (T1048.001) +- **What**: Data exfiltrated in HTTP headers (Cookie, X-Trace-ID) disguised + as analytics tracking. Low volume to evade detection. +- **Wire signature**: HTTP GET to analytics-looking URL with oversized Cookie + or custom headers containing base64-encoded data. +- **KFL**: `http && dst.name.contains("cdn-provider")`. +- **Difficulty**: Hard. Low volume, standard HTTP, looks like analytics. + +### DNS Credential Exfiltration (T1048.003) +- **What**: Stolen JWT tokens or credentials encoded in DNS TXT queries to + attacker-controlled authoritative nameserver. +- **Wire signature**: DNS TXT queries with structured multi-label subdomains + containing base64-like encoded data. +- **KFL**: `dns && dns_questions.exists(q, q.contains("steal-creds"))`. +- **Difficulty**: Medium. Multi-label structure is distinctive. + +### gRPC Stream Exfiltration (T1048.001) +- **What**: Data exfiltration via gRPC (HTTP/2) POST to external endpoint. + Blends with normal microservice traffic. +- **Wire signature**: HTTP/2 POST with `Content-Type: application/grpc` to + external destination with exfil-related method names. +- **KFL**: `grpc && dst.name.contains("attacker")`. +- **Difficulty**: Hard. gRPC is normal in K8s. External destination is the signal. 
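The DNS tunneling and DNS credential exfiltration entries above both describe long, encoded-looking subdomain labels. One way to check that property on query names already retrieved from Kubeshark is sketched below. The helper names and both thresholds (`min_label_len=50`, `min_entropy=3.5`) are assumptions made for illustration and would need tuning against real traffic.

```python
import math
from collections import Counter

MAX_LABEL_BYTES = 63  # per-label limit in DNS, the "63-byte limit" cited above


def shannon_entropy(text: str) -> float:
    """Shannon entropy of a string in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_tunnel(qname: str, min_label_len: int = 50, min_entropy: float = 3.5) -> bool:
    """Flag a query name if any label is both long and high-entropy.

    Thresholds are hypothetical defaults, not values from the catalog.
    """
    labels = [label for label in qname.rstrip(".").split(".") if label]
    return any(
        min_label_len <= len(label) <= MAX_LABEL_BYTES
        and shannon_entropy(label) >= min_entropy
        for label in labels
    )
```

Requiring both length and entropy is a deliberate choice in this sketch: long but repetitive labels and short random-looking labels are each common in legitimate traffic, so either signal alone would be noisier.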
+ +## Lateral Movement (TA0008) + +### K8s API Enumeration (T1613) +- **What**: Compromised pod uses mounted service account token to enumerate + secrets, pods, RBAC bindings across all namespaces. +- **Wire signature**: HTTPS to kubernetes.default.svc with broad GET requests + across /api/v1/secrets, /pods, /configmaps, /clusterrolebindings. +- **KFL**: `http && dst.port == 443 && path.contains("/api/v1/secrets")`. +- **Difficulty**: Medium. The fanout across resource types is the anomaly. + +### SSRF to Internal Services (T1090) +- **What**: Pod probes cross-namespace internal services it shouldn't talk to — + kube-dns metrics, Prometheus, Grafana, dashboards. +- **Wire signature**: HTTP to multiple ClusterIP services across namespaces + from a single source pod. +- **KFL**: `http && src.pod.namespace == "k8s-mule" && dst.pod.namespace != "k8s-mule"`. +- **Difficulty**: Medium. Cross-namespace breadth is the signal. + +### Port Scanning (T1046) +- **What**: Sweep of common ports across pod CIDR after initial access. +- **Wire signature**: Rapid TCP SYN from single source to many IPs on ports + 80, 443, 3306, 5432, 6379, 8080, 9090, 27017. High RST/timeout rate. +- **KFL**: `tcp && src.name == "network-diagnostics"`. +- **Difficulty**: Easy. Classic scan pattern — high fan-out, high failure rate. + +### Service Fingerprinting (T1046) +- **What**: HTTP probes to discovery paths across multiple services to identify + running software. +- **Wire signature**: HTTP GET to /version, /healthz, /.env, /actuator/info, + /server-info. HEAD and OPTIONS methods. Multiple targets from one source. +- **KFL**: `http && (path == "/.env" || path == "/actuator/info")`. +- **Difficulty**: Medium. Path patterns are distinctive. + +## Credential Access (TA0006) + +### IMDS Metadata Theft (T1552.005) +- **What**: Query AWS/GCP instance metadata to steal IAM role credentials. + The Capital One breach vector. 
+- **Wire signature**: HTTP to 169.254.169.254 with paths /latest/meta-data/iam/, + /latest/user-data, /latest/api/token (PUT for IMDSv2). +- **KFL**: `dst.ip == "169.254.169.254"`. +- **Difficulty**: Easy. Destination IP is unique and unmistakable. + +### Cloud API Abuse (T1078.004) +- **What**: Direct calls to AWS APIs (STS, S3, EC2) with stolen credentials + from a workload pod. +- **Wire signature**: DNS for sts.amazonaws.com, s3.amazonaws.com. HTTPS + requests with AWS Signature V4 Authorization headers. +- **KFL**: `dns && dns_questions.exists(q, q.contains("amazonaws.com"))`. +- **Difficulty**: Medium. Cloud API DNS from a non-controller pod is suspicious. + +## Resource Hijacking (TA0040) + +### Stratum Mining Protocol (T1496) +- **What**: XMRig/miner connecting to mining pool via Stratum JSON-RPC over TCP. +- **Wire signature**: TCP connection to port 3333/14433/45700 with JSON-RPC + messages: mining.subscribe, mining.authorize, mining.submit. +- **KFL**: `dst.port == 3333`. +- **Difficulty**: Medium. Port 3333 is a well-known mining indicator. + +### Mining Pool DNS (T1496) +- **What**: DNS resolution of known mining pool domains before connecting. +- **Wire signature**: DNS queries for domains containing minexmr, nanopool, + mining-pool, hashvault, supportxmr. +- **KFL**: `dns && (dns_questions.exists(q, q.contains("minexmr")) || dns_questions.exists(q, q.contains("mining-pool")))`. +- **Difficulty**: Easy. Mining domain names are unmistakable. + +### WebSocket Mining (T1496) +- **What**: Browser-based miner communicating via WebSocket on standard ports. +- **Wire signature**: HTTP Upgrade: websocket request to external host with + mining-related URL path (/proxy?coin=, ?algo=randomx). +- **KFL**: `http && map_get(request.headers, "upgrade", "") == "websocket"`. +- **Difficulty**: Hard. WebSocket on port 80/443 looks normal. Only URL reveals intent. 
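Because a port match such as `dst.port == 3333` only narrows the candidates, the Stratum wire signature above (JSON-RPC messages naming `mining.subscribe`, `mining.authorize`, or `mining.submit`) can be confirmed on the payload itself. The following is a minimal sketch that assumes the payload is available as newline-delimited JSON text; the function name `is_stratum_message` is hypothetical.

```python
import json

# Method names taken from the Stratum wire signature in this catalog.
STRATUM_METHODS = {"mining.subscribe", "mining.authorize", "mining.submit"}


def is_stratum_message(payload_line: str) -> bool:
    """Return True if one payload line parses as a JSON-RPC call to a Stratum method."""
    try:
        message = json.loads(payload_line)
    except (json.JSONDecodeError, TypeError):
        # Not JSON (or not text at all): cannot be a Stratum JSON-RPC message.
        return False
    if not isinstance(message, dict):
        return False
    method = message.get("method")
    return isinstance(method, str) and method in STRATUM_METHODS
```

For example, `is_stratum_message('{"id": 1, "method": "mining.subscribe", "params": []}')` returns `True`, while an ordinary JSON-RPC call to another method returns `False`.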
+ +## Protocol Abuse + +### SQL Injection via PG Wire (T1190) +- **What**: SQL injection payloads sent through PostgreSQL wire protocol. +- **Wire signature**: PG protocol carrying UNION SELECT, information_schema, + pg_shadow queries. +- **KFL**: `postgresql && postgresql_query.contains("UNION SELECT")`. +- **Difficulty**: Medium. PG dissection reveals the SQL content directly. + +### Redis Unauthorized Access (T1190) +- **What**: Unauthenticated Redis instance probed with dangerous commands. +- **Wire signature**: Redis protocol: CONFIG GET *, KEYS *, CLIENT LIST, DEBUG. +- **KFL**: `redis`. +- **Difficulty**: Easy. Redis command names are directly visible. + +### Database Destruction (T1485) +- **What**: Ransomware pattern: SELECT * (data theft), then TRUNCATE/DROP (destruction). +- **Wire signature**: PG protocol showing SELECT followed by TRUNCATE on the same table. +- **KFL**: `postgresql`. +- **Difficulty**: Medium. DDL commands in PG protocol are visible with dissection. + +## Reconnaissance (TA0043) + +### DNS Zone Enumeration (T1018) +- **What**: Brute-force DNS queries across namespaces to discover services. + Includes SRV lookups and AXFR zone transfer attempts. +- **Wire signature**: High volume of DNS queries for *.svc.cluster.local patterns + across many namespaces. Many NXDOMAIN responses. +- **KFL**: `dns && src.name == "service-discovery"`. +- **Difficulty**: Easy. Query volume and the cross-namespace pattern are obvious. + +### gRPC Reflection Enumeration (T1046) +- **What**: Probing gRPC server reflection to discover API surfaces without + needing proto files. +- **Wire signature**: HTTP/2 POST to /grpc.reflection.v1alpha.ServerReflection/ + ServerReflectionInfo across multiple services. +- **KFL**: `grpc && grpc_method.contains("Reflection")` or `http && path.contains("grpc.reflection")`. +- **Difficulty**: Medium. Reflection path is a known enumeration vector.
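The DNS Zone Enumeration signature above turns on how many namespaces a single source queries. A rough way to measure that breadth from exported DNS queries is sketched here. It assumes the default `cluster.local` cluster domain and an input of `(source_name, query_name)` pairs; the function name `namespaces_queried` and the sample workload names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

CLUSTER_SUFFIX = ["svc", "cluster", "local"]  # assumes the default cluster domain


def namespaces_queried(queries: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each source to the set of namespaces it looked up via cluster DNS."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for source, qname in queries:
        labels = qname.rstrip(".").lower().split(".")
        # Service names look like <service>.<namespace>.svc.cluster.local;
        # SRV names add leading labels but keep the namespace in the same position.
        if len(labels) >= 5 and labels[-3:] == CLUSTER_SUFFIX:
            seen[source].add(labels[-4])
    return dict(seen)


if __name__ == "__main__":
    sample = [
        ("service-discovery", "grafana.monitoring.svc.cluster.local"),
        ("service-discovery", "kube-dns.kube-system.svc.cluster.local."),
        ("web", "api.default.svc.cluster.local"),
    ]
    for source, namespaces in namespaces_queried(sample).items():
        print(source, len(namespaces), sorted(namespaces))
```

A source whose namespace count is far above that of its peers is a candidate for the enumeration pattern; what counts as "far above" depends on the cluster and is left to the auditor.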