Compare commits

...

12 Commits

Author SHA1 Message Date
Alon Girmonsky
37f8becb7b Update label examples: direct access and map_get, note indexed queries 2026-03-18 17:11:35 -07:00
Alon Girmonsky
10dbedf356 Add KFL and Network RCA skills (#1875)
* Add KFL and Network RCA skills

Introduce the skills/ directory with two Kubeshark MCP skills:

- network-rca: Retrospective traffic analysis via snapshots, dissection,
  KFL queries, PCAP extraction, and trend comparison
- kfl: Complete KFL2 (Kubeshark Filter Language) reference covering all
  supported protocols, variables, operators, and filter patterns

Update CLAUDE.md with skill authoring guidelines, structure conventions,
and the list of available Kubeshark MCP tools.

* Optimize skills and add shared setup reference

- network-rca: cut repeated metaphor, add list_api_calls example response,
  consolidate use cases, remove unbuilt composability section, extract
  setup reference to references/setup.md (409 → 306 lines)
- kfl: merge thin protocol sections, fix map_get inconsistency, add
  negation examples, move capture source to reference doc
- kfl2-reference: add most-commonly-used variables table, add inline
  filter examples per protocol section
- Add skills/README.md with usage and contribution guidelines
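For flavor, two illustrative KFL filter shapes of the kind these skills document (hypothetical examples; exact syntax and field names are defined by the KFL2 reference, not here) — the first selects HTTP server errors, the second DNS traffic destined for a specific namespace:

```
http and response.status >= 500
dns and dst.namespace == "payments"
```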

* Add plugin infrastructure and update READMEs

- Add .claude-plugin/plugin.json and marketplace.json for Claude Code
  plugin distribution
- Add .mcp.json bundling the Kubeshark MCP configuration
- Update skills/README.md with plugin install, manual install, and
  agent compatibility sections
- Update mcp/README.md with AI skills section and install instructions
- Restructure network-rca skill into two distinct investigation routes:
  PCAP (no dissection, BPF filters, Wireshark/compliance) and
  Dissection (indexed queries, AI-driven analysis, payload inspection)

* Remove CLAUDE.md from tracked files

Content now lives in skills/README.md, mcp/README.md, and the skills themselves.

* Add README to .claude-plugin directory

* Reorder MCP config: default mode first, URL mode for no-kubectl

* Move AI Skills section to top of MCP README

* Reorder manual install: symlink first

* Streamline skills README: focus on usage and contributing

* Enforce KFL skill loading before writing filters

- network-rca: require loading KFL skill before constructing filters,
  suggest installation if unavailable
- kfl: set user-invocable: false (background knowledge skill), strengthen
  description to mandate loading before any filter construction

* Move KFL requirement to top of Dissection route

* Add strict fallback: only use exact examples if KFL skill unavailable

* Add clone step to manual installation

* Use $PWD/kubeshark paths in manual install examples

* Add mkdir before symlinks, simplify paths

* Move prerequisites before installation

---------

Co-authored-by: Alon Girmonsky <alongir@Alons-Mac-Studio.local>
2026-03-18 15:31:32 -07:00
Serhii Ponomarenko
963b3e4ac2 🐛 Add default value for demoModeEnabled (#1872) 2026-03-17 13:22:42 -07:00
Volodymyr Stoiko
b2813e02bd Add detailed docs for kubeshark irsa setup (#1871)
Co-authored-by: Alon Girmonsky <1990761+alongir@users.noreply.github.com>
2026-03-16 20:29:49 -07:00
Serhii Ponomarenko
707d7351b6 🛂 Demo Portal: Readonly mode / no authn (#1869)
* 🔨 Add snapshots-updating-enabled `front` env

* 🔨 Add snapshots-updating-enabled config

* 🔨 Add demo-enabled `front` env

* 🔨 Add demo-enabled config

* 🔨 Replace `liveConfigMapChangesDisabled` with `demoModeEnabled` flag

* 🐛 Fix dissection-control-enabled env logic

* 🦺 Handle nullish `demoModeEnabled` value
2026-03-16 20:01:18 -07:00
Serhii Ponomarenko
23c86be773 🛂 Control L4 map visibility (helm value) (#1866)
* 🔨 Add `tap.dashboard.clusterWideMapEnabled` helm value

* 🔨 Add cluster-wide-map-enabled `front` env

* 🔨 Add fallback value for `front` env
2026-03-11 15:36:20 -07:00
Alon Girmonsky
3f8a067f9b Update README: Network Observability for SREs & AI Agents (#1861)
* Update README hero: Network Observability for SREs & AI Agents

Rewrite hero section to focus on cluster-wide network data
consolidation and dual access model (AI agents via MCP,
human operators via dashboard).

* Add MCP demo GIF to README hero section

Replace static stream.png with animated MCP demo showing
Claude Code + Kubeshark workflow.

* Reorder README sections and add MCP demo GIF

- Hero description + stream.png first
- Get Started section
- AI-Powered Network Analysis with MCP demo GIF
- L7 API Dissection
- L4/L7 Workload Map
- Traffic Retention
- Features, Install, Contributing, License

* Reference MCP demo GIF by commit SHA for preview

* Update MCP demo GIF reference to assets master
2026-03-09 08:29:52 -07:00
Volodymyr Stoiko
33f5310e8e Add gcs cloudstorage configuration docs (#1862)
* add gcs docs

* add explicit gcs keys

* gcs helm tests

* add iam permissions docs for gcs

* Update gcs docs with exact setup steps for workload identity
2026-03-09 07:48:17 -07:00
Alon Girmonsky
5f2f34e826 Sync helm-chart README with current values.yaml (#1856)
Update configuration table to match actual defaults in values.yaml:

- tap.storageLimit: 5Gi → 10Gi
- tap.capture.dbMaxSize: "" → 500Mi
- tap.resources.sniffer/tracer.limits.memory: 3Gi → 5Gi
- tap.probes.hub/sniffer initialDelaySeconds: 15 → 5
- tap.probes.hub/sniffer periodSeconds: 10 → 5
- tap.dnsConfig.* → tap.dns.* (match yaml tag)
- tap.sentry.enabled: true → false

Add missing entries:
- tap.capture.captureSelf
- tap.delayedDissection.cpu/memory
- tap.packetCapture
- tap.misc.trafficSampleRate
- tap.misc.tcpStreamChannelTimeoutMs

Remove stale KernelMapping text.
2026-03-06 11:52:10 -08:00
Volodymyr Stoiko
f9a5fbbb78 Fix snapshots local storage size (#1859)
Co-authored-by: Alon Girmonsky <1990761+alongir@users.noreply.github.com>
2026-03-06 08:33:59 -08:00
Volodymyr Stoiko
73f8e3585d Cloud storage explicit config (#1858)
* Add explicit configs

* Add helm unit tests

* fixpipeline

* latest

---------

Co-authored-by: Alon Girmonsky <1990761+alongir@users.noreply.github.com>
2026-03-06 08:27:08 -08:00
Alon Girmonsky
a6daefc567 Fix MCP Registry publish by using OIDC auth instead of interactive OAuth (#1857)
`mcp-publisher login github` uses the device flow (interactive OAuth), which
requires a human to visit a URL; this can never work in CI. Switch to
`github-oidc`, which uses the OIDC token provided by GitHub Actions.
2026-03-06 08:04:26 -08:00
29 changed files with 2474 additions and 110 deletions

.claude-plugin/README.md (new file)

@@ -0,0 +1,33 @@
# Kubeshark Claude Code Plugin
This directory contains the [Claude Code plugin](https://docs.anthropic.com/en/docs/claude-code/plugins) configuration for Kubeshark.
## What's here
| File | Purpose |
|------|---------|
| `plugin.json` | Plugin manifest — name, version, description, metadata |
| `marketplace.json` | Marketplace index — allows discovery via `/plugin marketplace add` |
## Installing the plugin
```
/plugin marketplace add kubeshark/kubeshark
/plugin install kubeshark
```
This loads the Kubeshark AI skills and MCP configuration. Skills appear as
`/kubeshark:network-rca` and `/kubeshark:kfl`.
## What the plugin includes
- **Skills** from [`skills/`](../skills/) — network root cause analysis and KFL filter expertise
- **MCP configuration** from [`.mcp.json`](../.mcp.json) — connects to the Kubeshark MCP server
## Local development
Test the plugin without installing:
```bash
claude --plugin-dir /path/to/kubeshark
```


@@ -0,0 +1,15 @@
{
"name": "kubeshark",
"description": "Kubeshark network observability skills for Kubernetes",
"plugins": [
{
"name": "kubeshark",
"description": "Network observability skills powered by Kubeshark MCP — root cause analysis, KFL traffic filtering, snapshot forensics, PCAP extraction.",
"source": {
"source": "github",
"owner": "kubeshark",
"repo": "kubeshark"
}
}
]
}


@@ -0,0 +1,24 @@
{
"name": "kubeshark",
"version": "1.0.0",
"description": "Kubernetes network observability skills powered by Kubeshark MCP. Root cause analysis, traffic filtering, snapshot forensics, PCAP extraction, and more.",
"author": {
"name": "Kubeshark",
"url": "https://kubeshark.com"
},
"homepage": "https://kubeshark.com",
"repository": "https://github.com/kubeshark/kubeshark",
"license": "Apache-2.0",
"keywords": [
"kubeshark",
"kubernetes",
"network",
"observability",
"traffic",
"mcp",
"rca",
"pcap",
"kfl",
"ebpf"
]
}


@@ -168,7 +168,7 @@ jobs:
- name: Login to MCP Registry
if: github.event_name != 'workflow_dispatch' || github.event.inputs.dry_run != 'true'
shell: bash
run: mcp-publisher login github
run: mcp-publisher login github-oidc
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}


@@ -15,7 +15,7 @@ jobs:
timeout-minutes: 20
steps:
- name: Check out code into the Go module directory
uses: actions/checkout@v3
uses: actions/checkout@v5
with:
fetch-depth: 2
@@ -29,3 +29,52 @@ jobs:
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v3
helm-tests:
name: Helm Chart Tests
runs-on: ubuntu-latest
timeout-minutes: 10
steps:
- name: Check out code
uses: actions/checkout@v5
- name: Set up Helm
uses: azure/setup-helm@v4
- name: Helm lint (default values)
run: helm lint ./helm-chart
- name: Helm lint (S3 values)
run: helm lint ./helm-chart -f ./helm-chart/tests/fixtures/values-s3.yaml
- name: Helm lint (Azure Blob values)
run: helm lint ./helm-chart -f ./helm-chart/tests/fixtures/values-azblob.yaml
- name: Helm lint (GCS values)
run: helm lint ./helm-chart -f ./helm-chart/tests/fixtures/values-gcs.yaml
- name: Helm lint (cloud refs values)
run: helm lint ./helm-chart -f ./helm-chart/tests/fixtures/values-cloud-refs.yaml
- name: Install helm-unittest plugin
run: helm plugin install https://github.com/helm-unittest/helm-unittest --verify=false
- name: Run helm unit tests
run: helm unittest ./helm-chart
- name: Install kubeconform
run: |
curl -sL https://github.com/yannh/kubeconform/releases/latest/download/kubeconform-linux-amd64.tar.gz | tar xz
sudo mv kubeconform /usr/local/bin/
- name: Validate default template
run: helm template kubeshark ./helm-chart | kubeconform -strict -kubernetes-version 1.35.0 -summary
- name: Validate S3 template
run: helm template kubeshark ./helm-chart -f ./helm-chart/tests/fixtures/values-s3.yaml | kubeconform -strict -kubernetes-version 1.35.0 -summary
- name: Validate Azure Blob template
run: helm template kubeshark ./helm-chart -f ./helm-chart/tests/fixtures/values-azblob.yaml | kubeconform -strict -kubernetes-version 1.35.0 -summary
- name: Validate GCS template
run: helm template kubeshark ./helm-chart -f ./helm-chart/tests/fixtures/values-gcs.yaml | kubeconform -strict -kubernetes-version 1.35.0 -summary

.mcp.json (new file)

@@ -0,0 +1,8 @@
{
"mcpServers": {
"kubeshark": {
"command": "kubeshark",
"args": ["mcp"]
}
}
}


@@ -137,6 +137,16 @@ test-integration-short: ## Run quick integration tests (skips long-running tests
rm -f $$LOG_FILE; \
exit $$status
helm-test: ## Run Helm lint and unit tests.
helm lint ./helm-chart
helm unittest ./helm-chart
helm-test-full: helm-test ## Run Helm tests with kubeconform schema validation.
helm template kubeshark ./helm-chart | kubeconform -strict -kubernetes-version 1.35.0 -summary
helm template kubeshark ./helm-chart -f ./helm-chart/tests/fixtures/values-s3.yaml | kubeconform -strict -kubernetes-version 1.35.0 -summary
helm template kubeshark ./helm-chart -f ./helm-chart/tests/fixtures/values-azblob.yaml | kubeconform -strict -kubernetes-version 1.35.0 -summary
helm template kubeshark ./helm-chart -f ./helm-chart/tests/fixtures/values-gcs.yaml | kubeconform -strict -kubernetes-version 1.35.0 -summary
lint: ## Lint the source code.
golangci-lint run


@@ -9,7 +9,7 @@
<a href="https://join.slack.com/t/kubeshark/shared_invite/zt-3jdcdgxdv-1qNkhBh9c6CFoE7bSPkpBQ"><img alt="Slack" src="https://img.shields.io/badge/slack-join_chat-green?logo=Slack&style=flat-square"></a>
</p>
<p align="center"><b>Network Intelligence for Kubernetes</b></p>
<p align="center"><b>Network Observability for SREs & AI Agents</b></p>
<p align="center">
<a href="https://demo.kubeshark.com/">Live Demo</a> · <a href="https://docs.kubeshark.com">Docs</a>
@@ -17,9 +17,17 @@
---
* **Cluster-wide, real-time visibility into every packet, API call, and service interaction.**
* Replay any moment in time.
* Resolve incidents at the speed of LLMs. 100% on-premises.
Kubeshark captures cluster-wide network traffic at the speed and scale of Kubernetes, continuously, at the kernel level using eBPF. It consolidates a highly fragmented picture — dozens of nodes, thousands of workloads, millions of connections — into a single, queryable view with full Kubernetes and API context.
Network data is available to **AI agents via [MCP](https://docs.kubeshark.com/en/mcp)** and to **human operators via a [dashboard](https://docs.kubeshark.com/en/v2)**.
**What's captured, cluster-wide:**
- **L4 Packets & TCP Metrics** — retransmissions, RTT, window saturation, connection lifecycle, packet loss across every node-to-node path ([TCP insights →](https://docs.kubeshark.com/en/mcp/tcp_insights))
- **L7 API Calls** — real-time request/response matching with full payload parsing: HTTP, gRPC, GraphQL, Redis, Kafka, DNS ([API dissection →](https://docs.kubeshark.com/en/v2/l7_api_dissection))
- **Decrypted TLS** — eBPF-based TLS decryption without key management
- **Kubernetes Context** — every packet and API call resolved to pod, service, namespace, and node
- **PCAP Retention** — point-in-time raw packet snapshots, exportable for Wireshark ([Snapshots →](https://docs.kubeshark.com/en/v2/traffic_snapshots))
![Kubeshark](https://github.com/kubeshark/assets/raw/master/png/stream.png)
@@ -34,33 +42,37 @@ helm install kubeshark kubeshark/kubeshark
Dashboard opens automatically. You're capturing traffic.
**With AI** — connect your assistant and debug with natural language:
**Connect an AI agent** via MCP:
```bash
brew install kubeshark
claude mcp add kubeshark -- kubeshark mcp
```
[MCP setup guide →](https://docs.kubeshark.com/en/mcp)
---
### AI-Powered Network Analysis
Kubeshark exposes all cluster-wide network data via MCP (Model Context Protocol). AI agents can query L4 metrics, investigate L7 API calls, analyze traffic patterns, and run root cause analysis — through natural language. Use cases include incident response, root cause analysis, troubleshooting, debugging, and reliability workflows.
> *"Why did checkout fail at 2:15 PM?"*
> *"Which services have error rates above 1%?"*
> *"Show TCP retransmission rates across all node-to-node paths"*
> *"Trace request abc123 through all services"*
Works with Claude Code, Cursor, and any MCP-compatible AI.
![MCP Demo](https://github.com/kubeshark/assets/raw/master/gif/mcp-demo.gif)
[MCP setup guide →](https://docs.kubeshark.com/en/mcp)
---
## Why Kubeshark
### L7 API Dissection
- **Instant root cause** — trace requests across services, see exact errors
- **Zero instrumentation** — no code changes, no SDKs, just deploy
- **Full payload capture** — request/response bodies, headers, timing
- **TLS decryption** — see encrypted traffic without managing keys
- **AI-ready** — query traffic with natural language via MCP
---
### Traffic Analysis and API Dissection
Capture and inspect every API call across your cluster—HTTP, gRPC, Redis, Kafka, DNS, and more. Request/response matching with full payloads, parsed according to protocol specifications. Headers, timing, and complete context. Zero instrumentation required.
Cluster-wide request/response matching with full payloads, parsed according to protocol specifications. HTTP, gRPC, Redis, Kafka, DNS, and more. Every API call resolved to source and destination pod, service, namespace, and node. No code instrumentation required.
![API context](https://github.com/kubeshark/assets/raw/master/png/api_context.png)
@@ -68,27 +80,15 @@ Capture and inspect every API call across your cluster—HTTP, gRPC, Redis, Kafk
### L4/L7 Workload Map
Visualize how your services communicate. See dependencies, traffic flow, and identify anomalies at a glance.
Cluster-wide view of service communication: dependencies, traffic flow, and anomalies across all nodes and namespaces.
![Service Map](https://github.com/kubeshark/assets/raw/master/png/servicemap.png)
[Learn more →](https://docs.kubeshark.com/en/v2/service_map)
### AI-Powered Root Cause Analysis
Resolve production issues in minutes instead of hours. Connect your AI assistant and investigate incidents using natural language. Build network-aware AI agents for forensics, monitoring, compliance, and security.
> *"Why did checkout fail at 2:15 PM?"*
> *"Which services have error rates above 1%?"*
> *"Trace request abc123 through all services"*
Works with Claude Code, Cursor, and any MCP-compatible AI.
[MCP setup guide →](https://docs.kubeshark.com/en/mcp)
### Traffic Retention
Retain every packet. Take snapshots. Export PCAP files. Replay any moment in time.
Continuous raw packet capture with point-in-time snapshots. Export PCAP files for offline analysis with Wireshark or other tools.
![Traffic Retention](https://github.com/kubeshark/assets/raw/master/png/snapshots.png)
@@ -105,7 +105,7 @@ Retain every packet. Take snapshots. Export PCAP files. Replay any moment in tim
| [**L7 API Dissection**](https://docs.kubeshark.com/en/v2/l7_api_dissection) | Request/response matching with full payloads and protocol parsing |
| [**Protocol Support**](https://docs.kubeshark.com/en/protocols) | HTTP, gRPC, GraphQL, Redis, Kafka, DNS, and more |
| [**TLS Decryption**](https://docs.kubeshark.com/en/encrypted_traffic) | eBPF-based decryption without key management |
| [**AI-Powered Analysis**](https://docs.kubeshark.com/en/v2/ai_powered_analysis) | Query traffic with Claude, Cursor, or any MCP-compatible AI |
| [**AI-Powered Analysis**](https://docs.kubeshark.com/en/v2/ai_powered_analysis) | Query cluster-wide network data with Claude, Cursor, or any MCP-compatible AI |
| [**Display Filters**](https://docs.kubeshark.com/en/v2/kfl2) | Wireshark-inspired display filters for precise traffic analysis |
| [**100% On-Premises**](https://docs.kubeshark.com/en/air_gapped) | Air-gapped support, no external dependencies |


@@ -153,6 +153,7 @@ func CreateDefaultConfig() ConfigStruct {
},
Dashboard: configStructs.DashboardConfig{
CompleteStreamingEnabled: true,
ClusterWideMapEnabled: false,
},
Capture: configStructs.CaptureConfig{
Dissection: configStructs.DissectionConfig{


@@ -202,6 +202,7 @@ type RoutingConfig struct {
type DashboardConfig struct {
StreamingType string `yaml:"streamingType" json:"streamingType" default:"connect-rpc"`
CompleteStreamingEnabled bool `yaml:"completeStreamingEnabled" json:"completeStreamingEnabled" default:"true"`
ClusterWideMapEnabled bool `yaml:"clusterWideMapEnabled" json:"clusterWideMapEnabled" default:"false"`
}
type FrontRoutingConfig struct {
@@ -209,9 +210,9 @@ type FrontRoutingConfig struct {
}
type ReleaseConfig struct {
Repo string `yaml:"repo" json:"repo" default:"https://helm.kubeshark.com"`
Name string `yaml:"name" json:"name" default:"kubeshark"`
Namespace string `yaml:"namespace" json:"namespace" default:"default"`
Repo string `yaml:"repo" json:"repo" default:"https://helm.kubeshark.com"`
Name string `yaml:"name" json:"name" default:"kubeshark"`
Namespace string `yaml:"namespace" json:"namespace" default:"default"`
HelmChartPath string `yaml:"helmChartPath" json:"helmChartPath" default:""`
}
@@ -315,10 +316,35 @@ type SnapshotsLocalConfig struct {
StorageSize string `yaml:"storageSize" json:"storageSize" default:"20Gi"`
}
type SnapshotsCloudS3Config struct {
Bucket string `yaml:"bucket" json:"bucket" default:""`
Region string `yaml:"region" json:"region" default:""`
AccessKey string `yaml:"accessKey" json:"accessKey" default:""`
SecretKey string `yaml:"secretKey" json:"secretKey" default:""`
RoleArn string `yaml:"roleArn" json:"roleArn" default:""`
ExternalId string `yaml:"externalId" json:"externalId" default:""`
}
type SnapshotsCloudAzblobConfig struct {
StorageAccount string `yaml:"storageAccount" json:"storageAccount" default:""`
Container string `yaml:"container" json:"container" default:""`
StorageKey string `yaml:"storageKey" json:"storageKey" default:""`
}
type SnapshotsCloudGCSConfig struct {
Bucket string `yaml:"bucket" json:"bucket" default:""`
Project string `yaml:"project" json:"project" default:""`
CredentialsJson string `yaml:"credentialsJson" json:"credentialsJson" default:""`
}
type SnapshotsCloudConfig struct {
Provider string `yaml:"provider" json:"provider" default:""`
ConfigMaps []string `yaml:"configMaps" json:"configMaps" default:"[]"`
Secrets []string `yaml:"secrets" json:"secrets" default:"[]"`
Provider string `yaml:"provider" json:"provider" default:""`
Prefix string `yaml:"prefix" json:"prefix" default:""`
ConfigMaps []string `yaml:"configMaps" json:"configMaps" default:"[]"`
Secrets []string `yaml:"secrets" json:"secrets" default:"[]"`
S3 SnapshotsCloudS3Config `yaml:"s3" json:"s3"`
Azblob SnapshotsCloudAzblobConfig `yaml:"azblob" json:"azblob"`
GCS SnapshotsCloudGCSConfig `yaml:"gcs" json:"gcs"`
}
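Going by the yaml tags on the structs above, the new inline cloud values would be set roughly like this (illustrative sketch only; bucket and region are placeholder values, not defaults):

```yaml
tap:
  snapshots:
    cloud:
      provider: s3
      prefix: snapshots/
      s3:
        bucket: my-snapshots-bucket   # placeholder
        region: us-east-1             # placeholder
```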
type SnapshotsConfig struct {
@@ -386,7 +412,6 @@ type TapConfig struct {
Gitops GitopsConfig `yaml:"gitops" json:"gitops"`
Sentry SentryConfig `yaml:"sentry" json:"sentry"`
DefaultFilter string `yaml:"defaultFilter" json:"defaultFilter" default:""`
LiveConfigMapChangesDisabled bool `yaml:"liveConfigMapChangesDisabled" json:"liveConfigMapChangesDisabled" default:"false"`
GlobalFilter string `yaml:"globalFilter" json:"globalFilter" default:""`
EnabledDissectors []string `yaml:"enabledDissectors" json:"enabledDissectors"`
PortMapping PortMapping `yaml:"portMapping" json:"portMapping"`


@@ -142,12 +142,28 @@ Example for overriding image names:
| `tap.capture.dissection.stopAfter` | Set to a duration (e.g. `30s`) to have L7 dissection stop after no activity. | `5m` |
| `tap.capture.raw.enabled` | Enable raw capture of packets and syscalls to disk for offline analysis | `true` |
| `tap.capture.raw.storageSize` | Maximum storage size for raw capture files (supports K8s quantity format: `1Gi`, `500Mi`, etc.) | `1Gi` |
| `tap.capture.dbMaxSize` | Maximum size for capture database (e.g., `4Gi`, `2000Mi`). When empty, automatically uses 80% of allocated storage (`tap.storageLimit`). | `""` |
| `tap.capture.captureSelf` | Include Kubeshark's own traffic in capture | `false` |
| `tap.capture.dbMaxSize` | Maximum size for capture database (e.g., `4Gi`, `2000Mi`). | `500Mi` |
| `tap.snapshots.local.storageClass` | Storage class for local snapshots volume. When empty, uses `emptyDir`. When set, creates a PVC with this storage class | `""` |
| `tap.snapshots.local.storageSize` | Storage size for local snapshots volume (supports K8s quantity format: `1Gi`, `500Mi`, etc.) | `20Gi` |
| `tap.snapshots.cloud.provider` | Cloud storage provider for snapshots: `s3` or `azblob`. Empty string disables cloud storage. See [Cloud Storage docs](docs/snapshots_cloud_storage.md). | `""` |
| `tap.snapshots.cloud.configMaps` | Names of ConfigMaps containing cloud storage environment variables. See [Cloud Storage docs](docs/snapshots_cloud_storage.md). | `[]` |
| `tap.snapshots.cloud.secrets` | Names of Secrets containing cloud storage credentials. See [Cloud Storage docs](docs/snapshots_cloud_storage.md). | `[]` |
| `tap.snapshots.cloud.provider` | Cloud storage provider for snapshots: `s3`, `azblob`, or `gcs`. Empty string disables cloud storage. See [Cloud Storage docs](docs/snapshots_cloud_storage.md). | `""` |
| `tap.snapshots.cloud.prefix` | Key prefix in the bucket/container (e.g. `snapshots/`). See [Cloud Storage docs](docs/snapshots_cloud_storage.md). | `""` |
| `tap.snapshots.cloud.configMaps` | Names of pre-existing ConfigMaps with cloud storage env vars. Alternative to inline `s3`/`azblob`/`gcs` values below. See [Cloud Storage docs](docs/snapshots_cloud_storage.md). | `[]` |
| `tap.snapshots.cloud.secrets` | Names of pre-existing Secrets with cloud storage credentials. Alternative to inline `s3`/`azblob`/`gcs` values below. See [Cloud Storage docs](docs/snapshots_cloud_storage.md). | `[]` |
| `tap.snapshots.cloud.s3.bucket` | S3 bucket name. When set, the chart auto-creates a ConfigMap with `SNAPSHOT_AWS_BUCKET`. | `""` |
| `tap.snapshots.cloud.s3.region` | AWS region for the S3 bucket. | `""` |
| `tap.snapshots.cloud.s3.accessKey` | AWS access key ID. When set, the chart auto-creates a Secret with `SNAPSHOT_AWS_ACCESS_KEY`. | `""` |
| `tap.snapshots.cloud.s3.secretKey` | AWS secret access key. When set, the chart auto-creates a Secret with `SNAPSHOT_AWS_SECRET_KEY`. | `""` |
| `tap.snapshots.cloud.s3.roleArn` | IAM role ARN to assume via STS for cross-account S3 access. | `""` |
| `tap.snapshots.cloud.s3.externalId` | External ID for the STS AssumeRole call. | `""` |
| `tap.snapshots.cloud.azblob.storageAccount` | Azure storage account name. When set, the chart auto-creates a ConfigMap with `SNAPSHOT_AZBLOB_STORAGE_ACCOUNT`. | `""` |
| `tap.snapshots.cloud.azblob.container` | Azure blob container name. | `""` |
| `tap.snapshots.cloud.azblob.storageKey` | Azure storage account access key. When set, the chart auto-creates a Secret with `SNAPSHOT_AZBLOB_STORAGE_KEY`. | `""` |
| `tap.snapshots.cloud.gcs.bucket` | GCS bucket name. When set, the chart auto-creates a ConfigMap with `SNAPSHOT_GCS_BUCKET`. | `""` |
| `tap.snapshots.cloud.gcs.project` | GCP project ID. | `""` |
| `tap.snapshots.cloud.gcs.credentialsJson` | Service account JSON key. When set, the chart auto-creates a Secret with `SNAPSHOT_GCS_CREDENTIALS_JSON`. | `""` |
| `tap.delayedDissection.cpu` | CPU allocation for delayed dissection jobs | `1` |
| `tap.delayedDissection.memory` | Memory allocation for delayed dissection jobs | `4Gi` |
| `tap.release.repo` | URL of the Helm chart repository | `https://helm.kubeshark.com` |
| `tap.release.name` | Helm release name | `kubeshark` |
| `tap.release.namespace` | Helm release namespace | `default` |
@@ -155,30 +171,30 @@ Example for overriding image names:
| `tap.persistentStorageStatic` | Use static persistent volume provisioning (explicitly defined `PersistentVolume` ) | `false` |
| `tap.persistentStoragePvcVolumeMode` | Set the pvc volume mode (Filesystem\|Block) | `Filesystem` |
| `tap.efsFileSytemIdAndPath` | [EFS file system ID and, optionally, subpath and/or access point](https://github.com/kubernetes-sigs/aws-efs-csi-driver/blob/master/examples/kubernetes/access_points/README.md) `<FileSystemId>:<Path>:<AccessPointId>` | "" |
| `tap.storageLimit` | Limit of either the `emptyDir` or `persistentVolumeClaim` | `5Gi` |
| `tap.storageLimit` | Limit of either the `emptyDir` or `persistentVolumeClaim` | `10Gi` |
| `tap.storageClass` | Storage class of the `PersistentVolumeClaim` | `standard` |
| `tap.dryRun` | Preview of all pods matching the regex, without tapping them | `false` |
| `tap.dnsConfig.nameservers` | Nameservers to use for DNS resolution | `[]` |
| `tap.dnsConfig.searches` | Search domains to use for DNS resolution | `[]` |
| `tap.dnsConfig.options` | DNS options to use for DNS resolution | `[]` |
| `tap.dns.nameservers` | Nameservers to use for DNS resolution | `[]` |
| `tap.dns.searches` | Search domains to use for DNS resolution | `[]` |
| `tap.dns.options` | DNS options to use for DNS resolution | `[]` |
| `tap.resources.hub.limits.cpu` | CPU limit for hub | `""` (no limit) |
| `tap.resources.hub.limits.memory` | Memory limit for hub | `5Gi` |
| `tap.resources.hub.requests.cpu` | CPU request for hub | `50m` |
| `tap.resources.hub.requests.memory` | Memory request for hub | `50Mi` |
| `tap.resources.sniffer.limits.cpu` | CPU limit for sniffer | `""` (no limit) |
| `tap.resources.sniffer.limits.memory` | Memory limit for sniffer | `3Gi` |
| `tap.resources.sniffer.limits.memory` | Memory limit for sniffer | `5Gi` |
| `tap.resources.sniffer.requests.cpu` | CPU request for sniffer | `50m` |
| `tap.resources.sniffer.requests.memory` | Memory request for sniffer | `50Mi` |
| `tap.resources.tracer.limits.cpu` | CPU limit for tracer | `""` (no limit) |
| `tap.resources.tracer.limits.memory` | Memory limit for tracer | `3Gi` |
| `tap.resources.tracer.limits.memory` | Memory limit for tracer | `5Gi` |
| `tap.resources.tracer.requests.cpu` | CPU request for tracer | `50m` |
| `tap.resources.tracer.requests.memory` | Memory request for tracer | `50Mi` |
| `tap.probes.hub.initialDelaySeconds` | Initial delay before probing the hub | `15` |
| `tap.probes.hub.periodSeconds` | Period between probes for the hub | `10` |
| `tap.probes.hub.initialDelaySeconds` | Initial delay before probing the hub | `5` |
| `tap.probes.hub.periodSeconds` | Period between probes for the hub | `5` |
| `tap.probes.hub.successThreshold` | Number of successful probes before considering the hub healthy | `1` |
| `tap.probes.hub.failureThreshold` | Number of failed probes before considering the hub unhealthy | `3` |
| `tap.probes.sniffer.initialDelaySeconds` | Initial delay before probing the sniffer | `15` |
| `tap.probes.sniffer.periodSeconds` | Period between probes for the sniffer | `10` |
| `tap.probes.sniffer.initialDelaySeconds` | Initial delay before probing the sniffer | `5` |
| `tap.probes.sniffer.periodSeconds` | Period between probes for the sniffer | `5` |
| `tap.probes.sniffer.successThreshold` | Number of successful probes before considering the sniffer healthy | `1` |
| `tap.probes.sniffer.failureThreshold` | Number of failed probes before considering the sniffer unhealthy | `3` |
| `tap.serviceMesh` | Capture traffic from service meshes like Istio, Linkerd, Consul, etc. | `true` |
@@ -213,15 +229,17 @@ Example for overriding image names:
| `tap.telemetry.enabled` | Enable anonymous usage statistics collection | `true` |
| `tap.resourceGuard.enabled` | Enable resource guard worker process, which watches RAM/disk usage and enables/disables traffic capture based on available resources | `false` |
| `tap.secrets` | List of secrets to be used as source for environment variables (e.g. `kubeshark-license`) | `[]` |
| `tap.sentry.enabled` | Enable sending of error logs to Sentry | `true` (only for qualified users) |
| `tap.sentry.enabled` | Enable sending of error logs to Sentry | `false` |
| `tap.sentry.environment` | Sentry environment to label error logs with | `production` |
| `tap.defaultFilter` | Sets the default dashboard KFL filter (e.g. `http`). By default, this value is set to filter out noisy protocols such as DNS, UDP, ICMP and TCP. The user can easily change this, **temporarily**, in the Dashboard. For a permanent change, you should change this value in the `values.yaml` or `config.yaml` file. | `""` |
| `tap.liveConfigMapChangesDisabled` | If set to `true`, all user functionality (scripting, targeting settings, global & default KFL modification, traffic recording, traffic capturing on/off, protocol dissectors) involving dynamic ConfigMap changes from UI will be disabled | `false` |
| `tap.globalFilter` | Prepends to any KFL filter and can be used to limit what is visible in the dashboard. For example, `redact("request.headers.Authorization")` will redact the appropriate field. Another example `!dns` will not show any DNS traffic. | `""` |
| `tap.metrics.port` | Pod port used to expose Prometheus metrics | `49100` |
| `tap.enabledDissectors` | This is an array of strings representing the list of supported protocols. Remove or comment out redundant protocols (e.g., dns).| The default list excludes: `udp` and `tcp` |
| `tap.mountBpf` | BPF filesystem needs to be mounted for eBPF to work properly. This helm value determines whether Kubeshark will attempt to mount the filesystem. This option is not required if the filesystem is already mounted. | `true` |
| `tap.hostNetwork` | Enable host network mode for worker DaemonSet pods. When enabled, worker pods use the host's network namespace for direct network access. | `true` |
| `tap.packetCapture` | Packet capture backend: `best`, `af_packet`, or `pf_ring` | `best` |
| `tap.misc.trafficSampleRate` | Percentage of traffic to process (0-100) | `100` |
| `tap.misc.tcpStreamChannelTimeoutMs` | Timeout in milliseconds for TCP stream channel | `10000` |
| `tap.gitops.enabled` | Enable GitOps functionality, allowing your Kubeshark configuration to be managed via GitOps. | `false` |
| `tap.misc.tcpFlowTimeout` | TCP flow aggregation timeout in seconds. Controls how long the worker waits before finalizing a TCP flow. | `1200` |
| `tap.misc.udpFlowTimeout` | UDP flow aggregation timeout in seconds. Controls how long the worker waits before finalizing a UDP flow. | `1200` |
@@ -242,10 +260,6 @@ Example for overriding image names:
| `supportChatEnabled` | Enable real-time support chat channel based on Intercom | `false` |
| `internetConnectivity` | When set to `false`, turns off API requests that depend on Internet connectivity, such as `telemetry` and `online-support`. | `true` |
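Taken together, a handful of these values cover most customizations. A minimal `values.yaml` sketch (the filter and dissector choices below are illustrative only, not recommendations):

```yaml
tap:
  defaultFilter: "http"      # dashboard opens filtered to HTTP traffic
  globalFilter: "!dns"       # DNS traffic is never shown or stored
  enabledDissectors:         # dissect only the protocols you care about
    - http
    - grpc
    - redis
```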
# Installing with SAML enabled
### Prerequisites:


@@ -2,7 +2,7 @@
Kubeshark can upload and download snapshots to cloud object storage, enabling cross-cluster sharing, backup/restore, and long-term retention.
Supported providers: **Amazon S3** (`s3`), **Azure Blob Storage** (`azblob`), and **Google Cloud Storage** (`gcs`).
## Helm Values
@@ -10,14 +10,36 @@ Supported providers: **Amazon S3** (`s3`) and **Azure Blob Storage** (`azblob`).
tap:
snapshots:
cloud:
provider: "" # "s3", "azblob", or "gcs" (empty = disabled)
prefix: "" # key prefix in the bucket/container (e.g. "snapshots/")
configMaps: [] # names of pre-existing ConfigMaps with cloud config env vars
secrets: [] # names of pre-existing Secrets with cloud credentials
s3:
bucket: ""
region: ""
accessKey: ""
secretKey: ""
roleArn: ""
externalId: ""
azblob:
storageAccount: ""
container: ""
storageKey: ""
gcs:
bucket: ""
project: ""
credentialsJson: ""
```
- `provider` selects which cloud backend to use. Leave empty to disable cloud storage.
- `configMaps` and `secrets` are lists of names of existing ConfigMap/Secret resources. They are mounted as `envFrom` on the hub pod, injecting all their keys as environment variables.
### Inline Values (Alternative to External ConfigMaps/Secrets)
Instead of creating ConfigMap and Secret resources manually, you can set cloud storage configuration directly in `values.yaml` or via `--set` flags. The Helm chart will automatically create the necessary ConfigMap and Secret resources.
Both approaches can be used together — inline values are additive to external `configMaps`/`secrets` references.
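For instance, a sketch combining an inline bucket configuration with a pre-existing Secret (`my-extra-creds` is a hypothetical Secret name):

```yaml
tap:
  snapshots:
    cloud:
      provider: "s3"
      s3:
        bucket: my-kubeshark-snapshots   # inline value -> chart-created ConfigMap
        region: us-east-1
      secrets:
        - my-extra-creds                 # existing Secret, mounted in addition
```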
---
## Amazon S3
@@ -48,9 +70,110 @@ Credentials are resolved in this order:
The provider validates bucket access on startup via `HeadBucket`. If the bucket is inaccessible, the hub will fail to start.
### Example: Inline Values (simplest approach)
```yaml
tap:
snapshots:
cloud:
provider: "s3"
s3:
bucket: my-kubeshark-snapshots
region: us-east-1
```
Or with static credentials via `--set`:
```bash
helm install kubeshark kubeshark/kubeshark \
--set tap.snapshots.cloud.provider=s3 \
--set tap.snapshots.cloud.s3.bucket=my-kubeshark-snapshots \
--set tap.snapshots.cloud.s3.region=us-east-1 \
--set tap.snapshots.cloud.s3.accessKey=AKIA... \
--set tap.snapshots.cloud.s3.secretKey=wJal...
```
### Example: IRSA (recommended for EKS)
[IAM Roles for Service Accounts (IRSA)](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html) lets EKS pods assume an IAM role without static credentials. EKS injects a short-lived token into the pod automatically.
**Prerequisites:**
1. Your EKS cluster must have an [OIDC provider](https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html) associated with it.
2. An IAM role with a trust policy that allows the Kubeshark service account to assume it.
**Step 1 — Create an IAM policy scoped to your bucket:**
```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:GetObjectVersion",
"s3:DeleteObjectVersion",
"s3:ListBucket",
"s3:ListBucketVersions",
"s3:GetBucketLocation",
"s3:GetBucketVersioning"
],
"Resource": [
"arn:aws:s3:::my-kubeshark-snapshots",
"arn:aws:s3:::my-kubeshark-snapshots/*"
]
}
]
}
```
> For read-only access, remove `s3:PutObject`, `s3:DeleteObject`, and `s3:DeleteObjectVersion`.
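A read-only variant of the statement above would then look like this (sketch; same example bucket):

```json
{
  "Effect": "Allow",
  "Action": [
    "s3:GetObject",
    "s3:GetObjectVersion",
    "s3:ListBucket",
    "s3:ListBucketVersions",
    "s3:GetBucketLocation",
    "s3:GetBucketVersioning"
  ],
  "Resource": [
    "arn:aws:s3:::my-kubeshark-snapshots",
    "arn:aws:s3:::my-kubeshark-snapshots/*"
  ]
}
```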
**Step 2 — Create an IAM role with IRSA trust policy:**
```bash
# Get your cluster's OIDC provider URL
OIDC_PROVIDER=$(aws eks describe-cluster --name CLUSTER_NAME \
--query "cluster.identity.oidc.issuer" --output text | sed 's|https://||')
# Create a trust policy
# The default K8s SA name is "<release-name>-service-account" (e.g. "kubeshark-service-account")
cat > trust-policy.json <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/${OIDC_PROVIDER}"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"${OIDC_PROVIDER}:sub": "system:serviceaccount:NAMESPACE:kubeshark-service-account",
"${OIDC_PROVIDER}:aud": "sts.amazonaws.com"
}
}
}
]
}
EOF
# Create the role and attach your policy
aws iam create-role \
--role-name KubesharkS3Role \
--assume-role-policy-document file://trust-policy.json
aws iam put-role-policy \
--role-name KubesharkS3Role \
--policy-name KubesharkSnapshotsBucketAccess \
--policy-document file://bucket-policy.json
```
**Step 3 — Create a ConfigMap with bucket configuration:**
```yaml
apiVersion: v1
@@ -62,10 +185,12 @@ data:
SNAPSHOT_AWS_REGION: us-east-1
```
**Step 4 — Set Helm values with `tap.annotations` to annotate the service account:**
```yaml
tap:
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID:role/KubesharkS3Role
snapshots:
cloud:
provider: "s3"
@@ -73,7 +198,17 @@ tap:
- kubeshark-s3-config
```
Or via `--set`:
```bash
helm install kubeshark kubeshark/kubeshark \
--set tap.snapshots.cloud.provider=s3 \
--set tap.snapshots.cloud.s3.bucket=my-kubeshark-snapshots \
--set tap.snapshots.cloud.s3.region=us-east-1 \
--set tap.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::ACCOUNT_ID:role/KubesharkS3Role
```
No `accessKey`/`secretKey` is needed — EKS injects credentials automatically via the IRSA token.
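To confirm the IRSA token is actually being injected, you can inspect the hub pod's environment (a verification sketch; the deployment name assumes the default `kubeshark` release and may differ in your cluster):

```bash
kubectl -n NAMESPACE exec deploy/kubeshark-hub -- \
  printenv AWS_ROLE_ARN AWS_WEB_IDENTITY_TOKEN_FILE
```

Both variables are set by the EKS pod identity webhook once the service-account annotation is in place.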
### Example: Static Credentials
@@ -159,6 +294,19 @@ Credentials are resolved in this order:
The provider validates container access on startup via `GetProperties`. If the container is inaccessible, the hub will fail to start.
### Example: Inline Values
```yaml
tap:
snapshots:
cloud:
provider: "azblob"
azblob:
storageAccount: mykubesharksa
container: snapshots
storageKey: "base64-encoded-storage-key..." # optional, omit for DefaultAzureCredential
```
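Azure portal storage keys are already base64-encoded, so a quick local sanity check catches copy/paste truncation before you deploy (a sketch, using the example key above):

```bash
KEY='c29tZWtleQ=='   # example value only; substitute your real storage key
printf '%s' "$KEY" | base64 -d > /dev/null 2>&1 && echo "key decodes OK"
```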
### Example: Workload Identity (recommended for AKS)
Create a ConfigMap with storage configuration:
@@ -224,3 +372,212 @@ tap:
secrets:
- kubeshark-azblob-creds
```
---
## Google Cloud Storage
### Environment Variables
| Variable | Required | Description |
|----------|----------|-------------|
| `SNAPSHOT_GCS_BUCKET` | Yes | GCS bucket name |
| `SNAPSHOT_GCS_PROJECT` | No | GCP project ID |
| `SNAPSHOT_GCS_CREDENTIALS_JSON` | No | Service account JSON key (empty = use Application Default Credentials) |
| `SNAPSHOT_CLOUD_PREFIX` | No | Key prefix in the bucket (e.g. `snapshots/`) |
### Authentication Methods
Credentials are resolved in this order:
1. **Service Account JSON Key** -- If `SNAPSHOT_GCS_CREDENTIALS_JSON` is set, the provided JSON key is used directly.
2. **Application Default Credentials** -- When no JSON key is provided, the GCP SDK default credential chain is used:
- **Workload Identity** (GKE pod identity) -- recommended for production on GKE
- GCE instance metadata (Compute Engine default service account)
- Standard GCP environment variables (`GOOGLE_APPLICATION_CREDENTIALS`)
- `gcloud` CLI credentials
The provider validates bucket access on startup via `Bucket.Attrs()`. If the bucket is inaccessible, the hub will fail to start.
### Required IAM Permissions
The service account needs different IAM roles depending on the access level:
**Read-only** (download, list, and sync snapshots from cloud):
| Role | Permissions provided | Purpose |
|------|---------------------|---------|
| `roles/storage.legacyBucketReader` | `storage.buckets.get`, `storage.objects.list` | Hub startup (bucket validation) + listing snapshots |
| `roles/storage.objectViewer` | `storage.objects.get`, `storage.objects.list` | Downloading snapshots, checking existence, reading metadata |
```bash
gcloud storage buckets add-iam-policy-binding gs://BUCKET_NAME \
--member="serviceAccount:SA_EMAIL" \
--role="roles/storage.legacyBucketReader"
gcloud storage buckets add-iam-policy-binding gs://BUCKET_NAME \
--member="serviceAccount:SA_EMAIL" \
--role="roles/storage.objectViewer"
```
**Read-write** (upload and delete snapshots in addition to read):
Add `roles/storage.objectAdmin` instead of `roles/storage.objectViewer` to also grant `storage.objects.create` and `storage.objects.delete`:
| Role | Permissions provided | Purpose |
|------|---------------------|---------|
| `roles/storage.legacyBucketReader` | `storage.buckets.get`, `storage.objects.list` | Hub startup (bucket validation) + listing snapshots |
| `roles/storage.objectAdmin` | `storage.objects.*` | Full object CRUD (upload, download, delete, list, metadata) |
```bash
gcloud storage buckets add-iam-policy-binding gs://BUCKET_NAME \
--member="serviceAccount:SA_EMAIL" \
--role="roles/storage.legacyBucketReader"
gcloud storage buckets add-iam-policy-binding gs://BUCKET_NAME \
--member="serviceAccount:SA_EMAIL" \
--role="roles/storage.objectAdmin"
```
### Example: Inline Values (simplest approach)
```yaml
tap:
snapshots:
cloud:
provider: "gcs"
gcs:
bucket: my-kubeshark-snapshots
project: my-gcp-project
```
Or with a service account key via `--set`:
```bash
helm install kubeshark kubeshark/kubeshark \
--set tap.snapshots.cloud.provider=gcs \
--set tap.snapshots.cloud.gcs.bucket=my-kubeshark-snapshots \
--set tap.snapshots.cloud.gcs.project=my-gcp-project \
--set-file tap.snapshots.cloud.gcs.credentialsJson=service-account.json
```
### Example: Workload Identity (recommended for GKE)
Create a ConfigMap with bucket configuration:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: kubeshark-gcs-config
data:
SNAPSHOT_GCS_BUCKET: my-kubeshark-snapshots
SNAPSHOT_GCS_PROJECT: my-gcp-project
```
Set Helm values:
```yaml
tap:
snapshots:
cloud:
provider: "gcs"
configMaps:
- kubeshark-gcs-config
```
Configure GKE Workload Identity to allow the Kubernetes service account to impersonate the GCP service account:
```bash
# Ensure the GKE cluster has Workload Identity enabled
# (--workload-pool=PROJECT_ID.svc.id.goog at cluster creation)
# Create a GCP service account (if not already created)
gcloud iam service-accounts create kubeshark-gcs \
--display-name="Kubeshark GCS Snapshots"
# Grant bucket access (read-write — see Required IAM Permissions above)
gcloud storage buckets add-iam-policy-binding gs://BUCKET_NAME \
--member="serviceAccount:kubeshark-gcs@PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/storage.legacyBucketReader"
gcloud storage buckets add-iam-policy-binding gs://BUCKET_NAME \
--member="serviceAccount:kubeshark-gcs@PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/storage.objectAdmin"
# Allow the K8s service account to impersonate the GCP service account
# Note: the K8s SA name is "<release-name>-service-account" (default: "kubeshark-service-account")
gcloud iam service-accounts add-iam-policy-binding \
kubeshark-gcs@PROJECT_ID.iam.gserviceaccount.com \
--role="roles/iam.workloadIdentityUser" \
--member="serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE/kubeshark-service-account]"
```
Set Helm values — the `tap.annotations` field adds the Workload Identity annotation to the service account:
```yaml
tap:
annotations:
iam.gke.io/gcp-service-account: kubeshark-gcs@PROJECT_ID.iam.gserviceaccount.com
snapshots:
cloud:
provider: "gcs"
configMaps:
- kubeshark-gcs-config
```
Or via `--set`:
```bash
helm install kubeshark kubeshark/kubeshark \
--set tap.snapshots.cloud.provider=gcs \
--set tap.snapshots.cloud.gcs.bucket=BUCKET_NAME \
--set tap.snapshots.cloud.gcs.project=PROJECT_ID \
--set tap.annotations."iam\.gke\.io/gcp-service-account"=kubeshark-gcs@PROJECT_ID.iam.gserviceaccount.com
```
No `credentialsJson` secret is needed — GKE injects credentials automatically via the Workload Identity metadata server.
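To confirm Workload Identity is active, you can query the metadata server from inside the hub pod (a verification sketch; assumes the default `kubeshark` release name and that `curl` is available in the image):

```bash
kubectl -n NAMESPACE exec deploy/kubeshark-hub -- \
  curl -s -H "Metadata-Flavor: Google" \
  http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email
```

This should print `kubeshark-gcs@PROJECT_ID.iam.gserviceaccount.com` when the IAM binding and annotation are correct.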
### Example: Service Account Key
Create a Secret with the service account JSON key:
```yaml
apiVersion: v1
kind: Secret
metadata:
name: kubeshark-gcs-creds
type: Opaque
stringData:
SNAPSHOT_GCS_CREDENTIALS_JSON: |
{
"type": "service_account",
"project_id": "my-gcp-project",
"private_key_id": "...",
"private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
"client_email": "kubeshark@my-gcp-project.iam.gserviceaccount.com",
...
}
```
Create a ConfigMap with bucket configuration:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: kubeshark-gcs-config
data:
SNAPSHOT_GCS_BUCKET: my-kubeshark-snapshots
SNAPSHOT_GCS_PROJECT: my-gcp-project
```
Set Helm values:
```yaml
tap:
snapshots:
cloud:
provider: "gcs"
configMaps:
- kubeshark-gcs-config
secrets:
- kubeshark-gcs-creds
```


@@ -39,7 +39,7 @@ spec:
- -capture-stop-after
- "{{ if hasKey .Values.tap.capture.dissection "stopAfter" }}{{ .Values.tap.capture.dissection.stopAfter }}{{ else }}5m{{ end }}"
- -snapshot-size-limit
- '{{ .Values.tap.snapshots.local.storageSize }}'
- -dissector-image
{{- if .Values.tap.docker.overrideImage.worker }}
- '{{ .Values.tap.docker.overrideImage.worker }}'
@@ -65,7 +65,9 @@ spec:
- -cloud-storage-provider
- '{{ .Values.tap.snapshots.cloud.provider }}'
{{- end }}
{{- $hasInlineConfig := or .Values.tap.snapshots.cloud.prefix .Values.tap.snapshots.cloud.s3.bucket .Values.tap.snapshots.cloud.s3.region .Values.tap.snapshots.cloud.s3.roleArn .Values.tap.snapshots.cloud.s3.externalId .Values.tap.snapshots.cloud.azblob.storageAccount .Values.tap.snapshots.cloud.azblob.container .Values.tap.snapshots.cloud.gcs.bucket .Values.tap.snapshots.cloud.gcs.project }}
{{- $hasInlineSecrets := or .Values.tap.snapshots.cloud.s3.accessKey .Values.tap.snapshots.cloud.s3.secretKey .Values.tap.snapshots.cloud.azblob.storageKey .Values.tap.snapshots.cloud.gcs.credentialsJson }}
{{- if or .Values.tap.secrets .Values.tap.snapshots.cloud.configMaps .Values.tap.snapshots.cloud.secrets $hasInlineConfig $hasInlineSecrets }}
envFrom:
{{- range .Values.tap.secrets }}
- secretRef:
@@ -79,6 +81,14 @@ spec:
- secretRef:
name: {{ . }}
{{- end }}
{{- if $hasInlineConfig }}
- configMapRef:
name: {{ include "kubeshark.name" . }}-cloud-config
{{- end }}
{{- if $hasInlineSecrets }}
- secretRef:
name: {{ include "kubeshark.name" . }}-cloud-secret
{{- end }}
{{- end }}
env:
- name: POD_NAME


@@ -26,15 +26,15 @@ spec:
- env:
- name: REACT_APP_AUTH_ENABLED
value: '{{- if or (and .Values.cloudLicenseEnabled (not (empty .Values.license))) (not .Values.internetConnectivity) -}}
{{ (default false .Values.demoModeEnabled) | ternary true ((and .Values.tap.auth.enabled (eq .Values.tap.auth.type "dex")) | ternary true false) }}
{{- else -}}
{{ .Values.cloudLicenseEnabled | ternary "true" ((default false .Values.demoModeEnabled) | ternary "true" .Values.tap.auth.enabled) }}
{{- end }}'
- name: REACT_APP_AUTH_TYPE
value: '{{- if and .Values.cloudLicenseEnabled (not (eq .Values.tap.auth.type "dex")) -}}
default
{{- else -}}
{{ (default false .Values.demoModeEnabled) | ternary "default" .Values.tap.auth.type }}
{{- end }}'
- name: REACT_APP_COMPLETE_STREAMING_ENABLED
value: '{{- if and (hasKey .Values.tap "dashboard") (hasKey .Values.tap.dashboard "completeStreamingEnabled") -}}
@@ -55,30 +55,22 @@ spec:
false
{{- end }}'
- name: REACT_APP_SCRIPTING_DISABLED
value: '{{ default false .Values.demoModeEnabled }}'
- name: REACT_APP_TARGETED_PODS_UPDATE_DISABLED
value: '{{ default false .Values.demoModeEnabled }}'
- name: REACT_APP_PRESET_FILTERS_CHANGING_ENABLED
value: '{{ not (default false .Values.demoModeEnabled) }}'
- name: REACT_APP_BPF_OVERRIDE_DISABLED
value: '{{ eq .Values.tap.packetCapture "af_packet" | ternary "false" "true" }}'
- name: REACT_APP_RECORDING_DISABLED
value: '{{ default false .Values.demoModeEnabled }}'
- name: REACT_APP_DISSECTION_ENABLED
value: '{{ .Values.tap.capture.dissection.enabled | ternary "true" "false" }}'
- name: REACT_APP_DISSECTION_CONTROL_ENABLED
value: '{{- if and (not .Values.demoModeEnabled) (not .Values.tap.capture.dissection.enabled) -}}
true
{{- else -}}
{{ not (default false .Values.demoModeEnabled) | ternary false true }}
{{- end -}}'
- name: 'REACT_APP_CLOUD_LICENSE_ENABLED'
value: '{{- if or (and .Values.cloudLicenseEnabled (not (empty .Values.license))) (not .Values.internetConnectivity) -}}
@@ -91,7 +83,13 @@ spec:
- name: REACT_APP_BETA_ENABLED
value: '{{ default false .Values.betaEnabled | ternary "true" "false" }}'
- name: REACT_APP_DISSECTORS_UPDATING_ENABLED
value: '{{ not (default false .Values.demoModeEnabled) }}'
- name: REACT_APP_SNAPSHOTS_UPDATING_ENABLED
value: '{{ not (default false .Values.demoModeEnabled) }}'
- name: REACT_APP_DEMO_MODE_ENABLED
value: '{{ default false .Values.demoModeEnabled }}'
- name: REACT_APP_CLUSTER_WIDE_MAP_ENABLED
value: '{{ default false (((.Values).tap).dashboard).clusterWideMapEnabled }}'
- name: REACT_APP_RAW_CAPTURE_ENABLED
value: '{{ .Values.tap.capture.raw.enabled | ternary "true" "false" }}'
- name: REACT_APP_SENTRY_ENABLED


@@ -19,14 +19,14 @@ data:
INGRESS_HOST: '{{ .Values.tap.ingress.host }}'
PROXY_FRONT_PORT: '{{ .Values.tap.proxy.front.port }}'
AUTH_ENABLED: '{{- if and .Values.cloudLicenseEnabled (not (empty .Values.license)) -}}
{{ (default false .Values.demoModeEnabled) | ternary true ((and .Values.tap.auth.enabled (eq .Values.tap.auth.type "dex")) | ternary true false) }}
{{- else -}}
{{ .Values.cloudLicenseEnabled | ternary "true" ((default false .Values.demoModeEnabled) | ternary "true" .Values.tap.auth.enabled) }}
{{- end }}'
AUTH_TYPE: '{{- if and .Values.cloudLicenseEnabled (not (eq .Values.tap.auth.type "dex")) -}}
default
{{- else -}}
{{ (default false .Values.demoModeEnabled) | ternary "default" .Values.tap.auth.type }}
{{- end }}'
AUTH_SAML_IDP_METADATA_URL: '{{ .Values.tap.auth.saml.idpMetadataUrl }}'
AUTH_SAML_ROLE_ATTRIBUTE: '{{ .Values.tap.auth.saml.roleAttribute }}'
@@ -44,22 +44,14 @@ data:
false
{{- end }}'
TELEMETRY_DISABLED: '{{ not .Values.internetConnectivity | ternary "true" (not .Values.tap.telemetry.enabled | ternary "true" "false") }}'
SCRIPTING_DISABLED: '{{ default false .Values.demoModeEnabled }}'
TARGETED_PODS_UPDATE_DISABLED: '{{ default false .Values.demoModeEnabled }}'
PRESET_FILTERS_CHANGING_ENABLED: '{{ not (default false .Values.demoModeEnabled) }}'
RECORDING_DISABLED: '{{ (default false .Values.demoModeEnabled) | ternary true false }}'
DISSECTION_CONTROL_ENABLED: '{{- if and (not .Values.demoModeEnabled) (not .Values.tap.capture.dissection.enabled) -}}
true
{{- else -}}
{{ (default false .Values.demoModeEnabled) | ternary false true }}
{{- end }}'
GLOBAL_FILTER: {{ include "kubeshark.escapeDoubleQuotes" .Values.tap.globalFilter | quote }}
DEFAULT_FILTER: {{ include "kubeshark.escapeDoubleQuotes" .Values.tap.defaultFilter | quote }}
@@ -76,7 +68,9 @@ data:
DUPLICATE_TIMEFRAME: '{{ .Values.tap.misc.duplicateTimeframe }}'
ENABLED_DISSECTORS: '{{ gt (len .Values.tap.enabledDissectors) 0 | ternary (join "," .Values.tap.enabledDissectors) "" }}'
CUSTOM_MACROS: '{{ toJson .Values.tap.customMacros }}'
DISSECTORS_UPDATING_ENABLED: '{{ not (default false .Values.demoModeEnabled) }}'
SNAPSHOTS_UPDATING_ENABLED: '{{ not (default false .Values.demoModeEnabled) }}'
DEMO_MODE_ENABLED: '{{ default false .Values.demoModeEnabled }}'
DETECT_DUPLICATES: '{{ .Values.tap.misc.detectDuplicates | ternary "true" "false" }}'
PCAP_DUMP_ENABLE: '{{ .Values.pcapdump.enabled }}'
PCAP_TIME_INTERVAL: '{{ .Values.pcapdump.timeInterval }}'


@@ -0,0 +1,64 @@
{{- $hasConfigValues := or .Values.tap.snapshots.cloud.prefix .Values.tap.snapshots.cloud.s3.bucket .Values.tap.snapshots.cloud.s3.region .Values.tap.snapshots.cloud.s3.roleArn .Values.tap.snapshots.cloud.s3.externalId .Values.tap.snapshots.cloud.azblob.storageAccount .Values.tap.snapshots.cloud.azblob.container .Values.tap.snapshots.cloud.gcs.bucket .Values.tap.snapshots.cloud.gcs.project -}}
{{- $hasSecretValues := or .Values.tap.snapshots.cloud.s3.accessKey .Values.tap.snapshots.cloud.s3.secretKey .Values.tap.snapshots.cloud.azblob.storageKey .Values.tap.snapshots.cloud.gcs.credentialsJson -}}
{{- if $hasConfigValues }}
---
apiVersion: v1
kind: ConfigMap
metadata:
labels:
{{- include "kubeshark.labels" . | nindent 4 }}
name: {{ include "kubeshark.name" . }}-cloud-config
namespace: {{ .Release.Namespace }}
data:
{{- if .Values.tap.snapshots.cloud.prefix }}
SNAPSHOT_CLOUD_PREFIX: {{ .Values.tap.snapshots.cloud.prefix | quote }}
{{- end }}
{{- if .Values.tap.snapshots.cloud.s3.bucket }}
SNAPSHOT_AWS_BUCKET: {{ .Values.tap.snapshots.cloud.s3.bucket | quote }}
{{- end }}
{{- if .Values.tap.snapshots.cloud.s3.region }}
SNAPSHOT_AWS_REGION: {{ .Values.tap.snapshots.cloud.s3.region | quote }}
{{- end }}
{{- if .Values.tap.snapshots.cloud.s3.roleArn }}
SNAPSHOT_AWS_ROLE_ARN: {{ .Values.tap.snapshots.cloud.s3.roleArn | quote }}
{{- end }}
{{- if .Values.tap.snapshots.cloud.s3.externalId }}
SNAPSHOT_AWS_EXTERNAL_ID: {{ .Values.tap.snapshots.cloud.s3.externalId | quote }}
{{- end }}
{{- if .Values.tap.snapshots.cloud.azblob.storageAccount }}
SNAPSHOT_AZBLOB_STORAGE_ACCOUNT: {{ .Values.tap.snapshots.cloud.azblob.storageAccount | quote }}
{{- end }}
{{- if .Values.tap.snapshots.cloud.azblob.container }}
SNAPSHOT_AZBLOB_CONTAINER: {{ .Values.tap.snapshots.cloud.azblob.container | quote }}
{{- end }}
{{- if .Values.tap.snapshots.cloud.gcs.bucket }}
SNAPSHOT_GCS_BUCKET: {{ .Values.tap.snapshots.cloud.gcs.bucket | quote }}
{{- end }}
{{- if .Values.tap.snapshots.cloud.gcs.project }}
SNAPSHOT_GCS_PROJECT: {{ .Values.tap.snapshots.cloud.gcs.project | quote }}
{{- end }}
{{- end }}
{{- if $hasSecretValues }}
---
apiVersion: v1
kind: Secret
metadata:
labels:
{{- include "kubeshark.labels" . | nindent 4 }}
name: {{ include "kubeshark.name" . }}-cloud-secret
namespace: {{ .Release.Namespace }}
type: Opaque
stringData:
{{- if .Values.tap.snapshots.cloud.s3.accessKey }}
SNAPSHOT_AWS_ACCESS_KEY: {{ .Values.tap.snapshots.cloud.s3.accessKey | quote }}
{{- end }}
{{- if .Values.tap.snapshots.cloud.s3.secretKey }}
SNAPSHOT_AWS_SECRET_KEY: {{ .Values.tap.snapshots.cloud.s3.secretKey | quote }}
{{- end }}
{{- if .Values.tap.snapshots.cloud.azblob.storageKey }}
SNAPSHOT_AZBLOB_STORAGE_KEY: {{ .Values.tap.snapshots.cloud.azblob.storageKey | quote }}
{{- end }}
{{- if .Values.tap.snapshots.cloud.gcs.credentialsJson }}
SNAPSHOT_GCS_CREDENTIALS_JSON: {{ .Values.tap.snapshots.cloud.gcs.credentialsJson | quote }}
{{- end }}
{{- end }}


@@ -0,0 +1,248 @@
suite: cloud storage template
templates:
- templates/21-cloud-storage.yaml
tests:
- it: should render nothing with default values
asserts:
- hasDocuments:
count: 0
- it: should render ConfigMap with S3 config only
set:
tap.snapshots.cloud.s3.bucket: my-bucket
tap.snapshots.cloud.s3.region: us-east-1
asserts:
- hasDocuments:
count: 1
- isKind:
of: ConfigMap
documentIndex: 0
- equal:
path: metadata.name
value: RELEASE-NAME-cloud-config
documentIndex: 0
- equal:
path: data.SNAPSHOT_AWS_BUCKET
value: "my-bucket"
documentIndex: 0
- equal:
path: data.SNAPSHOT_AWS_REGION
value: "us-east-1"
documentIndex: 0
- notExists:
path: data.SNAPSHOT_AWS_ACCESS_KEY
documentIndex: 0
- it: should render ConfigMap and Secret with S3 config and credentials
set:
tap.snapshots.cloud.s3.bucket: my-bucket
tap.snapshots.cloud.s3.region: us-east-1
tap.snapshots.cloud.s3.accessKey: AKIAIOSFODNN7EXAMPLE
tap.snapshots.cloud.s3.secretKey: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
asserts:
- hasDocuments:
count: 2
- isKind:
of: ConfigMap
documentIndex: 0
- equal:
path: data.SNAPSHOT_AWS_BUCKET
value: "my-bucket"
documentIndex: 0
- equal:
path: data.SNAPSHOT_AWS_REGION
value: "us-east-1"
documentIndex: 0
- isKind:
of: Secret
documentIndex: 1
- equal:
path: metadata.name
value: RELEASE-NAME-cloud-secret
documentIndex: 1
- equal:
path: stringData.SNAPSHOT_AWS_ACCESS_KEY
value: "AKIAIOSFODNN7EXAMPLE"
documentIndex: 1
- equal:
path: stringData.SNAPSHOT_AWS_SECRET_KEY
value: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
documentIndex: 1
- it: should render ConfigMap with Azure Blob config only
set:
tap.snapshots.cloud.azblob.storageAccount: myaccount
tap.snapshots.cloud.azblob.container: mycontainer
asserts:
- hasDocuments:
count: 1
- isKind:
of: ConfigMap
documentIndex: 0
- equal:
path: data.SNAPSHOT_AZBLOB_STORAGE_ACCOUNT
value: "myaccount"
documentIndex: 0
- equal:
path: data.SNAPSHOT_AZBLOB_CONTAINER
value: "mycontainer"
documentIndex: 0
- it: should render ConfigMap and Secret with Azure Blob config and storage key
set:
tap.snapshots.cloud.azblob.storageAccount: myaccount
tap.snapshots.cloud.azblob.container: mycontainer
tap.snapshots.cloud.azblob.storageKey: c29tZWtleQ==
asserts:
- hasDocuments:
count: 2
- isKind:
of: ConfigMap
documentIndex: 0
- equal:
path: data.SNAPSHOT_AZBLOB_STORAGE_ACCOUNT
value: "myaccount"
documentIndex: 0
- isKind:
of: Secret
documentIndex: 1
- equal:
path: stringData.SNAPSHOT_AZBLOB_STORAGE_KEY
value: "c29tZWtleQ=="
documentIndex: 1
- it: should render ConfigMap with GCS config only
set:
tap.snapshots.cloud.gcs.bucket: my-gcs-bucket
tap.snapshots.cloud.gcs.project: my-gcp-project
asserts:
- hasDocuments:
count: 1
- isKind:
of: ConfigMap
documentIndex: 0
- equal:
path: data.SNAPSHOT_GCS_BUCKET
value: "my-gcs-bucket"
documentIndex: 0
- equal:
path: data.SNAPSHOT_GCS_PROJECT
value: "my-gcp-project"
documentIndex: 0
- notExists:
path: data.SNAPSHOT_GCS_CREDENTIALS_JSON
documentIndex: 0
- it: should render ConfigMap and Secret with GCS config and credentials
set:
tap.snapshots.cloud.gcs.bucket: my-gcs-bucket
tap.snapshots.cloud.gcs.project: my-gcp-project
tap.snapshots.cloud.gcs.credentialsJson: '{"type":"service_account"}'
asserts:
- hasDocuments:
count: 2
- isKind:
of: ConfigMap
documentIndex: 0
- equal:
path: data.SNAPSHOT_GCS_BUCKET
value: "my-gcs-bucket"
documentIndex: 0
- equal:
path: data.SNAPSHOT_GCS_PROJECT
value: "my-gcp-project"
documentIndex: 0
- isKind:
of: Secret
documentIndex: 1
- equal:
path: metadata.name
value: RELEASE-NAME-cloud-secret
documentIndex: 1
- equal:
path: stringData.SNAPSHOT_GCS_CREDENTIALS_JSON
value: '{"type":"service_account"}'
documentIndex: 1
- it: should render ConfigMap with GCS bucket only (no project)
set:
tap.snapshots.cloud.gcs.bucket: my-gcs-bucket
asserts:
- hasDocuments:
count: 1
- isKind:
of: ConfigMap
documentIndex: 0
- equal:
path: data.SNAPSHOT_GCS_BUCKET
value: "my-gcs-bucket"
documentIndex: 0
- notExists:
path: data.SNAPSHOT_GCS_PROJECT
documentIndex: 0
- it: should render ConfigMap with only prefix
set:
tap.snapshots.cloud.prefix: snapshots/prod
asserts:
- hasDocuments:
count: 1
- isKind:
of: ConfigMap
documentIndex: 0
- equal:
path: data.SNAPSHOT_CLOUD_PREFIX
value: "snapshots/prod"
documentIndex: 0
- notExists:
path: data.SNAPSHOT_AWS_BUCKET
documentIndex: 0
- notExists:
path: data.SNAPSHOT_AZBLOB_STORAGE_ACCOUNT
documentIndex: 0
- notExists:
path: data.SNAPSHOT_GCS_BUCKET
documentIndex: 0
- it: should render ConfigMap with role ARN without credentials (IAM auth)
set:
tap.snapshots.cloud.s3.bucket: my-bucket
tap.snapshots.cloud.s3.region: us-east-1
tap.snapshots.cloud.s3.roleArn: arn:aws:iam::123456789012:role/my-role
asserts:
- hasDocuments:
count: 1
- isKind:
of: ConfigMap
documentIndex: 0
- equal:
path: data.SNAPSHOT_AWS_ROLE_ARN
value: "arn:aws:iam::123456789012:role/my-role"
documentIndex: 0
- equal:
path: data.SNAPSHOT_AWS_BUCKET
value: "my-bucket"
documentIndex: 0
- it: should render ConfigMap with externalId
set:
tap.snapshots.cloud.s3.bucket: my-bucket
tap.snapshots.cloud.s3.externalId: ext-12345
asserts:
- hasDocuments:
count: 1
- equal:
path: data.SNAPSHOT_AWS_EXTERNAL_ID
value: "ext-12345"
documentIndex: 0
- it: should set correct namespace
release:
namespace: kubeshark-ns
set:
tap.snapshots.cloud.s3.bucket: my-bucket
asserts:
- equal:
path: metadata.namespace
value: kubeshark-ns
documentIndex: 0


@@ -0,0 +1,9 @@
tap:
snapshots:
cloud:
provider: azblob
prefix: snapshots/
azblob:
storageAccount: kubesharkstore
container: snapshots
storageKey: c29tZWtleWhlcmU=


@@ -0,0 +1,8 @@
tap:
snapshots:
cloud:
provider: s3
configMaps:
- my-cloud-config
secrets:
- my-cloud-secret


@@ -0,0 +1,9 @@
tap:
snapshots:
cloud:
provider: gcs
prefix: snapshots/
gcs:
bucket: kubeshark-snapshots
project: my-gcp-project
credentialsJson: '{"type":"service_account","project_id":"my-gcp-project"}'


@@ -0,0 +1,10 @@
tap:
snapshots:
cloud:
provider: s3
prefix: snapshots/
s3:
bucket: kubeshark-snapshots
region: us-east-1
accessKey: AKIAIOSFODNN7EXAMPLE
secretKey: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY


@@ -0,0 +1,167 @@
suite: hub deployment cloud integration
templates:
  - templates/04-hub-deployment.yaml
tests:
  - it: should not render envFrom with default values
    asserts:
      - isKind:
          of: Deployment
      - notContains:
          path: spec.template.spec.containers[0].envFrom
          any: true
          content:
            configMapRef:
              name: RELEASE-NAME-cloud-config
  - it: should render envFrom with inline S3 config
    set:
      tap.snapshots.cloud.s3.bucket: my-bucket
      tap.snapshots.cloud.s3.region: us-east-1
    asserts:
      - contains:
          path: spec.template.spec.containers[0].envFrom
          content:
            configMapRef:
              name: RELEASE-NAME-cloud-config
  - it: should render envFrom secret ref with inline credentials
    set:
      tap.snapshots.cloud.s3.bucket: my-bucket
      tap.snapshots.cloud.s3.accessKey: AKIAIOSFODNN7EXAMPLE
      tap.snapshots.cloud.s3.secretKey: secret
    asserts:
      - contains:
          path: spec.template.spec.containers[0].envFrom
          content:
            configMapRef:
              name: RELEASE-NAME-cloud-config
      - contains:
          path: spec.template.spec.containers[0].envFrom
          content:
            secretRef:
              name: RELEASE-NAME-cloud-secret
  - it: should render envFrom with inline GCS config
    set:
      tap.snapshots.cloud.gcs.bucket: my-gcs-bucket
      tap.snapshots.cloud.gcs.project: my-gcp-project
    asserts:
      - contains:
          path: spec.template.spec.containers[0].envFrom
          content:
            configMapRef:
              name: RELEASE-NAME-cloud-config
  - it: should render envFrom secret ref with inline GCS credentials
    set:
      tap.snapshots.cloud.gcs.bucket: my-gcs-bucket
      tap.snapshots.cloud.gcs.credentialsJson: '{"type":"service_account"}'
    asserts:
      - contains:
          path: spec.template.spec.containers[0].envFrom
          content:
            configMapRef:
              name: RELEASE-NAME-cloud-config
      - contains:
          path: spec.template.spec.containers[0].envFrom
          content:
            secretRef:
              name: RELEASE-NAME-cloud-secret
  - it: should render cloud-storage-provider arg when provider is gcs
    set:
      tap.snapshots.cloud.provider: gcs
    asserts:
      - contains:
          path: spec.template.spec.containers[0].command
          content: -cloud-storage-provider
      - contains:
          path: spec.template.spec.containers[0].command
          content: gcs
  - it: should render envFrom with external configMaps
    set:
      tap.snapshots.cloud.configMaps:
        - my-cloud-config
        - my-other-config
    asserts:
      - contains:
          path: spec.template.spec.containers[0].envFrom
          content:
            configMapRef:
              name: my-cloud-config
      - contains:
          path: spec.template.spec.containers[0].envFrom
          content:
            configMapRef:
              name: my-other-config
  - it: should render envFrom with external secrets
    set:
      tap.snapshots.cloud.secrets:
        - my-cloud-secret
    asserts:
      - contains:
          path: spec.template.spec.containers[0].envFrom
          content:
            secretRef:
              name: my-cloud-secret
  - it: should render cloud-storage-provider arg when provider is set
    set:
      tap.snapshots.cloud.provider: s3
    asserts:
      - contains:
          path: spec.template.spec.containers[0].command
          content: -cloud-storage-provider
      - contains:
          path: spec.template.spec.containers[0].command
          content: s3
  - it: should not render cloud-storage-provider arg with default values
    asserts:
      - notContains:
          path: spec.template.spec.containers[0].command
          content: -cloud-storage-provider
  - it: should render envFrom with tap.secrets
    set:
      tap.secrets:
        - my-existing-secret
    asserts:
      - contains:
          path: spec.template.spec.containers[0].envFrom
          content:
            secretRef:
              name: my-existing-secret
  - it: should render both inline and external refs together
    set:
      tap.snapshots.cloud.s3.bucket: my-bucket
      tap.snapshots.cloud.s3.accessKey: key
      tap.snapshots.cloud.s3.secretKey: secret
      tap.snapshots.cloud.configMaps:
        - ext-config
      tap.snapshots.cloud.secrets:
        - ext-secret
    asserts:
      - contains:
          path: spec.template.spec.containers[0].envFrom
          content:
            configMapRef:
              name: ext-config
      - contains:
          path: spec.template.spec.containers[0].envFrom
          content:
            secretRef:
              name: ext-secret
      - contains:
          path: spec.template.spec.containers[0].envFrom
          content:
            configMapRef:
              name: RELEASE-NAME-cloud-config
      - contains:
          path: spec.template.spec.containers[0].envFrom
          content:
            secretRef:
              name: RELEASE-NAME-cloud-secret

@@ -43,8 +43,24 @@ tap:
    storageSize: 20Gi
    cloud:
      provider: ""
      prefix: ""
      configMaps: []
      secrets: []
      s3:
        bucket: ""
        region: ""
        accessKey: ""
        secretKey: ""
        roleArn: ""
        externalId: ""
      azblob:
        storageAccount: ""
        container: ""
        storageKey: ""
      gcs:
        bucket: ""
        project: ""
        credentialsJson: ""
  release:
    repo: https://helm.kubeshark.com
    name: kubeshark
@@ -169,6 +185,7 @@ tap:
  dashboard:
    streamingType: connect-rpc
    completeStreamingEnabled: true
    clusterWideMapEnabled: false
  telemetry:
    enabled: true
  resourceGuard:
@@ -181,7 +198,6 @@ tap:
    enabled: false
    environment: production
  defaultFilter: ""
  liveConfigMapChangesDisabled: false
  globalFilter: ""
  enabledDissectors:
    - amqp

@@ -2,6 +2,18 @@
The [Kubeshark](https://kubeshark.com) MCP (Model Context Protocol) server enables AI assistants like Claude Desktop, Cursor, and other MCP-compatible clients to query real-time Kubernetes network traffic.
## AI Skills
The MCP provides the tools — [AI skills](../skills/) teach agents how to use them.
Skills turn raw MCP capabilities into domain-specific workflows like root cause
analysis, traffic filtering, and forensic investigation. See the
[skills README](../skills/README.md) for installation and usage.
| Skill | Description |
|-------|-------------|
| [`network-rca`](../skills/network-rca/) | Network Root Cause Analysis — snapshot-based retrospective investigation with PCAP and dissection routes |
| [`kfl`](../skills/kfl/) | KFL2 filter expert — write, debug, and optimize traffic queries across all supported protocols |
## Features
- **L7 API Traffic Analysis**: Query HTTP, gRPC, Redis, Kafka, DNS transactions
@@ -34,20 +46,20 @@ Add to your Claude Desktop configuration:
**macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
**Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
#### URL Mode (Recommended for existing deployments)
#### Default (requires kubectl access / kube context)
```json
{
  "mcpServers": {
    "kubeshark": {
      "command": "kubeshark",
      "args": ["mcp", "--url", "https://kubeshark.example.com"]
      "args": ["mcp"]
    }
  }
}
```
#### Proxy Mode (Requires kubectl access)
With an explicit kubeconfig path:
```json
{
@@ -59,14 +71,18 @@ Add to your Claude Desktop configuration:
}
}
```
or:
#### URL Mode (no kubectl required)
Use this when the machine doesn't have kubectl access or a kube context.
Connect directly to an existing Kubeshark deployment:
```json
{
  "mcpServers": {
    "kubeshark": {
      "command": "kubeshark",
      "args": ["mcp"]
      "args": ["mcp", "--url", "https://kubeshark.example.com"]
    }
  }
}

skills/README.md Normal file

@@ -0,0 +1,120 @@
# Kubeshark AI Skills
Open-source AI skills that work with the [Kubeshark MCP](https://github.com/kubeshark/kubeshark).
Skills teach AI agents how to use Kubeshark's MCP tools for specific workflows
like root cause analysis, traffic filtering, and forensic investigation.
Skills use the open [Agent Skills](https://github.com/anthropics/skills) format
and work with Claude Code, OpenAI Codex CLI, Gemini CLI, Cursor, and other
compatible agents.
## Available Skills
| Skill | Description |
|-------|-------------|
| [`network-rca`](network-rca/) | Network Root Cause Analysis. Retrospective traffic analysis via snapshots, with two investigation routes: PCAP (for Wireshark/compliance) and Dissection (for AI-driven API-level investigation). |
| [`kfl`](kfl/) | KFL2 (Kubeshark Filter Language) expert. Complete reference for writing, debugging, and optimizing CEL-based traffic filters across all supported protocols. |
## Prerequisites
All skills require the Kubeshark MCP:
```bash
# Claude Code
claude mcp add kubeshark -- kubeshark mcp
# Without kubectl access (direct URL)
claude mcp add kubeshark -- kubeshark mcp --url https://kubeshark.example.com
```
For Claude Desktop, add to `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "kubeshark": {
      "command": "kubeshark",
      "args": ["mcp"]
    }
  }
}
```
## Installation
### Option 1: Plugin (recommended)
Install as a Claude Code plugin directly from GitHub:
```
/plugin marketplace add kubeshark/kubeshark
/plugin install kubeshark
```
Skills appear as `/kubeshark:network-rca` and `/kubeshark:kfl`. The plugin
also bundles the Kubeshark MCP configuration automatically.
### Option 2: Clone and run
```bash
git clone https://github.com/kubeshark/kubeshark
cd kubeshark
claude
```
Skills trigger automatically based on your conversation.
### Option 3: Manual installation
Clone the repo (if you haven't already), then symlink or copy the skills:
```bash
git clone https://github.com/kubeshark/kubeshark
mkdir -p ~/.claude/skills
# Symlink to stay in sync with the repo (recommended)
ln -s "$(pwd)/kubeshark/skills/network-rca" ~/.claude/skills/network-rca
ln -s "$(pwd)/kubeshark/skills/kfl" ~/.claude/skills/kfl
# Or copy to your project (project scope only)
mkdir -p .claude/skills
cp -r kubeshark/skills/network-rca .claude/skills/
cp -r kubeshark/skills/kfl .claude/skills/
# Or copy for personal use (all your projects)
cp -r kubeshark/skills/network-rca ~/.claude/skills/
cp -r kubeshark/skills/kfl ~/.claude/skills/
```
## Contributing
We welcome contributions — whether improving an existing skill or proposing a new one.
- **Suggest improvements**: Open an issue or PR with changes to an existing skill's `SKILL.md`
or reference docs. Better examples, clearer workflows, and additional filter patterns
are always appreciated.
- **Add a new skill**: Open an issue describing the use case first. New skills should
follow the structure below and reference Kubeshark MCP tools by exact name.
### Skill structure
```
skills/
└── <skill-name>/
├── SKILL.md # Required. YAML frontmatter + markdown body.
└── references/ # Optional. Detailed reference docs.
└── *.md
```
### Guidelines
- Keep `SKILL.md` under 500 lines. Use `references/` for detailed content.
- Use imperative tone. Reference MCP tools by exact name.
- Include realistic example tool responses.
- The `description` frontmatter should be generous with trigger keywords.
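As a sketch, a minimal `SKILL.md` frontmatter following these guidelines might look like this (the name and trigger phrases are placeholders):

```yaml
---
name: my-skill
description: >
  One-sentence summary of the workflow. Trigger on mentions of <domain terms>,
  related phrasings like "how do I <task>", and any request that matches the
  workflow this skill covers.
---
```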
### Planned skills
- `api-security` — OWASP API Top 10 assessment against live or snapshot traffic.
- `incident-response` — 7-phase forensic incident investigation methodology.
- `network-engineering` — Real-time traffic analysis, latency debugging, dependency mapping.

skills/kfl/SKILL.md Normal file

@@ -0,0 +1,344 @@
---
name: kfl
user-invocable: false
description: >
KFL2 (Kubeshark Filter Language) reference. This skill MUST be loaded before
writing, constructing, or suggesting any KFL filter expression. KFL is statically
typed — incorrect field names or syntax will fail silently or error. Do not guess
at KFL syntax without this skill loaded. Trigger on any mention of KFL, CEL filters,
traffic filtering, display filters, query syntax, filter expressions, write a filter,
construct a query, build a KFL, create a filter expression, "how do I filter",
"show me only", "find traffic where", protocol-specific queries (HTTP status codes,
DNS lookups, Redis commands, Kafka topics), Kubernetes-aware filtering (by namespace,
pod, service, label, annotation), L4 connection/flow filters, time-based queries,
or any request to slice/search/narrow network traffic in Kubeshark. Also trigger
when other skills need to construct filters — KFL is the query language for all
Kubeshark traffic analysis.
---
# KFL2 — Kubeshark Filter Language
You are a KFL2 expert. KFL2 is built on Google's CEL (Common Expression Language)
and is the query language for all Kubeshark traffic analysis. It operates as a
**display filter** — it doesn't affect what's captured, only what you see.
Think of KFL the way you think of SQL for databases or Google search syntax for
the web. Kubeshark captures and indexes all cluster traffic; KFL is how you
search it.
For the complete variable and field reference, see `references/kfl2-reference.md`.
## Core Syntax
KFL expressions are boolean CEL expressions. An empty filter matches everything.
### Operators
| Category | Operators |
|----------|-----------|
| Comparison | `==`, `!=`, `<`, `<=`, `>`, `>=` |
| Logical | `&&`, `\|\|`, `!` |
| Arithmetic | `+`, `-`, `*`, `/`, `%` |
| Membership | `in` |
| Ternary | `condition ? true_val : false_val` |
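The ternary operator can fold two conditions into a single filter; a sketch (the thresholds are illustrative):

```
// Apply a stricter status check to slow requests
http && (elapsed_time > 1000000 ? status_code >= 400 : status_code >= 500)
```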
### String Functions
```
str.contains(substring) // Substring search
str.startsWith(prefix) // Prefix match
str.endsWith(suffix) // Suffix match
str.matches(regex) // Regex match
size(str) // String length
```
### Collection Functions
```
size(collection) // List/map/string length
key in map // Key existence
map[key] // Value access
map_get(map, key, default) // Safe access with default
value in list // List membership
```
### Time Functions
```
timestamp("2026-03-14T22:00:00Z") // Parse ISO timestamp
duration("5m") // Parse duration
now() // Current time (snapshot at filter creation)
```
### Negation
```
!http // Everything that is NOT HTTP
http && status_code != 200 // HTTP responses that aren't 200
http && !path.contains("/health") // Exclude health checks
!(src.pod.namespace == "kube-system") // Exclude system namespace
```
## Protocol Detection
Boolean flags that indicate which protocol was detected. Use these as the first
filter term — they're fast and narrow the search space immediately.
| Flag | Protocol | Flag | Protocol |
|------|----------|------|----------|
| `http` | HTTP/1.1, HTTP/2 | `redis` | Redis |
| `dns` | DNS | `kafka` | Kafka |
| `tls` | TLS/SSL | `amqp` | AMQP |
| `tcp` | TCP | `ldap` | LDAP |
| `udp` | UDP | `ws` | WebSocket |
| `sctp` | SCTP | `gql` | GraphQL (v1+v2) |
| `icmp` | ICMP | `gqlv1` / `gqlv2` | GraphQL version-specific |
| `radius` | RADIUS | `conn` / `flow` | L4 connection/flow tracking |
| `diameter` | Diameter | `tcp_conn` / `udp_conn` | Transport-specific connections |
## Kubernetes Context
The most common starting point. Filter by where traffic originates or terminates.
### Pod and Service Fields
```
src.pod.name == "orders-594487879c-7ddxf"
dst.pod.namespace == "production"
src.service.name == "api-gateway"
dst.service.namespace == "payments"
```
Pod fields fall back to service data when pod info is unavailable, so
`dst.pod.namespace` works even for service-level entries.
### Aggregate Collections
Match against any direction (src or dst):
```
"production" in namespaces // Any namespace match
"orders" in pods // Any pod name match
"api-gateway" in services // Any service name match
```
### Labels and Annotations
```
// Direct access — works when the label is expected to exist
local_labels.app == "payment" || remote_labels.app == "payment"
// Safe access with default — use when the label may not exist
map_get(local_labels, "app", "") == "checkout"
map_get(remote_labels, "version", "") == "canary"
// Label existence check
"tier" in local_labels
```
Direct access (`local_labels.app`) returns an error if the key doesn't exist.
Use `map_get()` when you're not sure the label is present on all workloads.
Queries can be as complex as needed — combine labels with any other fields.
Responses are fast because all API elements are indexed:
```
local_labels.app == "payment" && http && status_code >= 500 && dst.pod.namespace == "production"
```
### Node and Process
```
node_name == "ip-10-0-25-170.ec2.internal"
local_process_name == "nginx"
remote_process_name.contains("postgres")
```
### DNS Resolution
```
src.dns == "api.example.com"
dst.dns.contains("redis")
```
## HTTP Filtering
HTTP is the most common protocol for API-level investigation.
### Fields
| Field | Type | Example |
|-------|------|---------|
| `method` | string | `"GET"`, `"POST"`, `"PUT"`, `"DELETE"` |
| `url` | string | Full path + query: `"/api/users?id=123"` |
| `path` | string | Path only: `"/api/users"` |
| `status_code` | int | `200`, `404`, `500` |
| `http_version` | string | `"HTTP/1.1"`, `"HTTP/2"` |
| `request.headers` | map | `request.headers["content-type"]` |
| `response.headers` | map | `response.headers["server"]` |
| `request.cookies` | map | `request.cookies["session"]` |
| `response.cookies` | map | `response.cookies["token"]` |
| `query_string` | map | `query_string["id"]` |
| `request_body_size` | int | Request body bytes |
| `response_body_size` | int | Response body bytes |
| `elapsed_time` | int | Duration in **microseconds** |
### Common Patterns
```
// Error investigation
http && status_code >= 500 // Server errors
http && status_code == 429 // Rate limiting
http && status_code >= 400 && status_code < 500 // Client errors
// Endpoint targeting
http && method == "POST" && path.contains("/orders")
http && url.matches(".*/api/v[0-9]+/users.*")
// Performance
http && elapsed_time > 5000000 // > 5 seconds
http && response_body_size > 1000000 // > 1MB responses
// Header inspection
http && "authorization" in request.headers
http && request.headers["content-type"] == "application/json"
// GraphQL (subset of HTTP)
gql && method == "POST" && status_code >= 400
```
## DNS Filtering
DNS issues are often the hidden root cause of outages.
| Field | Type | Description |
|-------|------|-------------|
| `dns_questions` | []string | Question domain names |
| `dns_answers` | []string | Answer domain names |
| `dns_question_types` | []string | Record types: A, AAAA, CNAME, MX, TXT, SRV, PTR |
| `dns_request` | bool | Is request |
| `dns_response` | bool | Is response |
| `dns_request_length` | int | Request size |
| `dns_response_length` | int | Response size |
```
dns && "api.external-service.com" in dns_questions
dns && dns_response && status_code != 0 // Failed lookups
dns && "A" in dns_question_types // A record queries
dns && size(dns_questions) > 1 // Multi-question
```
## Database and Messaging Protocols
### Redis
```
redis && redis_type == "GET" // Command type
redis && redis_key.startsWith("session:") // Key pattern
redis && redis_command.contains("DEL") // Command search
redis && redis_total_size > 10000 // Large operations
```
### Kafka
```
kafka && kafka_api_key_name == "PRODUCE" // Produce operations
kafka && kafka_client_id == "payment-processor" // Client filtering
kafka && kafka_request_summary.contains("orders") // Topic filtering
kafka && kafka_size > 10000 // Large messages
```
### AMQP, LDAP, RADIUS, Diameter
```
amqp && amqp_method == "basic.publish" // AMQP publish
ldap && ldap_type == "bind" // LDAP bind requests
radius && radius_code_name == "Access-Request" // RADIUS auth
diameter && diameter_method.contains("Credit") // Diameter credit control
```
For the full variable list for these protocols, see `references/kfl2-reference.md`.
## Transport Layer (L4)
### TCP/UDP Fields
```
tcp && tcp_error_type != "" // TCP errors
udp && udp_length > 1000 // Large UDP packets
```
### Connection Tracking
```
conn && conn_state == "open" // Active connections
conn && conn_local_bytes > 1000000 // High-volume
conn && "HTTP" in conn_l7_detected // L7 protocol detection
tcp_conn && conn_state == "closed" // Closed TCP connections
```
### Flow Tracking (with Rate Metrics)
```
flow && flow_local_pps > 1000 // High packet rate
flow && flow_local_bps > 1000000 // High bandwidth
flow && flow_state == "closed" && "TLS" in flow_l7_detected
tcp_flow && flow_local_bps > 5000000 // High-throughput TCP
```
## Network Layer
```
src.ip == "10.0.53.101"
dst.ip.startsWith("192.168.")
src.port == 8080
dst.port >= 8000 && dst.port <= 9000
```
## Time-Based Filtering
```
timestamp > timestamp("2026-03-14T22:00:00Z")
timestamp >= timestamp("2026-03-14T22:00:00Z") && timestamp <= timestamp("2026-03-14T23:00:00Z")
timestamp > now() - duration("5m") // Last 5 minutes
elapsed_time > 2000000 // Took longer than 2 seconds
```
## Building Filters: Progressive Narrowing
The most effective investigation technique — start broad, add constraints:
```
// Step 1: Protocol + namespace
http && dst.pod.namespace == "production"
// Step 2: Add error condition
http && dst.pod.namespace == "production" && status_code >= 500
// Step 3: Narrow to service
http && dst.pod.namespace == "production" && status_code >= 500 && dst.service.name == "payment-service"
// Step 4: Narrow to endpoint
http && dst.pod.namespace == "production" && status_code >= 500 && dst.service.name == "payment-service" && path.contains("/charge")
// Step 5: Add timing
http && dst.pod.namespace == "production" && status_code >= 500 && dst.service.name == "payment-service" && path.contains("/charge") && elapsed_time > 2000000
```
## Performance Tips
1. **Protocol flags first** — `http && ...` is faster than `... && http`
2. **`startsWith`/`endsWith` over `contains`** — prefix/suffix checks are faster
3. **Specific ports before string ops** — `dst.port == 80` is cheaper than `url.contains(...)`
4. **Use `map_get` for labels** — avoids errors on missing keys
5. **Keep filters simple** — CEL short-circuits on `&&`, so put cheap checks first
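Applied together, the tips suggest ordering a filter like this (field values are illustrative):

```
// Protocol flag, then cheap int compare, then prefix match, then safe map access
http && dst.port == 8080 && path.startsWith("/api") && map_get(request.headers, "x-request-id", "") != ""
```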
## Type Safety
KFL2 is statically typed. Common gotchas:
- `status_code` is `int`, not string — use `status_code == 200`, not `"200"`
- `elapsed_time` is in **microseconds** — 5 seconds = `5000000`
- `timestamp` requires `timestamp()` function — not a raw string
- Map access on missing keys errors — use `key in map` or `map_get()` first
- List membership uses `value in list` — not `list.contains(value)`
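A few of these gotchas as wrong/right pairs (illustrative values):

```
http && status_code == "500"                          // Wrong: string vs int
http && status_code == 500                            // Right

dns && dns_questions.contains("api.example.com")      // Wrong: no list.contains
dns && "api.example.com" in dns_questions             // Right

http && request.headers["x-trace"] != ""              // Errors if header absent
http && map_get(request.headers, "x-trace", "") != "" // Safe
```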

@@ -0,0 +1,407 @@
# KFL2 Complete Variable and Field Reference
This is the exhaustive reference for every variable available in KFL2 filters.
KFL2 is built on Google's CEL (Common Expression Language) and evaluates against
Kubeshark's protobuf-based `BaseEntry` structure.
## Most Commonly Used Variables
These are the variables you'll reach for in 90% of investigations:
| Variable | Type | What it's for |
|----------|------|---------------|
| `status_code` | int | HTTP response status (200, 404, 500) |
| `method` | string | HTTP method (GET, POST, PUT, DELETE) |
| `path` | string | URL path without query string |
| `dst.pod.namespace` | string | Where traffic is going (namespace) |
| `dst.service.name` | string | Where traffic is going (service) |
| `src.pod.name` | string | Where traffic comes from (pod) |
| `elapsed_time` | int | Request duration in microseconds |
| `dns_questions` | []string | DNS domains being queried |
| `namespaces` | []string | All namespaces involved (src + dst) |
## Network-Level Variables
| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| `src.ip` | string | Source IP address | `"10.0.53.101"` |
| `dst.ip` | string | Destination IP address | `"192.168.1.1"` |
| `src.port` | int | Source port number | `43210` |
| `dst.port` | int | Destination port number | `8080` |
| `protocol` | string | Detected protocol type | `"HTTP"`, `"DNS"` |
## Identity and Metadata Variables
| Variable | Type | Description |
|----------|------|-------------|
| `id` | int | BaseEntry unique identifier (assigned by sniffer) |
| `node_id` | string | Node identifier (assigned by hub) |
| `index` | int | Entry index for stream uniqueness |
| `stream` | string | Stream identifier (hex string) |
| `timestamp` | timestamp | Event time (UTC), use with `timestamp()` function |
| `elapsed_time` | int | Request duration in microseconds |
| `worker` | string | Worker identifier |
## Cross-Reference Variables
| Variable | Type | Description |
|----------|------|-------------|
| `conn_id` | int | L7 to L4 connection cross-reference ID |
| `flow_id` | int | L7 to L4 flow cross-reference ID |
| `has_pcap` | bool | Whether PCAP data is available for this entry |
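`has_pcap` pairs naturally with protocol filters when deciding whether raw packets can be extracted; a sketch:

```
http && status_code >= 500 && has_pcap   // Server errors with PCAP data available
```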
## Capture Source Variables
| Variable | Type | Description | Values |
|----------|------|-------------|--------|
| `capture_source` | string | Canonical capture source | `"unspecified"`, `"af_packet"`, `"ebpf"`, `"ebpf_tls"` |
| `capture_backend` | string | Backend family | `"af_packet"`, `"ebpf"` |
| `capture_source_code` | int | Numeric enum | 0=unspecified, 1=af_packet, 2=ebpf, 3=ebpf_tls |
| `capture` | map | Nested map access | `capture["source"]`, `capture["backend"]` |
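For instance, to slice traffic by how it was captured (values taken from the table above):

```
tls && capture_source == "ebpf_tls"      // TLS traffic captured via eBPF TLS hooks
capture_backend == "af_packet"           // Everything captured by AF_PACKET
```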
## Protocol Detection Flags
Boolean variables indicating detected protocol. Use as first filter term for performance.
| Variable | Protocol | Variable | Protocol |
|----------|----------|----------|----------|
| `http` | HTTP/1.1, HTTP/2 | `redis` | Redis |
| `dns` | DNS | `kafka` | Kafka |
| `tls` | TLS/SSL handshake | `amqp` | AMQP messaging |
| `tcp` | TCP transport | `ldap` | LDAP directory |
| `udp` | UDP transport | `ws` | WebSocket |
| `sctp` | SCTP streaming | `gql` | GraphQL (v1 or v2) |
| `icmp` | ICMP | `gqlv1` | GraphQL v1 only |
| `radius` | RADIUS auth | `gqlv2` | GraphQL v2 only |
| `diameter` | Diameter | `conn` | L4 connection tracking |
| `flow` | L4 flow tracking | `tcp_conn` | TCP connection tracking |
| `tcp_flow` | TCP flow tracking | `udp_conn` | UDP connection tracking |
| `udp_flow` | UDP flow tracking | | |
## HTTP Variables
| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| `method` | string | HTTP method | `"GET"`, `"POST"`, `"PUT"`, `"DELETE"`, `"PATCH"` |
| `url` | string | Full URL path and query string | `"/api/users?id=123"` |
| `path` | string | URL path component (no query) | `"/api/users"` |
| `status_code` | int | HTTP response status code | `200`, `404`, `500` |
| `http_version` | string | HTTP protocol version | `"HTTP/1.1"`, `"HTTP/2"` |
| `query_string` | map[string]string | Parsed URL query parameters | `query_string["id"]` → `"123"` |
| `request.headers` | map[string]string | Request HTTP headers | `request.headers["content-type"]` |
| `response.headers` | map[string]string | Response HTTP headers | `response.headers["server"]` |
| `request.cookies` | map[string]string | Request cookies | `request.cookies["session"]` |
| `response.cookies` | map[string]string | Response cookies | `response.cookies["token"]` |
| `request_headers_size` | int | Request headers size in bytes | |
| `request_body_size` | int | Request body size in bytes | |
| `response_headers_size` | int | Response headers size in bytes | |
| `response_body_size` | int | Response body size in bytes | |
GraphQL requests have `gql` (or `gqlv1`/`gqlv2`) set to true, with all HTTP
variables available.
**Example**: `http && method == "POST" && status_code >= 500 && path.contains("/api")`
## DNS Variables
| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| `dns_questions` | []string | Question domain names (request + response) | `["example.com"]` |
| `dns_answers` | []string | Answer domain names | `["1.2.3.4"]` |
| `dns_question_types` | []string | Record types in questions | `["A"]`, `["AAAA"]`, `["CNAME"]` |
| `dns_request` | bool | Is DNS request message | |
| `dns_response` | bool | Is DNS response message | |
| `dns_request_length` | int | DNS request size in bytes (0 if absent) | |
| `dns_response_length` | int | DNS response size in bytes (0 if absent) | |
| `dns_total_size` | int | Sum of request + response sizes | |
Supported question types: A, AAAA, NS, CNAME, SOA, MX, TXT, SRV, PTR, ANY.
**Example**: `dns && dns_response && status_code != 0` (failed DNS lookups)
## TLS Variables
| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| `tls` | bool | TLS payload detected | |
| `tls_summary` | string | TLS handshake summary | `"ClientHello"`, `"ServerHello"` |
| `tls_info` | string | TLS connection details | `"TLS 1.3, AES-256-GCM"` |
| `tls_request_size` | int | TLS request size in bytes | |
| `tls_response_size` | int | TLS response size in bytes | |
| `tls_total_size` | int | Sum of request + response (computed if not provided) | |
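**Example** (illustrative): `tls && tls_summary == "ClientHello" && tls_total_size > 4096`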
## TCP Variables
| Variable | Type | Description |
|----------|------|-------------|
| `tcp` | bool | TCP payload detected |
| `tcp_method` | string | TCP method information |
| `tcp_payload` | bytes | Raw TCP payload data |
| `tcp_error_type` | string | TCP error type (empty if none) |
| `tcp_error_message` | string | TCP error message (empty if none) |
## UDP Variables
| Variable | Type | Description |
|----------|------|-------------|
| `udp` | bool | UDP payload detected |
| `udp_length` | int | UDP packet length |
| `udp_checksum` | int | UDP checksum value |
| `udp_payload` | bytes | Raw UDP payload data |
## SCTP Variables
| Variable | Type | Description |
|----------|------|-------------|
| `sctp` | bool | SCTP payload detected |
| `sctp_checksum` | int | SCTP checksum value |
| `sctp_chunk_type` | string | SCTP chunk type |
| `sctp_length` | int | SCTP chunk length |
## ICMP Variables
| Variable | Type | Description |
|----------|------|-------------|
| `icmp` | bool | ICMP payload detected |
| `icmp_type` | string | ICMP type code |
| `icmp_version` | int | ICMP version (4 or 6) |
| `icmp_length` | int | ICMP message length |
## WebSocket Variables
| Variable | Type | Description | Values |
|----------|------|-------------|--------|
| `ws` | bool | WebSocket payload detected | |
| `ws_opcode` | string | WebSocket operation code | `"text"`, `"binary"`, `"close"`, `"ping"`, `"pong"` |
| `ws_request` | bool | Is WebSocket request | |
| `ws_response` | bool | Is WebSocket response | |
| `ws_request_payload_data` | string | Request payload (safely truncated) | |
| `ws_request_payload_length` | int | Request payload length in bytes | |
| `ws_response_payload_length` | int | Response payload length in bytes | |
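**Example** (illustrative): `ws && ws_opcode == "text" && ws_response_payload_length > 100000`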
## Redis Variables
| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| `redis` | bool | Redis payload detected | |
| `redis_type` | string | Redis command verb | `"GET"`, `"SET"`, `"DEL"`, `"HGET"` |
| `redis_command` | string | Full Redis command line | `"GET session:1234"` |
| `redis_key` | string | Key (truncated to 64 bytes) | `"session:1234"` |
| `redis_request_size` | int | Request size (0 if absent) | |
| `redis_response_size` | int | Response size (0 if absent) | |
| `redis_total_size` | int | Sum of request + response | |
**Example**: `redis && redis_type == "GET" && redis_key.startsWith("session:")`
## Kafka Variables
| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| `kafka` | bool | Kafka payload detected | |
| `kafka_api_key` | int | Kafka API key number | 0=FETCH, 1=PRODUCE |
| `kafka_api_key_name` | string | Human-readable API operation | `"PRODUCE"`, `"FETCH"` |
| `kafka_client_id` | string | Kafka client identifier | `"payment-processor"` |
| `kafka_size` | int | Message size (request preferred, else response) | |
| `kafka_request` | bool | Is Kafka request | |
| `kafka_response` | bool | Is Kafka response | |
| `kafka_request_summary` | string | Request summary/topic | `"orders-topic"` |
| `kafka_request_size` | int | Request size (0 if absent) | |
| `kafka_response_size` | int | Response size (0 if absent) | |
**Example**: `kafka && kafka_api_key_name == "PRODUCE" && kafka_request_summary.contains("orders")`
## AMQP Variables
| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| `amqp` | bool | AMQP payload detected | |
| `amqp_method` | string | AMQP method name | `"basic.publish"`, `"channel.open"` |
| `amqp_summary` | string | Operation summary | |
| `amqp_request` | bool | Is AMQP request | |
| `amqp_response` | bool | Is AMQP response | |
| `amqp_request_length` | int | Request length (0 if absent) | |
| `amqp_response_length` | int | Response length (0 if absent) | |
| `amqp_total_size` | int | Sum of request + response | |
## LDAP Variables
| Variable | Type | Description |
|----------|------|-------------|
| `ldap` | bool | LDAP payload detected |
| `ldap_type` | string | LDAP operation type (request preferred) |
| `ldap_summary` | string | Operation summary |
| `ldap_request` | bool | Is LDAP request |
| `ldap_response` | bool | Is LDAP response |
| `ldap_request_length` | int | Request length (0 if absent) |
| `ldap_response_length` | int | Response length (0 if absent) |
| `ldap_total_size` | int | Sum of request + response |
## RADIUS Variables
| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| `radius` | bool | RADIUS payload detected | |
| `radius_code` | int | RADIUS code (request preferred) | |
| `radius_code_name` | string | Code name | `"Access-Request"` |
| `radius_request` | bool | Is RADIUS request | |
| `radius_response` | bool | Is RADIUS response | |
| `radius_request_authenticator` | string | Request authenticator (hex) | |
| `radius_request_length` | int | Request size (0 if absent) | |
| `radius_response_length` | int | Response size (0 if absent) | |
| `radius_total_size` | int | Sum of request + response | |
## Diameter Variables
| Variable | Type | Description |
|----------|------|-------------|
| `diameter` | bool | Diameter payload detected |
| `diameter_method` | string | Method name (request preferred) |
| `diameter_summary` | string | Operation summary |
| `diameter_request` | bool | Is Diameter request |
| `diameter_response` | bool | Is Diameter response |
| `diameter_request_length` | int | Request size (0 if absent) |
| `diameter_response_length` | int | Response size (0 if absent) |
| `diameter_total_size` | int | Sum of request + response |
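**Example** (size threshold illustrative):
```
diameter && diameter_request && diameter_total_size > 2048
```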
## L4 Connection Tracking Variables
| Variable | Type | Description | Example |
|----------|------|-------------|---------|
| `conn` | bool | Connection tracking entry | |
| `conn_state` | string | Connection state | `"open"`, `"in_progress"`, `"closed"` |
| `conn_local_pkts` | int | Packets from local peer | |
| `conn_local_bytes` | int | Bytes from local peer | |
| `conn_remote_pkts` | int | Packets from remote peer | |
| `conn_remote_bytes` | int | Bytes from remote peer | |
| `conn_l7_detected` | []string | L7 protocols detected on connection | `["HTTP", "TLS"]` |
| `conn_group_id` | int | Connection group identifier | |
**Example**: `conn && conn_state == "open" && conn_local_bytes > 1000000` (high-volume open connections)
## L4 Flow Tracking Variables
Flows extend connections with rate metrics (packets/bytes per second).
| Variable | Type | Description |
|----------|------|-------------|
| `flow` | bool | Flow tracking entry |
| `flow_state` | string | Flow state (`"open"`, `"in_progress"`, `"closed"`) |
| `flow_local_pkts` | int | Packets from local peer |
| `flow_local_bytes` | int | Bytes from local peer |
| `flow_remote_pkts` | int | Packets from remote peer |
| `flow_remote_bytes` | int | Bytes from remote peer |
| `flow_local_pps` | int | Local packets per second |
| `flow_local_bps` | int | Local bytes per second |
| `flow_remote_pps` | int | Remote packets per second |
| `flow_remote_bps` | int | Remote bytes per second |
| `flow_l7_detected` | []string | L7 protocols detected on flow |
| `flow_group_id` | int | Flow group identifier |
**Example**: `tcp_flow && flow_local_bps > 5000000` (high-bandwidth TCP flows)
## Kubernetes Variables
### Pod and Service (Directional)
| Variable | Type | Description |
|----------|------|-------------|
| `src.pod.name` | string | Source pod name |
| `src.pod.namespace` | string | Source pod namespace |
| `dst.pod.name` | string | Destination pod name |
| `dst.pod.namespace` | string | Destination pod namespace |
| `src.service.name` | string | Source service name |
| `src.service.namespace` | string | Source service namespace |
| `dst.service.name` | string | Destination service name |
| `dst.service.namespace` | string | Destination service namespace |
**Fallback behavior**: Pod namespace/name fields automatically fall back to
service data when pod info is unavailable. This means `dst.pod.namespace` works
even when only service-level resolution exists.
**Example**: `src.service.name == "api-gateway" && dst.pod.namespace == "production"`
### Aggregate Collections (Non-Directional)
| Variable | Type | Description |
|----------|------|-------------|
| `namespaces` | []string | All namespaces (src + dst, pod + service) |
| `pods` | []string | All pod names (src + dst) |
| `services` | []string | All service names (src + dst) |
### Labels and Annotations
| Variable | Type | Description |
|----------|------|-------------|
| `local_labels` | map[string]string | Kubernetes labels of local peer |
| `local_annotations` | map[string]string | Kubernetes annotations of local peer |
| `remote_labels` | map[string]string | Kubernetes labels of remote peer |
| `remote_annotations` | map[string]string | Kubernetes annotations of remote peer |
Use `map_get(local_labels, "key", "default")` for safe access that won't error
on missing keys.
**Example**: `map_get(local_labels, "app", "") == "checkout" && "production" in namespaces`
### Node Information
| Variable | Type | Description |
|----------|------|-------------|
| `node` | map | Nested: `node["name"]`, `node["ip"]` |
| `node_name` | string | Node name (flat alias) |
| `node_ip` | string | Node IP (flat alias) |
| `local_node_name` | string | Node name of local peer |
| `remote_node_name` | string | Node name of remote peer |
### Process Information
| Variable | Type | Description |
|----------|------|-------------|
| `local_process_name` | string | Process name on local peer |
| `remote_process_name` | string | Process name on remote peer |
### DNS Resolution
| Variable | Type | Description |
|----------|------|-------------|
| `src.dns` | string | DNS resolution of source IP |
| `dst.dns` | string | DNS resolution of destination IP |
| `dns_resolutions` | []string | All DNS resolutions (deduplicated) |
### Resolution Status
| Variable | Type | Values |
|----------|------|--------|
| `local_resolution_status` | string | `""` (resolved), `"no_node_mapping"`, `"rpc_error"`, `"rpc_empty"`, `"cache_miss"`, `"queue_full"` |
| `remote_resolution_status` | string | Same as above |
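Since an empty string means "resolved", a minimal filter for entries whose remote peer could not be identified is:
```
remote_resolution_status != ""
```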
## Default Values
When a variable is not present in an entry, KFL2 uses these defaults:
| Type | Default |
|------|---------|
| string | `""` |
| int | `0` |
| bool | `false` |
| list | `[]` |
| map | `{}` |
| bytes | `[]` |
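Because absent fields take these defaults, comparing against the default value cannot distinguish "missing" from "genuinely zero". A sketch of the difference:
```
kafka && kafka_response_size == 0   // absent response OR a zero-byte response
kafka && kafka_response_size > 0    // only entries with a non-empty response
```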
## Protocol Variable Precedence
For protocols with request/response pairs (Kafka, RADIUS, Diameter), merged
fields prefer the **request** side. If no request exists, the response value
is used. Size totals are always computed as `request_size + response_size`.
## CEL Language Features
KFL2 supports the full CEL specification:
- **Short-circuit evaluation**: `&&` stops on first false, `||` stops on first true
- **Ternary**: `condition ? value_if_true : value_if_false`
- **Regex**: `str.matches("pattern")` uses RE2 syntax
- **Type coercion**: Timestamps require `timestamp()`, durations require `duration()`
- **Null safety**: Use `in` operator or `map_get()` before accessing map keys
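These features compose. A sketch combining the ternary, regex matching, and safe map access (the `tier` label key and latency thresholds are illustrative, not part of any schema):
```
http && path.matches("^/api/v1/") &&
  (map_get(local_labels, "tier", "") == "critical"
    ? elapsed_time > 1000000
    : elapsed_time > 5000000)
```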
For the full CEL specification, see the
[CEL Language Definition](https://github.com/google/cel-spec/blob/master/doc/langdef.md).

**File**: `skills/network-rca/SKILL.md` (new file, 338 lines)
---
name: network-rca
description: >
Kubernetes network root cause analysis skill powered by Kubeshark MCP. Use this skill
whenever the user wants to investigate past incidents, perform retrospective traffic
analysis, take or manage traffic snapshots, extract PCAPs, dissect L7 API calls from
historical captures, compare traffic patterns over time, detect drift or anomalies
between snapshots, or do any kind of forensic network analysis in Kubernetes.
Also trigger when the user mentions snapshots, raw capture, PCAP extraction,
traffic replay, postmortem analysis, "what happened yesterday/last week",
root cause analysis, RCA, cloud snapshot storage, snapshot dissection, or KFL filters
for historical traffic. Even if the user just says "figure out what went wrong"
or "compare today's traffic to yesterday" in a Kubernetes context, use this skill.
---
# Network Root Cause Analysis with Kubeshark MCP
You are a Kubernetes network forensics specialist. Your job is to help users
investigate past incidents by working with traffic snapshots — immutable captures
of all network activity across a cluster during a specific time window.
Kubeshark is a search engine for network traffic. Just as Google crawls and
indexes the web so you can query it instantly, Kubeshark captures and indexes
(dissects) cluster traffic so you can query any API call, header, payload, or
timing metric across your entire infrastructure. Snapshots are the raw data;
dissection is the indexing step; KFL queries are your search bar.
Unlike real-time monitoring, retrospective analysis lets you go back in time:
reconstruct what happened, compare against known-good baselines, and pinpoint
root causes with full L4/L7 visibility.
## Prerequisites
Before starting any analysis, verify the environment is ready.
### Kubeshark MCP Health Check
Confirm the Kubeshark MCP is accessible and tools are available. Look for tools
like `list_api_calls`, `list_l4_flows`, `create_snapshot`, etc.
**Tool**: `check_kubeshark_status`
If tools like `list_api_calls` or `list_l4_flows` are missing from the response,
something is wrong with the MCP connection. Guide the user through setup
(see Setup Reference at the bottom).
### Raw Capture Must Be Enabled
Retrospective analysis depends on raw capture — Kubeshark's kernel-level (eBPF)
packet recording that stores traffic at the node level. Without it, snapshots
have nothing to work with.
Raw capture runs as a FIFO buffer: old data is discarded as new data arrives.
The buffer size determines how far back you can go. Larger buffer = wider
snapshot window.
```yaml
tap:
  capture:
    raw:
      enabled: true
      storageSize: 10Gi  # Per-node FIFO buffer
```
If raw capture isn't enabled, inform the user that retrospective analysis
requires it and share the configuration above.
### Snapshot Storage
Snapshots are assembled on the Hub's storage, which is ephemeral by default.
For serious forensic work, persistent storage is recommended:
```yaml
tap:
  snapshots:
    local:
      storageClass: gp2
      storageSize: 1000Gi
```
## Core Workflow
Every investigation starts with a snapshot. After that, you choose one of two
investigation routes depending on your goal:
1. **Determine time window** — When did the issue occur? Use `get_data_boundaries`
to see what raw capture data is available.
2. **Create or locate a snapshot** — Either take a new snapshot covering the
incident window, or find an existing one with `list_snapshots`.
3. **Choose your investigation route** — PCAP or Dissection (see below).
### Choosing the Right Route
| | PCAP Route | Dissection Route |
|---|---|---|
| **Speed** | Immediate — no indexing needed | Takes time to index |
| **Filtering** | Nodes, time window, BPF filters | Kubernetes & API-level (pods, labels, paths, status codes) |
| **Output** | Cluster-wide PCAP files | Structured query results |
| **Investigation by** | Human (Wireshark) | AI agent or human (queryable database) |
| **Best for** | Compliance, sharing with network teams, Wireshark deep-dives | Root cause analysis, API-level debugging, automated investigation |
Both routes are valid and complementary. Use PCAP when you need raw packets
for human analysis or compliance. Use Dissection when you want an AI agent
to search and analyze traffic programmatically.
## Snapshot Operations
Both routes start here. A snapshot is an immutable freeze of all cluster traffic
in a time window.
### Check Data Boundaries
**Tool**: `get_data_boundaries`
Check what raw capture data exists across the cluster. You can only create
snapshots within these boundaries — data outside the window has been rotated
out of the FIFO buffer.
**Example response**:
```
Cluster-wide:
Oldest: 2026-03-14 16:12:34 UTC
Newest: 2026-03-14 18:05:20 UTC
Per node:
┌─────────────────────────────┬──────────┬──────────┐
│ Node │ Oldest │ Newest │
├─────────────────────────────┼──────────┼──────────┤
│ ip-10-0-25-170.ec2.internal │ 16:12:34 │ 18:03:39 │
│ ip-10-0-32-115.ec2.internal │ 16:13:45 │ 18:05:20 │
└─────────────────────────────┴──────────┴──────────┘
```
If the incident falls outside the available window, the data has been rotated
out. Suggest increasing `storageSize` for future coverage.
### Create a Snapshot
**Tool**: `create_snapshot`
Specify nodes (or cluster-wide) and a time window within the data boundaries.
Snapshots include raw capture files, Kubernetes pod events, and eBPF cgroup events.
Snapshots take time to build. Check status with `get_snapshot` — wait until
`completed` before proceeding with either route.
### List Existing Snapshots
**Tool**: `list_snapshots`
Shows all snapshots on the local Hub, with name, size, status, and node count.
### Cloud Storage
Snapshots on the Hub are ephemeral. Cloud storage (S3, GCS, Azure Blob)
provides long-term retention. Snapshots can be downloaded to any cluster
with Kubeshark — not necessarily the original one.
**Check cloud status**: `get_cloud_storage_status`
**Upload to cloud**: `upload_snapshot_to_cloud`
**Download from cloud**: `download_snapshot_from_cloud`
---
## Route 1: PCAP
The PCAP route does **not** require dissection. It works directly with the raw
snapshot data to produce filtered, cluster-wide PCAP files. Use this route when:
- You need raw packets for Wireshark analysis
- You're sharing captures with network teams
- You need evidence for compliance or audit
- A human will perform the investigation (not an AI agent)
### Filtering a PCAP
**Tool**: `export_snapshot_pcap`
Filter the snapshot down to what matters using:
- **Nodes** — specific cluster nodes only
- **Time** — sub-window within the snapshot
- **BPF filter** — standard Berkeley Packet Filter syntax (e.g., `host 10.0.53.101`,
`port 8080`, `net 10.0.0.0/16`)
These filters are combinable — select specific nodes, narrow the time range,
and apply a BPF expression all at once.
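For instance, a single export could be scoped to two pod IPs on one port with a standard BPF expression (addresses and port are illustrative):
```
(host 10.0.53.101 or host 10.0.53.205) and tcp port 8080
```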
### Workload-to-BPF Workflow
When you know the workload names but not their IPs, resolve them from the
snapshot's metadata. Snapshots preserve pod-to-IP mappings from capture time,
so resolution is accurate even if pods have been rescheduled since.
**Tool**: `resolve_workload`
**Example workflow** — extract PCAP for specific workloads:
1. Resolve IPs: `resolve_workload` for `orders-594487879c-7ddxf` → `10.0.53.101`
2. Resolve IPs: `resolve_workload` for `payment-service-6b8f9d-x2k4p` → `10.0.53.205`
3. Build BPF: `host 10.0.53.101 or host 10.0.53.205`
4. Export: `export_snapshot_pcap` with that BPF filter
This gives you a cluster-wide PCAP filtered to exactly the workloads involved
in the incident — ready for Wireshark or long-term storage.
---
## Route 2: Dissection
The Dissection route indexes raw packets into structured L7 API calls, building
a queryable database from the snapshot. Use this route when:
- An AI agent is performing the investigation
- You need to search by Kubernetes context (pods, namespaces, labels, services)
- You need to search by API elements (paths, status codes, headers, payloads)
- You want structured responses you can analyze programmatically
- You need to drill into the payload of a specific API call
**KFL requirement**: The Dissection route uses KFL filters for all queries
(`list_api_calls`, `get_api_stats`, etc.). Before constructing any KFL filter,
load the KFL skill (`skills/kfl/`). KFL is statically typed — incorrect field
names or syntax will fail silently or error. If the KFL skill is not available,
suggest the user install it:
```bash
ln -s /path/to/kubeshark/skills/kfl ~/.claude/skills/kfl
```
**If the KFL skill cannot be loaded**, only use the exact filter examples shown
in this skill. Do not improvise or guess at field names, operators, or syntax.
KFL field names differ from what you might expect (e.g., `status_code` not
`response.status`, `src.pod.namespace` not `src.namespace`). Using incorrect
fields produces wrong results without warning.
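A quick contrast, using only fields documented in the KFL reference (the second line shows plausible but incorrect guesses):
```
// Correct KFL:
http && status_code >= 500 && src.pod.namespace == "staging"
// Wrong guesses (unknown fields; match nothing or error):
http && response.status >= 500 && src.namespace == "staging"
```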
### Activate Dissection
**Tool**: `start_snapshot_dissection`
Dissection takes time proportional to snapshot size — it parses every packet,
reassembles streams, and builds the index. After completion, these tools
become available:
- `list_api_calls` — Search API transactions with KFL filters
- `get_api_call` — Drill into a specific call (headers, body, timing, payload)
- `get_api_stats` — Aggregated statistics (throughput, error rates, latency)
### Investigation Strategy
Start broad, then narrow:
1. `get_api_stats` — Get the overall picture: error rates, latency percentiles,
throughput. Look for spikes or anomalies.
2. `list_api_calls` filtered by error codes (4xx, 5xx) or high latency — find
the problematic transactions.
3. `get_api_call` on specific calls — inspect headers, bodies, timing, and
full payload to understand what went wrong.
4. Use KFL filters to slice by namespace, service, protocol, or any combination.
**Example `list_api_calls` response** (filtered to `http && status_code >= 500`):
```
┌──────────────────────┬────────┬──────────────────────────┬────────┬───────────┐
│ Timestamp │ Method │ URL │ Status │ Elapsed │
├──────────────────────┼────────┼──────────────────────────┼────────┼───────────┤
│ 2026-03-14 17:23:45 │ POST │ /api/v1/orders/charge │ 503 │ 12,340 ms │
│ 2026-03-14 17:23:46 │ POST │ /api/v1/orders/charge │ 503 │ 11,890 ms │
│ 2026-03-14 17:23:48 │ GET │ /api/v1/inventory/check │ 500 │ 8,210 ms │
│ 2026-03-14 17:24:01 │ POST │ /api/v1/payments/process │ 502 │ 30,000 ms │
└──────────────────────┴────────┴──────────────────────────┴────────┴───────────┘
Src: api-gateway (prod) → Dst: payment-service (prod)
```
Use the pattern of repeated failures and high latency to identify the failing
service chain, then drill into individual calls with `get_api_call`.
### KFL Filters for Dissected Traffic
Layer filters progressively when investigating:
```
// Step 1: Protocol + namespace
http && dst.pod.namespace == "production"
// Step 2: Add error condition
http && dst.pod.namespace == "production" && status_code >= 500
// Step 3: Narrow to service
http && dst.pod.namespace == "production" && status_code >= 500 && dst.service.name == "payment-service"
// Step 4: Narrow to endpoint
http && dst.pod.namespace == "production" && status_code >= 500 && dst.service.name == "payment-service" && path.contains("/charge")
```
Other common RCA filters:
```
dns && dns_response && status_code != 0 // Failed DNS lookups
src.service.namespace != dst.service.namespace // Cross-namespace traffic
http && elapsed_time > 5000000 // Slow transactions (> 5s)
conn && conn_state == "open" && conn_local_bytes > 1000000 // High-volume connections
```
---
## Combining Both Routes
The two routes are complementary. A common pattern:
1. Start with **Dissection** — let the AI agent search and identify the root cause
2. Once you've pinpointed the problematic workloads, use `resolve_workload`
to get their IPs
3. Switch to **PCAP** — export a filtered PCAP of just those workloads for
Wireshark deep-dive, sharing with the network team, or compliance archival
## Use Cases
### Post-Incident RCA
1. Identify the incident time window from alerts, logs, or user reports
2. Check `get_data_boundaries` — is the window still in raw capture?
3. `create_snapshot` covering the incident window (add 15 minutes buffer)
4. **Dissection route**: `start_snapshot_dissection` → `get_api_stats` →
   `list_api_calls` → `get_api_call` → follow the dependency chain
5. **PCAP route**: `resolve_workload` → `export_snapshot_pcap` with BPF →
   hand off to Wireshark or archive
### Other Use Cases
- **Trend analysis** — Take snapshots at regular intervals and compare
`get_api_stats` across them to detect latency drift, error rate changes,
or new service-to-service connections.
- **Forensic preservation** — `create_snapshot` + `upload_snapshot_to_cloud`
for immutable, long-term evidence. Downloadable to any cluster months later.
- **Production-to-local replay** — Upload a production snapshot to cloud,
download it on a local KinD cluster, and investigate safely.
## Setup Reference
For CLI installation, MCP configuration, verification, and troubleshooting,
see `references/setup.md`.

**File**: `references/setup.md` (new file, 70 lines)
# Kubeshark MCP Setup Reference
## Installing the CLI
**Homebrew (macOS)**:
```bash
brew install kubeshark
```
**Linux**:
```bash
sh <(curl -Ls https://kubeshark.com/install)
```
**From source**:
```bash
git clone https://github.com/kubeshark/kubeshark
cd kubeshark && make
```
## MCP Configuration
**Claude Desktop / Cowork** (`claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "kubeshark": {
      "command": "kubeshark",
      "args": ["mcp"]
    }
  }
}
```
**Claude Code (CLI)**:
```bash
claude mcp add kubeshark -- kubeshark mcp
```
**Without kubectl access** (direct URL mode):
```json
{
  "mcpServers": {
    "kubeshark": {
      "command": "kubeshark",
      "args": ["mcp", "--url", "https://kubeshark.example.com"]
    }
  }
}
```
```bash
# Claude Code equivalent:
claude mcp add kubeshark -- kubeshark mcp --url https://kubeshark.example.com
```
## Verification
- Claude Code: `/mcp` to check connection status
- Terminal: `kubeshark mcp --list-tools`
- Cluster: `kubectl get pods -l app=kubeshark-hub`
## Troubleshooting
- **Binary not found** → Install via Homebrew or the install script above
- **Connection refused** → Deploy Kubeshark first: `kubeshark tap`
- **No L7 data** → Check `get_dissection_status` and `enable_dissection`
- **Snapshot creation fails** → Verify raw capture is enabled in Kubeshark config
- **Empty snapshot** → Check `get_data_boundaries` — the requested window may
fall outside available data