* feat(serve): add short flag and env var for metrics port
Add short flag -m for --metrics-port to improve discoverability.
Add K8SGPT_METRICS_PORT environment variable support, consistent
with other K8SGPT_* environment variables.
This helps users who encounter port conflicts on the default
metrics port (8081) when running k8sgpt serve with --mcp or
other configurations.
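A minimal sketch of how the port could be resolved from the flag and the environment variable. The helper name `resolveMetricsPort` and the precedence (explicit flag first, then `K8SGPT_METRICS_PORT`, then the default) are assumptions for illustration, not the actual k8sgpt code:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultMetricsPort mirrors the default metrics port mentioned above.
const defaultMetricsPort = 8081

// resolveMetricsPort is a hypothetical helper. An explicitly set flag wins,
// then the raw value of K8SGPT_METRICS_PORT, then the default.
func resolveMetricsPort(flagValue int, flagSet bool, envValue string) (int, error) {
	if flagSet {
		return flagValue, nil
	}
	if envValue == "" {
		return defaultMetricsPort, nil
	}
	port, err := strconv.Atoi(envValue)
	if err != nil || port < 1 || port > 65535 {
		return 0, fmt.Errorf("invalid K8SGPT_METRICS_PORT %q", envValue)
	}
	return port, nil
}

func main() {
	// Simulates running without -m/--metrics-port on the command line.
	port, err := resolveMetricsPort(0, false, os.Getenv("K8SGPT_METRICS_PORT"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("metrics port:", port)
}
```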
Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxes53235@gmail.com>
Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxesyes3inatrenchcoat@gmail.com>
* fix: validate namespace before running custom analyzers
Custom analyzers previously ignored the --namespace flag entirely,
executing even when an invalid or misspelled namespace was provided.
This was inconsistent with built-in filter behavior, which respects
namespace scoping.
Add namespace existence validation in RunCustomAnalysis() before
executing custom analyzers. If a namespace is specified but does
not exist, an error is reported and custom analyzers are skipped.
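The check described above could look roughly like the following sketch. The `namespaceChecker` interface and the in-memory `staticNamespaces` type are hypothetical stand-ins for a Kubernetes client lookup; the real `RunCustomAnalysis()` may be structured differently:

```go
package main

import (
	"errors"
	"fmt"
)

// namespaceChecker is a hypothetical abstraction over the Kubernetes API.
type namespaceChecker interface {
	NamespaceExists(name string) (bool, error)
}

// staticNamespaces is an in-memory stand-in used only for this sketch.
type staticNamespaces map[string]bool

func (s staticNamespaces) NamespaceExists(name string) (bool, error) {
	return s[name], nil
}

var errNamespaceNotFound = errors.New("namespace not found")

// validateNamespace returns nil when no namespace is requested or when it exists.
func validateNamespace(c namespaceChecker, namespace string) error {
	if namespace == "" {
		return nil // no namespace scoping requested
	}
	ok, err := c.NamespaceExists(namespace)
	if err != nil {
		return fmt.Errorf("checking namespace %q: %w", namespace, err)
	}
	if !ok {
		return fmt.Errorf("%w: %q", errNamespaceNotFound, namespace)
	}
	return nil
}

func main() {
	cluster := staticNamespaces{"default": true, "production": true}
	for _, ns := range []string{"", "production", "prodution"} {
		if err := validateNamespace(cluster, ns); err != nil {
			fmt.Println("skipping custom analyzers:", err)
			continue
		}
		fmt.Printf("running custom analyzers (namespace=%q)\n", ns)
	}
}
```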
Fixes #1601
Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxes53235@gmail.com>
Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxesyes3inatrenchcoat@gmail.com>
---------
Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxes53235@gmail.com>
Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxesyes3inatrenchcoat@gmail.com>
Co-authored-by: Alex Jones <1235925+AlexsJones@users.noreply.github.com>
This fixes issue #1556 where the customrest backend fails when error messages
contain quotes or other special characters.
The root cause was that fmt.Sprintf was used to construct JSON, which doesn't
escape special characters like quotes, newlines, or tabs. When Kubernetes error
messages contain image names with quotes (e.g., "nginx:1.a.b.c"), the resulting
JSON was malformed and failed to parse.
The fix uses json.Marshal to construct the JSON payload, which
automatically escapes all special characters according to the JSON spec.
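The difference can be reproduced with a short sketch. The `payload` struct and its `prompt` field are assumptions for illustration; the real customrest request shape may differ:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// payload is a hypothetical request shape used only for this sketch.
type payload struct {
	Prompt string `json:"prompt"`
}

// buildWithSprintf reproduces the bug: the message is inserted verbatim,
// so embedded quotes terminate the JSON string early.
func buildWithSprintf(msg string) string {
	return fmt.Sprintf(`{"prompt": "%s"}`, msg)
}

// buildWithMarshal lets encoding/json escape quotes, newlines, and tabs.
func buildWithMarshal(msg string) (string, error) {
	b, err := json.Marshal(payload{Prompt: msg})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// isValidJSON reports whether s parses as JSON.
func isValidJSON(s string) bool {
	return json.Valid([]byte(s))
}

func main() {
	msg := `Back-off pulling image "nginx:1.a.b.c"`

	fmt.Println("Sprintf produces valid JSON:", isValidJSON(buildWithSprintf(msg)))

	good, err := buildWithMarshal(msg)
	if err != nil {
		panic(err)
	}
	fmt.Println("Marshal produces valid JSON:", isValidJSON(good))
}
```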
Fixes #1556
Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxes53235@gmail.com>
Co-authored-by: Three Foxes (in a Trenchcoat) <threefoxes53235@gmail.com>
* refactor: improve MCP server handlers with better error handling and pagination
This PR refactors the MCP server handler functions to improve code quality,
maintainability, and user experience.
## Key Improvements
### 1. Eliminated Code Duplication
- Introduced a **resource registry pattern** that maps resource types to their
list and get functions
- Reduced ~500 lines of repetitive switch-case statements to ~100 lines of
declarative registry configuration
- Makes adding new resource types trivial (just add to the registry)
### 2. Proper Error Handling
- Fixed all ignored JSON marshaling errors (previously using `_`)
- Added `marshalJSON()` helper function with explicit error handling
- Improved error messages with context about what failed
### 3. Input Validation
- Added required field validation (resourceType, name, namespace where needed)
- Returns clear error messages when required fields are missing
- Validates resource types before attempting operations
### 4. Pagination Support
- Added `limit` parameter to `list-resources` handler
- Defaults to 100 items, max 1000 (configurable via constants)
- Prevents returning massive amounts of data that could overwhelm clients
- Consistent with `list-events` handler which already had limits
### 5. Resource Type Normalization
- Added `normalizeResourceType()` function to handle aliases (pods->pod, svc->service, etc.)
- Centralized resource type validation
- Better error messages listing supported resource types
### 6. Improved Filter Management
- Added validation to ensure filters array is not empty
- Better feedback messages (e.g., "filters already active", "no filters removed")
- Tracks which filters were actually added/removed
## Technical Details
**Constants Added:**
- `DefaultListLimit = 100` - Default max resources to return
- `MaxListLimit = 1000` - Hard limit for list operations
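A sketch of how these two constants could bound a requested `limit`. The helper name `clampLimit` and the choice to treat non-positive values as "use the default" are assumptions, not confirmed handler behavior:

```go
package main

import "fmt"

const (
	DefaultListLimit = 100  // default max resources to return
	MaxListLimit     = 1000 // hard limit for list operations
)

// clampLimit is a hypothetical helper: missing or non-positive limits fall
// back to the default, and oversized limits are capped at the hard limit.
func clampLimit(requested int) int {
	switch {
	case requested <= 0:
		return DefaultListLimit
	case requested > MaxListLimit:
		return MaxListLimit
	default:
		return requested
	}
}

func main() {
	for _, n := range []int{0, 25, 5000} {
		fmt.Printf("requested=%d effective=%d\n", n, clampLimit(n))
	}
}
```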
**New Functions:**
- `normalizeResourceType()` - Converts aliases to canonical types
- `marshalJSON()` - Marshals with proper error handling
**Registry Pattern:**
- `resourceRegistry` - Maps resource types to list/get functions
- `resourceTypeAliases` - Maps aliases to canonical types
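The registry and alias maps could be sketched as follows. The `listFunc` signature, the map entries, and the fake list functions are illustrative assumptions; the real handlers call the Kubernetes API and also register get functions:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// listFunc is a hypothetical signature for a per-resource list function.
type listFunc func(namespace string) ([]string, error)

// resourceTypeAliases maps aliases to canonical types (entries are illustrative).
var resourceTypeAliases = map[string]string{
	"pods":       "pod",
	"svc":        "service",
	"services":   "service",
	"cm":         "configmap",
	"configmaps": "configmap",
}

// resourceRegistry maps canonical types to list functions (fake data here).
var resourceRegistry = map[string]listFunc{
	"pod":       func(ns string) ([]string, error) { return []string{ns + "/web-0"}, nil },
	"service":   func(ns string) ([]string, error) { return []string{ns + "/web"}, nil },
	"configmap": func(ns string) ([]string, error) { return nil, nil },
}

// normalizeResourceType converts an alias to its canonical type.
func normalizeResourceType(t string) string {
	t = strings.ToLower(strings.TrimSpace(t))
	if canonical, ok := resourceTypeAliases[t]; ok {
		return canonical
	}
	return t
}

// supportedTypes returns the registered types in a stable order.
func supportedTypes() []string {
	types := make([]string, 0, len(resourceRegistry))
	for t := range resourceRegistry {
		types = append(types, t)
	}
	sort.Strings(types)
	return types
}

// listResources replaces a per-type switch with a single registry lookup.
func listResources(resourceType, namespace string) ([]string, error) {
	list, ok := resourceRegistry[normalizeResourceType(resourceType)]
	if !ok {
		return nil, fmt.Errorf("unsupported resource type %q (supported: %s)",
			resourceType, strings.Join(supportedTypes(), ", "))
	}
	return list(namespace)
}

func main() {
	items, err := listResources("svc", "default")
	fmt.Println(items, err)

	_, err = listResources("widgets", "default")
	fmt.Println(err)
}
```

Adding a resource type in this shape means adding one registry entry and, optionally, its aliases.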
## Backward Compatibility
All changes are backward compatible:
- No API changes to tool signatures
- Existing clients will work without modification
- New `limit` parameter is optional (defaults to 100)
## Testing
Tested with:
- All resource types (pods, deployments, services, nodes, etc.)
- Various aliases (svc, cm, pvc, sts, ds, rs)
- Edge cases (missing required fields, invalid resource types)
- Large result sets (pagination working correctly)
Fixes code duplication and improves maintainability of the MCP server.
Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxes53235@gmail.com>
* fix: remove duplicate mcp_handlers_old.go file causing build failures
The old handlers file was accidentally left in place after refactoring,
causing 'redeclared' errors for all handler methods. This commit removes
the old file to resolve the build failures.
Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxes53235@gmail.com>
---------
Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxes53235@gmail.com>
Co-authored-by: Three Foxes (in a Trenchcoat) <threefoxes53235@gmail.com>
Co-authored-by: Alex Jones <1235925+AlexsJones@users.noreply.github.com>
Fixes #1610
The CI workflows were using inconsistent Go versions (1.22, 1.23) that
didn't match go.mod (go 1.24.1, toolchain go1.24.11). This creates
confusion for contributors and risks version-specific issues.
Changes:
- test.yaml: GO_VERSION ~1.22 -> ~1.24
- build_container.yaml: GO_VERSION ~1.23 -> ~1.24
- release.yaml: go-version 1.22 -> ~1.24
This aligns with PR #1609 which updates CONTRIBUTING.md to reflect
go.mod's Go 1.24 requirement.
Signed-off-by: Three Foxes (in a Trenchcoat) <threefoxes53235@gmail.com>
Co-authored-by: Three Foxes (in a Trenchcoat) <threefoxes53235@gmail.com>
Add Groq as a new AI backend provider. Groq provides an OpenAI-compatible
API, so this implementation reuses the existing OpenAI client library
with Groq's API endpoint.
Closes #1269
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* migrated to more actively maintained mcp golang lib and added AI explain support for mcp mode
Signed-off-by: Umesh Kaul <umeshkaul@gmail.com>
* added a makefile option to create local docker image for testing
Signed-off-by: Umesh Kaul <umeshkaul@gmail.com>
* fixed linter errors and made anonymize an argument
Signed-off-by: Umesh Kaul <umeshkaul@gmail.com>
* added mcp support for helm chart and fixed google adk support issue
Signed-off-by: Umesh Kaul <umeshkaul@gmail.com>
---------
Signed-off-by: Umesh Kaul <umeshkaul@gmail.com>
Co-authored-by: Alex Jones <1235925+AlexsJones@users.noreply.github.com>
* migrated to more actively maintained mcp golang lib and added AI explain support for mcp mode
Signed-off-by: Umesh Kaul <umeshkaul@gmail.com>
* added a makefile option to create local docker image for testing
Signed-off-by: Umesh Kaul <umeshkaul@gmail.com>
* fixed linter errors and made anonymize an argument
Signed-off-by: Umesh Kaul <umeshkaul@gmail.com>
---------
Signed-off-by: Umesh Kaul <umeshkaul@gmail.com>
Co-authored-by: Alex Jones <alexsimonjones@gmail.com>
* fix(deps): update module gopkg.in/yaml.v2 to v3
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
* chore: resolved conflict in deps
Signed-off-by: Alex Jones <alexsimonjones@gmail.com>
---------
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Signed-off-by: Alex Jones <alexsimonjones@gmail.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Alex Jones <alexsimonjones@gmail.com>
* fix: update OpenAI API key generation URL to reflect new platform link
Updated the outdated URL 'https://beta.openai.com/account/api-keys' to the current OpenAI API key generation page 'https://platform.openai.com/account/api-keys'.
This resolves the issue where users were directed to an incorrect URL when generating an OpenAI API key.
Signed-off-by: 100daysofdevops <47483190+100daysofdevops@users.noreply.github.com>
* fix(deps): Add transition plan for GPT-3.5 Turbo to GPT-4o
- A comprehensive comparison of GPT-3.5 Turbo and GPT-4o models, focusing on performance and cost improvements.
- Documentation updates highlighting the planned deprecation of gpt-3.5-turbo-0301 on February 13, 2025.
- Clear migration guidelines for transitioning to GPT-4o or GPT-4o mini to ensure service continuity.
Signed-off-by: 100daysofdevops <47483190+100daysofdevops@users.noreply.github.com>
---------
Signed-off-by: 100daysofdevops <47483190+100daysofdevops@users.noreply.github.com>
Co-authored-by: Alex Jones <alexsimonjones@gmail.com>
* feat: rework how Bedrock data models are structured and accessed
Signed-off-by: AlexsJones <alexsimonjones@gmail.com>
---------
Signed-off-by: AlexsJones <alexsimonjones@gmail.com>
If there is an issue in creating the Analysis config when calling
analysis.NewAnalysis, then we want to check before assigning the context to a
potentially nil pointer.
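A minimal sketch of the guard described above. The `Analysis` type and the `newAnalysis` constructor are simplified stand-ins, not the real `analysis.NewAnalysis` signature:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Analysis is a simplified stand-in for the real analysis config type.
type Analysis struct {
	Context context.Context
}

// newAnalysis is a hypothetical constructor that returns nil on failure.
func newAnalysis(backend string) (*Analysis, error) {
	if backend == "" {
		return nil, errors.New("no backend configured")
	}
	return &Analysis{}, nil
}

// prepare checks the error before touching the returned pointer.
func prepare(ctx context.Context, backend string) (*Analysis, error) {
	a, err := newAnalysis(backend)
	if err != nil {
		return nil, fmt.Errorf("creating analysis config: %w", err)
	}
	a.Context = ctx // safe: a is non-nil at this point
	return a, nil
}

func main() {
	if _, err := prepare(context.Background(), ""); err != nil {
		fmt.Println("error:", err)
	}
	a, err := prepare(context.Background(), "openai")
	if err != nil {
		panic(err)
	}
	fmt.Println("context set:", a.Context != nil)
}
```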
Signed-off-by: Danny Clark <danielclark@google.com>
Co-authored-by: Alex Jones <alexsimonjones@gmail.com>
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Alex Jones <alexsimonjones@gmail.com>
* feat: add stats option to analyze command for performance insights
Introduced a new feature to the analyze command that enables users to print detailed performance statistics of each analyzer. This enhancement aids in debugging and understanding the time taken by various components during analysis, providing valuable insights for performance optimization.
Signed-off-by: Matthis Holleville <matthish29@gmail.com>
* feat: enhance analysis command with statistics option
Refactored the analysis command to support an enhanced statistics option, enabling users to opt-in for detailed performance metrics of the analysis process. This change introduces a more flexible approach to handling statistics, allowing for a clearer separation between the analysis output and performance metrics, thereby improving the usability and insights provided to the user.
Signed-off-by: Matthis Holleville <matthish29@gmail.com>
---------
Signed-off-by: Matthis Holleville <matthish29@gmail.com>
Co-authored-by: Alex Jones <alexsimonjones@gmail.com>
Co-authored-by: Aris Boutselis <arisboutselis08@gmail.com>
* feat: added support for AI21 and Amazon Titan models via bedrock api
Signed-off-by: Yomesh Shah <yomesh@gmail.com>
* fix: response type for different models and use of constant for top_P
Signed-off-by: Yomesh Shah <yomesh@gmail.com>
* fix: constant for top_P as int vs string
Signed-off-by: Yomesh Shah <yomesh@gmail.com>
* feat: moved topP and maxTokens to config rather than being constants in the code
Signed-off-by: Yomesh Shah <yomesh@gmail.com>
---------
Signed-off-by: Yomesh Shah <yomesh@gmail.com>
Co-authored-by: Alex Jones <alexsimonjones@gmail.com>
* feat: add custom analyzer management capability
Introduced the ability to manage custom analyzers in the K8sGPT application, enabling users to add, deploy, and configure custom analyzers from various sources. This enhancement supports extending the application's analytical capabilities by integrating external analysis tools, thus offering more flexibility and customization options to meet specific user needs.
Signed-off-by: Matthis Holleville <matthish29@gmail.com>
* feat: enhance custom analyzer management with removal functionality
Introduced the ability to remove custom analyzers, streamlining the management process and ensuring flexibility in custom analyzer configuration. This enhancement addresses the need for dynamic customization and maintenance of analyzer setups, facilitating easier updates and modifications to the analysis environment.
Signed-off-by: Matthis Holleville <matthish29@gmail.com>
* feat: add list command to customAnalyzer for displaying configured analyzers
Implemented a new list command within the customAnalyzer module to enable users to view all configured custom analyzers. This enhancement aims to improve usability by providing a straightforward method for users to inspect their custom analyzer configurations directly from the command line.
Signed-off-by: Matthis Holleville <matthish29@gmail.com>
* feat: add support for listing, adding, and removing custom analyzers
This update introduces commands to manage custom analyzers in the k8sgpt tool, enhancing flexibility and control over analyzer configurations without the need for direct installation or docker dependency.
Signed-off-by: Matthis Holleville <matthish29@gmail.com>
* feat: support private docker image authentication for custom analyzers
Added authentication support for pulling private Docker images when adding custom analyzers, enhancing security and access control.
Signed-off-by: Matthis Holleville <matthish29@gmail.com>
* feat: remove Docker custom analyzer installation
Removed the installation and deployment functionality for custom analyzers, streamlining the process of adding analyzers. This change focuses on simplifying the configuration by eliminating the need for specifying installation types, package URLs, and authentication details for Docker images. The goal is to enhance user experience by making the addition of custom analyzers more straightforward and less error-prone.
Signed-off-by: Matthis Holleville <matthish29@gmail.com>
* fix: remove unused packageUrl
Signed-off-by: Matthis Holleville <matthish29@gmail.com>
* feat: update add command description to reflect broader functionality
Signed-off-by: Matthis Holleville <matthish29@gmail.com>
* feat: Add name validation for custom analyzer creation
To ensure the integrity and consistency of analyzer names, we introduced a validation step that checks the format of the name against a predefined regex pattern. This change aims to prevent the creation of analyzers with invalid names, enhancing the system's reliability and usability.
Signed-off-by: Matthis Holleville <matthish29@gmail.com>
* feat: refactor customAnalyzer package for consistent naming
Refactored the customAnalyzer package and its references to use consistent snake_case naming for improved code readability and alignment with Go naming conventions.
Signed-off-by: Matthis Holleville <matthish29@gmail.com>
---------
Signed-off-by: Matthis Holleville <matthish29@gmail.com>
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Alex Jones <alexsimonjones@gmail.com>
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Alex Jones <alexsimonjones@gmail.com>
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Alex Jones <alexsimonjones@gmail.com>
This commit adds Top-K sampling, a feature that allows users to control
the randomness of the generated text by specifying the number of most
probable next words considered by the model. This enhances user control
and potentially improves the quality of the generated outputs.
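As a conceptual illustration of what Top-K does on the model side, the sketch below keeps the `k` most probable tokens and renormalizes them. It is not k8sgpt code (k8sgpt only passes the setting to the backend), and the function name `topK` and the toy distribution are assumptions:

```go
package main

import (
	"fmt"
	"sort"
)

// topK keeps the k highest-probability tokens and renormalizes them to sum to 1.
func topK(probs map[string]float64, k int) map[string]float64 {
	type entry struct {
		token string
		p     float64
	}
	entries := make([]entry, 0, len(probs))
	for t, p := range probs {
		entries = append(entries, entry{t, p})
	}
	// Sort by probability descending, breaking ties by token for determinism.
	sort.Slice(entries, func(i, j int) bool {
		if entries[i].p != entries[j].p {
			return entries[i].p > entries[j].p
		}
		return entries[i].token < entries[j].token
	})
	if k < 1 {
		k = 1
	}
	if k > len(entries) {
		k = len(entries)
	}
	var total float64
	for _, e := range entries[:k] {
		total += e.p
	}
	out := make(map[string]float64, k)
	if total == 0 {
		return out // nothing to sample from
	}
	for _, e := range entries[:k] {
		out[e.token] = e.p / total
	}
	return out
}

func main() {
	next := map[string]float64{"pod": 0.5, "node": 0.3, "cat": 0.2}
	fmt.Println(topK(next, 2))
}
```

A smaller `k` restricts sampling to fewer candidates and makes output more deterministic; a larger `k` allows more variety.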
Fixes: https://github.com/k8sgpt-ai/k8sgpt/issues/1105
Signed-off-by: VaibhavMalik4187 <vaibhavmalik2018@gmail.com>
Co-authored-by: Alex Jones <alexsimonjones@gmail.com>
This commit adds new tests for the `pkg/integration` package. As a
result, the code coverage of the package has increased from 0%
to 100%.
This also includes a minor adjustment in the error statements of the
`Activate` and `Deactivate` functions to ensure better understanding of
the cause of the error.
Signed-off-by: VaibhavMalik4187 <vaibhavmalik2018@gmail.com>
Co-authored-by: Alex Jones <alexsimonjones@gmail.com>
* chore(deps): update cohere client implementation to v2 and to use chat endpoint
Signed-off-by: Miguel Varela Ramos <miguel@cohere.ai>
* chore: remove renovate rule for cohere-go
Signed-off-by: Miguel Varela Ramos <miguel@cohere.ai>
* style: remove unused attribute
Signed-off-by: Miguel Varela Ramos <miguel@cohere.ai>
* fix: go mod
Signed-off-by: Miguel Varela Ramos <miguel@cohere.ai>
---------
Signed-off-by: Miguel Varela Ramos <miguel@cohere.ai>
Signed-off-by: Miguel Varela Ramos <miguelvramos92@gmail.com>
Co-authored-by: Alex Jones <alexsimonjones@gmail.com>
- Fixed a small bug where failures were being appended multiple times
for CrashLoopBackOff and ContainerCreating container status reasons.
- Added missing test cases to ensure proper testing of the Pod analyzer.
The addition of these missing test cases has increased the code
coverage of this analyzer to 98%.
- Added checks for init containers in a pod.
Partially addresses: https://github.com/k8sgpt-ai/k8sgpt/issues/889
Signed-off-by: VaibhavMalik4187 <vaibhavmalik2018@gmail.com>
* chore: allows an environmental override of the default AWS region and using it for bedrock
Signed-off-by: Alex Jones <alexsimonjones@gmail.com>
* chore: missing provider region
Signed-off-by: Alex Jones <alexsimonjones@gmail.com>
---------
Signed-off-by: Alex Jones <alexsimonjones@gmail.com>
* test: added missing tests for the CronJob analyzer
- Fixed a small bug where pre-analysis was incorrectly appended to the
results every time at the end of the for loop. This caused the result
for a single cronjob failure to be appended multiple times in the
final results.
- Added missing test cases to ensure proper testing of the CronJob
analyzer. The addition of these missing test cases has increased the
code coverage of this analyzer to over 96%.
Partially Addresses: https://github.com/k8sgpt-ai/k8sgpt/issues/889
Signed-off-by: VaibhavMalik4187 <vaibhavmalik2018@gmail.com>
* test: removed failure strings matching from tests
It is possible that the error or failure strings might change in the
future, causing the tests to fail. This commit addresses that issue by
removing the matching of failure text from various analyzer tests.
Signed-off-by: VaibhavMalik4187 <vaibhavmalik2018@gmail.com>
---------
Signed-off-by: VaibhavMalik4187 <vaibhavmalik2018@gmail.com>
- This commit removes unnecessary tests defined in the pkg/kubernetes
package.
- The removed tests were found to be flaky and were causing a
significant increase in CI time without adding much value to
the codebase.
Signed-off-by: VaibhavMalik4187 <vaibhavmalik2018@gmail.com>
* Added new tests for the `Service` analyzer defined in the
`pkg/analyzer` package.
* The addition of these new tests has increased the code coverage of the
service.go file to over 97%.
* Additionally addressed some flaky tests related to the `ReplicaSet` and
`PersistentVolumeClaim` analyzers.
Partially addresses: https://github.com/k8sgpt-ai/k8sgpt/issues/889
Signed-off-by: VaibhavMalik4187 <vaibhavmalik2018@gmail.com>
Co-authored-by: Aris Boutselis <arisboutselis08@gmail.com>
- Removed test cases which required access to `/root` from the
`pkg/util` package.
- Fixed flaky `PodDisruptionBudget` test.
- Fixed a typo in `PersistentVolumeClaim` test.
Signed-off-by: VaibhavMalik4187 <vaibhavmalik2018@gmail.com>
This commit introduces comprehensive tests for the
`PersistentVolumeClaim` analyzer defined in the `pkg/analyzer` package.
Adding these tests increases the code coverage of the `pvc.go` file to
>95%.
I also made minor modifications to the ReplicaSet test to ensure all
expectations were met.
Partially addresses: https://github.com/k8sgpt-ai/k8sgpt/issues/889
Signed-off-by: VaibhavMalik4187 <vaibhavmalik2018@gmail.com>
This commit introduces comprehensive tests for the `PodDisruptionBudget`
analyzer defined in the `pkg/analyzer` package.
Adding these tests increases the code coverage of the `pdb.go` file to
>96%.
Additionally, a potential crash in case of empty or nil PDB status
conditions has been addressed.
Partially addresses: https://github.com/k8sgpt-ai/k8sgpt/issues/889
Signed-off-by: VaibhavMalik4187 <vaibhavmalik2018@gmail.com>
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Alex Jones <alexsimonjones@gmail.com>
This commit introduces comprehensive tests for the mutating webhook
analyzer defined in the `pkg/analyzer` package.
Adding these tests increases the code coverage of the
`mutating_webhook.go` file to almost 95%.
Partially addresses: https://github.com/k8sgpt-ai/k8sgpt/issues/889
Signed-off-by: VaibhavMalik4187 <vaibhavmalik2018@gmail.com>
This commit introduces comprehensive tests for the ReplicaSet analyzer
defined in the `pkg/analyzer` package.
Adding these tests increases the code coverage of the `rs.go` file to
>95%.
Partially addresses: https://github.com/k8sgpt-ai/k8sgpt/issues/889
Signed-off-by: VaibhavMalik4187 <vaibhavmalik2018@gmail.com>
This commit introduces comprehensive tests for the validating webhook
analyzer defined in the `pkg/analyzer` package.
Adding these tests increases the code coverage of the
`validating_webhook.go` file to almost 95%.
Partially addresses: https://github.com/k8sgpt-ai/k8sgpt/issues/889
Signed-off-by: VaibhavMalik4187 <vaibhavmalik2018@gmail.com>
The default value of the `backend` flag for the analyze command is now
an empty string. The `NewAnalysis` function has been modified to use
the default backend set by the user if the backend flag is not provided
and `defaultprovider` is set in the config file. Otherwise, the backend
is set to "openai".
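The resolution order described above can be sketched as a small function. The name `resolveBackend` is hypothetical; the real logic lives inside `NewAnalysis`:

```go
package main

import "fmt"

// fallbackBackend is used when neither the flag nor the config sets a backend.
const fallbackBackend = "openai"

// resolveBackend is a hypothetical helper: the --backend flag wins, then the
// defaultprovider from the config file, then "openai".
func resolveBackend(flagValue, defaultProvider string) string {
	if flagValue != "" {
		return flagValue
	}
	if defaultProvider != "" {
		return defaultProvider
	}
	return fallbackBackend
}

func main() {
	fmt.Println(resolveBackend("", ""))             // neither set
	fmt.Println(resolveBackend("", "localai"))      // config default only
	fmt.Println(resolveBackend("azure", "localai")) // flag overrides config
}
```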
Fixes: https://github.com/k8sgpt-ai/k8sgpt/issues/902
Signed-off-by: VaibhavMalik4187 <vaibhavmalik2018@gmail.com>
Co-authored-by: JuHyung Son <sonju0427@gmail.com>
This commit adds new unit tests for the `pkg/util` package, bumping the
code coverage to 84%.
Signed-off-by: VaibhavMalik4187 <vaibhavmalik2018@gmail.com>
Co-authored-by: Alex Jones <alexsimonjones@gmail.com>
Removed the shorthand for the `http` flag in the serve command because
it conflicted with the shorthand of the `help` flag, which is
automatically added on execution if the `help` flag is not already
defined.
Fixes: https://github.com/k8sgpt-ai/k8sgpt/issues/968
Signed-off-by: VaibhavMalik4187 <vaibhavmalik2018@gmail.com>
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Alex Jones <alexsimonjones@gmail.com>
* chore: linting improvements and catching false positives
Signed-off-by: Alex Jones <alexsimonjones@gmail.com>
* chore: increase linter time out
Signed-off-by: Alex Jones <alexsimonjones@gmail.com>
---------
Signed-off-by: Alex Jones <alexsimonjones@gmail.com>
* feat: initial Prometheus analyzers
Added a prometheus integration with two analyzers:
1. PrometheusConfigValidate
2. PrometheusConfigRelabelReport
The integration does not deploy any Prometheus stack in the cluster.
Instead, it searches the provided --namespace for a Prometheus
configuration, stored in a ConfigMap or Secret. If it finds one, it
unmarshals it into memory and runs the analyzers on it.
PrometheusConfigValidate checks if the actual Prometheus configuration is valid or has
any errors.
PrometheusConfigRelabelReport tries to distill the scrape config
relabeling rules to give a concise label set per job that targets need
to have to be scraped. This analyzer is unconventional, in that it does
not necessarily mean there are issues with the config. It merely tries
to give a human-readable explanation of the relabel rules it discovers,
leaning on the LLM and prompt.
Tested on both kube-prometheus and Google Managed Prometheus
stacks.
Signed-off-by: Daniel Clark <danielclark@google.com>
* review: feedback cycle 1
Simplify ConfigValidate prompt and add comments.
Signed-off-by: Daniel Clark <danielclark@google.com>
* review: feedback cycle 2
Add Prometheus configuration discovery to integration activate command.
Also improve logging to make this more clear to users.
Signed-off-by: Daniel Clark <danielclark@google.com>
---------
Signed-off-by: Daniel Clark <danielclark@google.com>
* feat: openAI explicit value for maxToken and temp
When k8sgpt talks to vLLM, the default MaxToken is 16, which is too
small. Most models support 2048 tokens (e.g., Llama 1), so that value
is set explicitly as a safe default.
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
* feat: make temperature a flag
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
---------
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
* feat: show each ConfigAuditReport check
Signed-off-by: Johannes Kleinlercher <johannes@kleinlercher.at>
* feat: mask sensitive data in configauditreport messages
Signed-off-by: Johannes Kleinlercher <johannes@kleinlercher.at>
---------
Signed-off-by: Johannes Kleinlercher <johannes@kleinlercher.at>
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Alex Jones <alexsimonjones@gmail.com>
If the user specifies `--kubeconfig` when running k8sgpt, it should use that
kubeconfig file to log in to the corresponding cluster instead of getting auth info via the SA.
Closes #604
Signed-off-by: Jian Zhang <jiazha@redhat.com>
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Alex Jones <alexsimonjones@gmail.com>
K8sGPT is a tool for scanning your Kubernetes clusters, diagnosing and triaging issues in simple English. It has SRE experience codified into its analyzers and helps to pull out the most relevant information to enrich it with AI.
    license: "MIT"
    license: "Apache-2.0"
    formats:
      - deb
      - rpm
@@ -32,7 +35,7 @@ nfpms:
    section: utils
    contents:
      - src: ./LICENSE
        dst: /usr/share/doc/nfpm/copyright
        dst: /usr/share/doc/k8sgpt/copyright
        file_info:
          mode: 0644
@@ -51,26 +54,44 @@ archives:
      {{- if .Arm }}v{{ .Arm }}{{ end }}
    # use zip for windows archives
    format_overrides:
      - goos: windows
        format: zip
      - goos: windows
        format: zip
brews:
  - name: k8sgpt
    homepage: https://k8sgpt.ai
    tap:
    repository:
      owner: k8sgpt-ai
      name: homebrew-k8sgpt
checksum:
  name_template: 'checksums.txt'
  name_template: "checksums.txt"
snapshot:
  name_template: "{{ incpatch .Version }}-next"
changelog:
  skip: true
announce:
  slack:
    # Whether its enabled or not.
    #
    # Templates: allowed (since v2.6).
    enabled: true
    # Message template to use while publishing.
    #
    # Default: '{{ .ProjectName }} {{ .Tag }} is out! Check it out at {{ .ReleaseURL }}'.
    # Templates: allowed.
    message_template: "{{ .ProjectName }} release {{.Tag}} is out!"
    # The name of the channel that the user selected as a destination for webhook messages.
    channel: "#general"
    # Set your Webhook's user name.
    username: "K8sGPT"
    # Emoji to use as the icon for this message. Overrides icon_url.
    icon_emoji: ""
    # URL to an image to use as the icon for this message.
    icon_url: ""
# The lines beneath this are called `modelines`. See `:help modeline`
# Feel free to remove those if you don't want/use them.
K8sGPT provides a Model Context Protocol (MCP) server that exposes Kubernetes cluster operations as standardized tools, resources, and prompts for AI assistants like Claude, ChatGPT, and other MCP-compatible clients.
## Table of Contents
- [What is MCP?](#what-is-mcp)
- [Quick Start](#quick-start)
- [Server Modes](#server-modes)
- [Available Tools](#available-tools)
- [Available Resources](#available-resources)
- [Available Prompts](#available-prompts)
- [Usage Examples](#usage-examples)
- [Integration with AI Assistants](#integration-with-ai-assistants)
- [HTTP API Reference](#http-api-reference)
## What is MCP?
The Model Context Protocol (MCP) is an open standard that enables AI assistants to securely connect to external data sources and tools. K8sGPT's MCP server exposes Kubernetes operations through this standardized interface, allowing AI assistants to:
- Analyze cluster health and issues
- Query Kubernetes resources
- Access pod logs and events
- Get troubleshooting guidance
- Manage analyzer filters
## Quick Start
### Start the MCP Server
**Stdio mode (for local AI assistants):**
```bash
k8sgpt serve --mcp
```
**HTTP mode (for network access):**
```bash
k8sgpt serve --mcp --mcp-http --mcp-port 8089
```
### Test with curl
```bash
curl -X POST http://localhost:8089/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list"
  }'
```
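The same request bodies can be built from Go. The sketch below only constructs the JSON-RPC 2.0 payloads shown in this guide; the `rpcRequest` type and `newRequest` helper are hypothetical names, and sending the body (for example with `net/http`) is left out so the example runs without a server:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest models a JSON-RPC 2.0 request like the one in the curl example.
type rpcRequest struct {
	JSONRPC string `json:"jsonrpc"`
	ID      int    `json:"id"`
	Method  string `json:"method"`
	Params  any    `json:"params,omitempty"` // omitted when nil
}

// newRequest is a hypothetical helper that marshals a request body.
func newRequest(id int, method string, params any) (string, error) {
	b, err := json.Marshal(rpcRequest{JSONRPC: "2.0", ID: id, Method: method, Params: params})
	if err != nil {
		return "", fmt.Errorf("marshaling %s request: %w", method, err)
	}
	return string(b), nil
}

func main() {
	list, err := newRequest(1, "tools/list", nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(list)

	call, err := newRequest(2, "tools/call", map[string]any{
		"name":      "analyze",
		"arguments": map[string]any{"namespace": "production"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(call)
}
```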
## Server Modes
### Stdio Mode (Default)
Used by local AI assistants like Claude Desktop:
```bash
k8sgpt serve --mcp
```
Configure in your MCP client (e.g., Claude Desktop's `claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "k8sgpt": {
      "command": "k8sgpt",
      "args": ["serve", "--mcp"]
    }
  }
}
```
### HTTP Mode
Used for network access and webhooks:
```bash
k8sgpt serve --mcp --mcp-http --mcp-port 8089
```
The server runs in stateless mode, so no session management is required. Each request is independent.
## Available Tools
The MCP server exposes 12 tools for Kubernetes operations:
### Cluster Analysis
**analyze**
- Analyze Kubernetes resources for issues and problems
- Parameters:
  - `namespace` (optional): Namespace to analyze
  - `explain` (optional): Get AI explanations for issues
  - `filters` (optional): Comma-separated list of analyzers to use
**cluster-info**
- Get Kubernetes cluster information and version
### Resource Management
**list-resources**
- List Kubernetes resources of a specific type
- Parameters:
  - `resourceType` (required): Type of resource (pods, deployments, services, nodes, jobs, cronjobs, statefulsets, daemonsets, replicasets, configmaps, secrets, ingresses, pvcs, pvs)
  - `namespace` (optional): Namespace to query
  - `labelSelector` (optional): Label selector for filtering
**get-resource**
- Get detailed information about a specific Kubernetes resource
- Parameters:
  - `resourceType` (required): Type of resource
  - `name` (required): Resource name
  - `namespace` (optional): Namespace
**list-namespaces**
- List all namespaces in the cluster
### Debugging and Troubleshooting
**get-logs**
- Get logs from a pod container
- Parameters:
  - `podName` (required): Name of the pod
  - `namespace` (optional): Namespace
  - `container` (optional): Container name
  - `tail` (optional): Number of lines to show
  - `previous` (optional): Show logs from previous container instance
  - `sinceSeconds` (optional): Show logs from last N seconds
**list-events**
- List Kubernetes events for debugging
- Parameters:
  - `namespace` (optional): Namespace to query
  - `involvedObjectName` (optional): Filter by object name
  - `involvedObjectKind` (optional): Filter by object kind
### Analyzer Management
**list-filters**
- List all available and active analyzers/filters
**add-filters**
- Add filters to enable specific analyzers
- Parameters:
  - `filters` (required): Comma-separated list of analyzer names
**remove-filters**
- Remove filters to disable specific analyzers
- Parameters:
  - `filters` (required): Comma-separated list of analyzer names
### Integrations
**list-integrations**
- List available integrations (Prometheus, AWS, Keda, Kyverno, etc.)
### Configuration
**config**
- Configure K8sGPT settings including custom analyzers and cache
## Available Resources
Resources provide read-only access to cluster information:
**cluster-info**
- URI: `cluster-info`
- Get information about the Kubernetes cluster
**namespaces**
- URI: `namespaces`
- List all namespaces in the cluster
**active-filters**
- URI: `active-filters`
- Get currently active analyzers/filters
## Available Prompts
Prompts provide guided troubleshooting workflows:
**troubleshoot-pod**
- Interactive pod debugging workflow
- Arguments:
  - `podName` (required): Name of the pod to troubleshoot
  - `namespace` (required): Namespace of the pod
**troubleshoot-deployment**
- Interactive deployment debugging workflow
- Arguments:
  - `deploymentName` (required): Name of the deployment
  - `namespace` (required): Namespace of the deployment
**troubleshoot-cluster**
- General cluster troubleshooting workflow
## Usage Examples
### Example 1: Analyze a Namespace
```bash
curl -X POST http://localhost:8089/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
      "name": "analyze",
      "arguments": {
        "namespace": "production",
        "explain": "true"
      }
    }
  }'
```
### Example 2: List Pods
```bash
curl -X POST http://localhost:8089/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
      "name": "list-resources",
      "arguments": {
        "resourceType": "pods",
        "namespace": "default"
      }
    }
  }'
```
### Example 3: Get Pod Logs
```bash
curl -X POST http://localhost:8089/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
      "name": "get-logs",
      "arguments": {
        "podName": "nginx-abc123",
        "namespace": "default",
        "tail": "100"
      }
    }
  }'
```
### Example 4: Access a Resource
```bash
curl -X POST http://localhost:8089/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 4,
    "method": "resources/read",
    "params": {
      "uri": "namespaces"
    }
  }'
```
### Example 5: Get a Troubleshooting Prompt
```bash
curl -X POST http://localhost:8089/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 5,
    "method": "prompts/get",
    "params": {
      "name": "troubleshoot-pod",
      "arguments": {
        "podName": "nginx-abc123",
        "namespace": "default"
      }
    }
  }'
```
## Integration with AI Assistants
### Claude Desktop
Add to `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "k8sgpt": {
      "command": "k8sgpt",
      "args": ["serve", "--mcp"]
    }
  }
}
```
Restart Claude Desktop and you'll see k8sgpt tools available in the tool selector.
### Custom MCP Clients
Any MCP-compatible client can connect to the k8sgpt server. For HTTP-based clients:
1. Start the server: `k8sgpt serve --mcp --mcp-http --mcp-port 8089`
2. Connect to: `http://localhost:8089/mcp`
3. Use standard MCP protocol methods: `tools/list`, `tools/call`, `resources/read`, `prompts/get`
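
The curl examples above can also be issued from a small Go client. The following is a minimal sketch, not part of k8sgpt: it assumes the server was started in HTTP mode on `localhost:8089`, and the names `rpcRequest`, `newToolCall`, `encode`, and `post` are hypothetical helpers introduced only for illustration.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// rpcRequest mirrors the JSON-RPC 2.0 envelope used in the curl examples.
type rpcRequest struct {
	JSONRPC string         `json:"jsonrpc"`
	ID      int            `json:"id"`
	Method  string         `json:"method"`
	Params  map[string]any `json:"params"`
}

// newToolCall builds a tools/call request for the named tool.
func newToolCall(id int, tool string, args map[string]any) rpcRequest {
	return rpcRequest{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  map[string]any{"name": tool, "arguments": args},
	}
}

// encode serializes the request to the JSON body sent over HTTP.
func encode(req rpcRequest) (string, error) {
	b, err := json.Marshal(req)
	return string(b), err
}

// post sends the request to the MCP endpoint and returns the HTTP status line.
func post(endpoint string, req rpcRequest) (string, error) {
	body, err := encode(req)
	if err != nil {
		return "", err
	}
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Post(endpoint, "application/json", bytes.NewBufferString(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return resp.Status, nil
}

func main() {
	req := newToolCall(2, "list-resources", map[string]any{
		"resourceType": "pods",
		"namespace":    "default",
	})
	// Assumption: k8sgpt serve --mcp --mcp-http --mcp-port 8089 is running locally.
	status, err := post("http://localhost:8089/mcp", req)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("status:", status)
}
```

Swapping the `Method` and `Params` fields gives the `resources/read` and `prompts/get` calls shown in Examples 4 and 5.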

<summary>Failing Installation on WSL or Linux (missing gcc)</summary>
When installing Homebrew on WSL or Linux, you may encounter the following error:
```
==> Installing k8sgpt from k8sgpt-ai/k8sgpt Error: The following formula cannot be installed from a bottle and must be
built from the source. k8sgpt Install Clang or run brew install gcc.
```
If you install gcc as suggested, the problem will persist. Therefore, you need to install the build-essential package.
```
sudo apt-get update
sudo apt-get install build-essential
```
</details>
### Windows
- Download the latest Windows binaries of **k8sgpt** from the [Release](https://github.com/k8sgpt-ai/k8sgpt/releases)
tab based on your system architecture.
- Extract the downloaded package to your desired location. Configure the system _PATH_ environment variable with the binary location
## Operator Installation
@@ -118,17 +159,86 @@ To install within a Kubernetes cluster please use our `k8sgpt-operator` with ins
_This mode of operation is ideal for continuous monitoring of your cluster and can integrate with your existing monitoring such as Prometheus and Alertmanager._
## Quick Start
- Currently, the default AI provider is OpenAI, so you will need to generate an API key from [OpenAI](https://openai.com).
  - You can do this by running `k8sgpt generate` to open a browser link to generate it.
- Run `k8sgpt auth add` to set it in k8sgpt.
  - You can provide the password directly using the `--password` flag.
- Run `k8sgpt filters` to manage the active filters used by the analyzer. By default, all filters are executed during analysis.
- Run `k8sgpt analyze` to run a scan.
- Use `k8sgpt analyze --explain` to get a more detailed explanation of the issues.
- You can also run `k8sgpt analyze --with-doc` (with or without the explain flag) to get the official documentation from Kubernetes.
# Using with Claude Desktop
K8sGPT can be integrated with Claude Desktop to provide AI-powered Kubernetes cluster analysis. This integration requires K8sGPT v0.4.14 or later.
## Prerequisites
1. Install K8sGPT v0.4.14 or later:
```sh
brew install k8sgpt
```
2. Install Claude Desktop from the official website
3. Configure K8sGPT with your preferred AI backend:
```sh
k8sgpt auth
```
## Setup
1. Start the K8sGPT MCP server:
```sh
k8sgpt serve --mcp
```
2. In Claude Desktop:
- Open Settings
- Navigate to the Integrations section
- Add K8sGPT as a new integration
- The MCP server will be automatically detected
3. Configure Claude Desktop with the following JSON:
```json
{
"mcpServers": {
"k8sgpt": {
"command": "k8sgpt",
"args": [
"serve",
"--mcp"
]
}
}
}
```
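
If you prefer to generate this file rather than edit it by hand, the same structure can be produced from Go. This is a minimal sketch under stated assumptions: the types `mcpServer` and `desktopConfig` and the function `k8sgptConfig` are hypothetical names that simply mirror the JSON above, and where `claude_desktop_config.json` lives on disk is left to you.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mcpServer describes one entry under "mcpServers".
type mcpServer struct {
	Command string   `json:"command"`
	Args    []string `json:"args"`
}

// desktopConfig mirrors the top-level shape of the JSON shown above.
type desktopConfig struct {
	MCPServers map[string]mcpServer `json:"mcpServers"`
}

// k8sgptConfig returns a config that launches k8sgpt in stdio MCP mode.
func k8sgptConfig() desktopConfig {
	return desktopConfig{MCPServers: map[string]mcpServer{
		"k8sgpt": {Command: "k8sgpt", Args: []string{"serve", "--mcp"}},
	}}
}

func main() {
	out, err := json.MarshalIndent(k8sgptConfig(), "", "  ")
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	// Write this output to your claude_desktop_config.json.
	fmt.Println(string(out))
}
```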
## Usage
Once connected, you can use Claude Desktop to:
- Analyze your Kubernetes cluster
- Get detailed insights about cluster health
- Receive recommendations for fixing issues
- Query cluster information
Example commands in Claude Desktop:
- "Analyze my Kubernetes cluster"
- "What's the health status of my cluster?"
- "Show me any issues in the default namespace"
## Troubleshooting
If you encounter connection issues:
1. Ensure K8sGPT is running with the MCP server enabled
2. Verify your Kubernetes cluster is accessible
3. Check that your AI backend is properly configured
4. Restart both K8sGPT and Claude Desktop
For more information, visit our [documentation](https://docs.k8sgpt.ai).
## Analyzers
@@ -147,14 +257,32 @@ you will be able to write your own analyzers.
The MCP server enables integration with tools like Claude Desktop and other MCP-compatible clients. It runs on port 8089 by default and provides:
- Kubernetes cluster analysis via MCP protocol
- Resource information and health status
- AI-powered issue explanations and recommendations
For Helm chart deployment with MCP support, see the `charts/k8sgpt/values-mcp-example.yaml` file.
_Analysis with serve mode_
```
curl -X GET "http://localhost:8080/analyze?namespace=k8sgpt&explain=false"
```
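
The same request URL can be assembled programmatically. The sketch below is illustrative only: `analyzeURL` is a hypothetical helper, and it assumes the serve endpoint accepts exactly the `namespace` and `explain` query parameters shown in the curl command.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// analyzeURL builds the /analyze URL with properly escaped query parameters.
func analyzeURL(base, namespace string, explain bool) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/analyze"
	q := url.Values{}
	q.Set("namespace", namespace)
	q.Set("explain", strconv.FormatBool(explain))
	u.RawQuery = q.Encode() // Encode sorts keys alphabetically
	return u.String(), nil
}

func main() {
	s, err := analyzeURL("http://localhost:8080", "k8sgpt", false)
	if err != nil {
		fmt.Println("invalid base URL:", err)
		return
	}
	fmt.Println(s) // http://localhost:8080/analyze?explain=false&namespace=k8sgpt
}
```

The parameter order differs from the curl example because `url.Values.Encode` sorts keys; the query is equivalent.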
</details>
## Key Features
<details>
<summary> LocalAI provider </summary>
To run local models, it is possible to use OpenAI compatible APIs, for instance [LocalAI](https://github.com/go-skynet/LocalAI) which uses [llama.cpp](https://github.com/ggerganov/llama.cpp) and [ggml](https://github.com/ggerganov/ggml) to run inference on consumer-grade hardware. Models supported by LocalAI for instance are Vicuna, Alpaca, LLaMA, Cerebras, GPT4ALL, GPT4ALL-J and koala.
To run local inference, you need to download the models first, for instance you can find `ggml` compatible models in [huggingface.com](https://huggingface.co/models?search=ggml) (for example vicuna, alpaca and koala).
### Start the API server
To start the API server, follow the instruction in [LocalAI](https://github.com/go-skynet/LocalAI#example-use-gpt4all-j-model).
### Run k8sgpt
To run k8sgpt, run `k8sgpt auth add` with the `localai` backend:
```
The stats mode allows for debugging and understanding the time taken by an analysis by displaying the statistics of each analyzer.
- Analyzer Ingress took 47.125583ms
- Analyzer PersistentVolumeClaim took 53.009167ms
- Analyzer CronJob took 57.517792ms
- Analyzer Deployment took 156.6205ms
- Analyzer Node took 160.109833ms
- Analyzer ReplicaSet took 245.938333ms
- Analyzer StatefulSet took 448.0455ms
- Analyzer Pod took 5.662594708s
- Analyzer Service took 38.583359166s
```
_Diagnostic information_
To collect diagnostic information use the following command to create a `dump_<timestamp>_json` in your local directory.
```
k8sgpt dump
```
</details>
<details>
<summary> AzureOpenAI provider </summary>
## LLM AI Backends
<em>Prerequisites:</em> an Azure OpenAI deployment is needed, please visit MS official [documentation](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/create-resource?pivots=web-portal#create-a-resource) to create your own.
K8sGPT uses the chosen LLM or generative AI provider when you ask it to explain the analysis results with the `--explain` flag, e.g. `k8sgpt analyze --explain`. You can use the `--backend` flag to specify a configured provider (it's `openai` by default).
To authenticate with k8sgpt, you will need the Azure OpenAI endpoint of your tenant `"https://your Azure OpenAI Endpoint"`, the API key to access your deployment, the deployment name of your model and the model name itself.
To run k8sgpt, run `k8sgpt auth` with the `azureopenai` backend:
Lastly, enter your Azure API key at the prompt.
Now you are ready to analyze with the Azure OpenAI backend:
```
k8sgpt analyze --explain --backend azureopenai
```
</details>
<details>
<summary>Setting a new default AI provider</summary>
There may be scenarios where you wish to have K8sGPT plugged into several default AI providers. In this case you may wish to use one as a new default, other than OpenAI which is the project default.
_To view available providers_
You can list available providers using `k8sgpt auth list`:
```
k8sgpt auth list
Default:
> openai
Active:
> openai
> azureopenai
Unused:
> openai
> localai
> ollama
> azureopenai
> cohere
> amazonbedrock
> amazonsagemaker
> google
> huggingface
> noopai
> googlevertexai
> watsonxai
> customrest
> ibmwatsonxai
```
For detailed documentation on how to configure and use each provider see [here](https://docs.k8sgpt.ai/reference/providers/backend/).
With this option, the data is anonymized before being sent to the AI Backend. During the analysis execution, `k8sgpt` retrieves sensitive data (Kubernetes object names, labels, etc.). This data is masked when sent to the AI backend and replaced by a key that can be used to de-anonymize the data when the solution is returned to the user.
<summary> Anonymization </summary>
1. Error reported during analysis:
```bash
Error: HorizontalPodAutoscaler uses StatefulSet/fake-deployment as ScaleTargetRef which does not exist.
```
2. Payload sent to the AI backend:
```bash
Error: HorizontalPodAutoscaler uses StatefulSet/tGLcCRcHa1Ce5Rs as ScaleTargetRef which does not exist.
```
3. Payload returned by the AI:
```bash
The Kubernetes system is trying to scale a StatefulSet named tGLcCRcHa1Ce5Rs using the HorizontalPodAutoscaler, but it cannot find the StatefulSet. The solution is to verify that the StatefulSet name is spelled correctly and exists in the same namespace as the HorizontalPodAutoscaler.
```
4. Payload returned to the user:
```bash
The Kubernetes system is trying to scale a StatefulSet named fake-deployment using the HorizontalPodAutoscaler, but it cannot find the StatefulSet. The solution is to verify that the StatefulSet name is spelled correctly and exists in the same namespace as the HorizontalPodAutoscaler.
```
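
The four steps above amount to a reversible substitution. The following Go sketch illustrates the idea only and is not k8sgpt's implementation: `sensitive`, `maskString`, `anonymize`, and `deanonymize` are hypothetical names, and the random-token scheme is an assumption chosen for the example.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
	"strings"
)

// sensitive pairs an original value with its masked stand-in.
type sensitive struct {
	Unmasked string
	Masked   string
}

// maskString returns a random token with the same length as s (assumed scheme).
func maskString(s string) (string, error) {
	buf := make([]byte, len(s))
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(buf)[:len(s)], nil
}

// anonymize replaces every sensitive value before text leaves the machine.
func anonymize(text string, pairs []sensitive) string {
	for _, p := range pairs {
		text = strings.ReplaceAll(text, p.Unmasked, p.Masked)
	}
	return text
}

// deanonymize restores the original values in the AI's answer.
func deanonymize(text string, pairs []sensitive) string {
	for _, p := range pairs {
		text = strings.ReplaceAll(text, p.Masked, p.Unmasked)
	}
	return text
}

func main() {
	name := "fake-deployment"
	masked, err := maskString(name)
	if err != nil {
		fmt.Println("masking failed:", err)
		return
	}
	pairs := []sensitive{{Unmasked: name, Masked: masked}}
	sent := anonymize("HorizontalPodAutoscaler uses StatefulSet/"+name+" as ScaleTargetRef which does not exist.", pairs)
	fmt.Println("sent to backend: ", sent)
	fmt.Println("shown to user:   ", deanonymize(sent, pairs))
}
```

Because the mapping is kept locally, only the masked token is ever sent to the backend.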
### Further Details
Note: **Anonymization does not currently apply to events.**
_In a few analysers like Pod, we feed to the AI backend the event messages which are not known beforehand thus we are not masking them for the **time being**._
- The following is the list of analysers in which data is **being masked**:-
- Statefulset
- Service
- PodDisruptionBudget
- Node
- NetworkPolicy
- Ingress
- HPA
- Deployment
- Cronjob
- The following is the list of analysers in which data is **not being masked**:-
- ReplicaSet
- PersistentVolumeClaim
- Pod
- Log
- **_\*Events_**
**\*Note**:
- K8sGPT will not mask the above analysers because they do not send any identifying information, except for the **Events** analyser.
- Masking for **Events** analyzer is scheduled in the near future as seen in this [issue](https://github.com/k8sgpt-ai/k8sgpt/issues/560). _Further research has to be made to understand the patterns and be able to mask the sensitive parts of an event like pod name, namespace etc._
- The following is the list of fields which are not **being masked**:-
- Describe
- ObjectStatus
- Replicas
- ContainerStatus
- **_\*Event Message_**
- ReplicaStatus
- Count (Pod)
**\*Note**:
- It is quite possible the payload of the event message might have something like "super-secret-project-pod-X crashed" which we don't currently redact _(scheduled in the near future as seen in this [issue](https://github.com/k8sgpt-ai/k8sgpt/issues/560))_.
### Proceed with care
- The K8sGPT team recommends using an entirely different backend **(a local model) in critical production environments**. By using a local model, you can rest assured that everything stays within your DMZ, and nothing is leaked.
- If you are unsure whether sending data to a public LLM (OpenAI, Azure AI) poses a risk to business-critical operations, avoid using a public LLM, based on your own assessment of the risks and the jurisdiction involved.
</details>
<details>
<summary> Configuration management</summary>
`k8sgpt` stores config data in the `$XDG_CONFIG_HOME/k8sgpt/k8sgpt.yaml` file. The data is stored in plain text, including your OpenAI key.
- K8sGPT will create the bucket if it does not exist
- Azure Storage
- We support a number of [techniques](https://learn.microsoft.com/en-us/azure/developer/go/azure-sdk-authentication?tabs=bash#2-authenticate-with-azure) to authenticate against Azure
- K8sGPT assumes that the storage account already exists and will create the container if it does not exist
- It is the **user's** responsibility to grant specific permissions to their identity in order to be able to upload blob files and create SA containers (e.g. Storage Blob Data Contributor)
- Google Cloud Storage
- _As a prerequisite `GOOGLE_APPLICATION_CREDENTIALS` are required as environmental variables._
- K8sGPT will create the bucket if it does not exist
_Listing cache items_
```
k8sgpt cache list
```
_Purging an object from the cache_
Note: purging an object using this command will delete upstream files, so it requires appropriate permissions.
```
k8sgpt cache purge $OBJECT_NAME
```
_Removing the remote cache_
Note: this will not delete the upstream S3 bucket or Azure storage container
```
k8sgpt cache remove --bucket <name>
k8sgpt cache remove
```
</details>
<details>
<summary> Custom Analyzers</summary>
There may be scenarios where you wish to write your own analyzer in a language of your choice.
K8sGPT now supports the ability to do so by abiding by the [schema](https://github.com/k8sgpt-ai/schemas/blob/main/protobuf/schema/v1/custom_analyzer.proto) and serving the analyzer for consumption.
To do so, define the analyzer within the K8sGPT configuration and it will add it into the scanning process.
In addition to this you will need to enable the following flag on analysis:
```
k8sgpt analyze --custom-analysis
```
Here is an example local host analyzer in [Rust](https://github.com/k8sgpt-ai/host-analyzer)
When this is run on `localhost:8080` the K8sGPT config can pick it up with the following additions:
```
custom_analyzers:
  - name: host-analyzer
    connection:
      url: localhost
      port: 8080
```
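
To make the shape of that configuration concrete, here is a minimal Go sketch of the same entry. The types `connection` and `customAnalyzer` and the `address` method are hypothetical and only mirror the YAML keys; how K8sGPT itself loads and dials the analyzer is not shown in this document.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// connection mirrors the "connection" block of a custom analyzer entry.
type connection struct {
	URL  string
	Port int
}

// customAnalyzer mirrors one item of the "custom_analyzers" list.
type customAnalyzer struct {
	Name       string
	Connection connection
}

// address joins host and port into a dialable host:port string.
func (c customAnalyzer) address() string {
	return net.JoinHostPort(c.Connection.URL, strconv.Itoa(c.Connection.Port))
}

func main() {
	a := customAnalyzer{
		Name:       "host-analyzer",
		Connection: connection{URL: "localhost", Port: 8080},
	}
	fmt.Printf("%s is expected at %s\n", a.Name, a.address())
}
```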
This makes it possible to pass host OS information (from this analyzer example) through to K8sGPT to use as context alongside normal analysis.
K8sGPT provides a Model Context Protocol server that exposes Kubernetes operations as standardized tools for AI assistants like Claude, ChatGPT, and other MCP-compatible clients.
**Start the MCP server:**
Stdio mode (for local AI assistants):
```bash
k8sgpt serve --mcp
```
HTTP mode (for network access):
```bash
k8sgpt serve --mcp --mcp-http --mcp-port 8089
```
**Features:**
- 12 tools for cluster analysis, resource management, and debugging
- 3 resources for cluster information access
- 3 interactive troubleshooting prompts
- Stateless HTTP mode for one-off invocations
- Full integration with Claude Desktop and other MCP clients
**Learn more:** See [MCP.md](MCP.md) for complete documentation, usage examples, and integration guides.
## Documentation
Find our official documentation available [here](https://docs.k8sgpt.ai)
@@ -8,4 +8,4 @@ For example if there is a vulnerability in release `0.1.0` we will fix that rele
## Reporting a Vulnerability
If you are aware of a vulnerability please feel free to disclose it responsibly to contact@k8sgpt.ai or to one of our maintainers in our Slack community.
K8sGPT supports a variety of AI/LLM providers (backends). Some providers have a fixed set of supported models, while others allow you to specify any model supported by the provider.
---
## Providers and Supported Models
### OpenAI
- **Model:** User-configurable (any model supported by OpenAI, e.g., `gpt-3.5-turbo`, `gpt-4`, etc.)
### Azure OpenAI
- **Model:** User-configurable (any model deployed in your Azure OpenAI resource)
- **Model:** User-configurable (default: `llama3`, others can be specified)
### NoOpAI
- **Model:** N/A (no real model, used for testing)
### Cohere
- **Model:** User-configurable (any model supported by Cohere)
### Amazon Bedrock
- **Supported Models:**
- anthropic.claude-sonnet-4-20250514-v1:0
- us.anthropic.claude-sonnet-4-20250514-v1:0
- eu.anthropic.claude-sonnet-4-20250514-v1:0
- apac.anthropic.claude-sonnet-4-20250514-v1:0
- us.anthropic.claude-3-7-sonnet-20250219-v1:0
- eu.anthropic.claude-3-7-sonnet-20250219-v1:0
- apac.anthropic.claude-3-7-sonnet-20250219-v1:0
- anthropic.claude-3-5-sonnet-20240620-v1:0
- us.anthropic.claude-3-5-sonnet-20241022-v2:0
- anthropic.claude-v2
- anthropic.claude-v1
- anthropic.claude-instant-v1
- ai21.j2-ultra-v1
- ai21.j2-jumbo-instruct
- amazon.titan-text-express-v1
- amazon.nova-pro-v1:0
- eu.amazon.nova-pro-v1:0
- us.amazon.nova-pro-v1:0
- amazon.nova-lite-v1:0
- eu.amazon.nova-lite-v1:0
- us.amazon.nova-lite-v1:0
- anthropic.claude-3-haiku-20240307-v1:0
> **Note:**
> If you use an AWS Bedrock inference profile ARN (e.g., `arn:aws:bedrock:us-east-1:<account>:application-inference-profile/<id>`) as the model, you must still provide a valid modelId (e.g., `anthropic.claude-3-sonnet-20240229-v1:0`). K8sGPT will automatically set the required `X-Amzn-Bedrock-Inference-Profile-ARN` header for you when making requests to Bedrock.
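
A caller may want to tell the two kinds of model string apart before building a request. The check below is a rough sketch under assumptions: `isInferenceProfileARN` is a hypothetical helper, and the prefix and path tests are inferred only from the example ARN in the note, not from K8sGPT's code or the AWS ARN grammar.

```go
package main

import (
	"fmt"
	"strings"
)

// isInferenceProfileARN reports whether model looks like a Bedrock
// inference profile ARN rather than a plain model ID (heuristic only).
func isInferenceProfileARN(model string) bool {
	return strings.HasPrefix(model, "arn:aws:bedrock:") &&
		strings.Contains(model, "inference-profile/")
}

func main() {
	for _, m := range []string{
		"arn:aws:bedrock:us-east-1:123456789012:application-inference-profile/example",
		"anthropic.claude-3-haiku-20240307-v1:0",
	} {
		fmt.Printf("%-80s profile ARN: %v\n", m, isInferenceProfileARN(m))
	}
}
```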
### Amazon SageMaker
- **Model:** User-configurable (any model deployed in your SageMaker endpoint)
### Google GenAI
- **Model:** User-configurable (any model supported by Google GenAI, e.g., `gemini-pro`)
### Huggingface
- **Model:** User-configurable (any model supported by Huggingface Inference API)
### Google VertexAI
- **Supported Models:**
- gemini-1.0-pro-001
### OCI GenAI
- **Model:** User-configurable (any model supported by OCI GenAI)
### Custom REST
- **Model:** User-configurable (any model your custom REST endpoint supports)
### IBM Watsonx
- **Supported Models:**
- ibm/granite-13b-chat-v2
### Groq
- **Model:** User-configurable (any model supported by Groq, e.g., `llama-3.3-70b-versatile`, `mixtral-8x7b-32768`)
---
For more details on configuring each provider and model, refer to the official K8sGPT documentation and the provider's own documentation.
# MCP (Model Context Protocol) server configuration
  mcp:
    enabled: false # Enable MCP server
    port: "8089" # Port for MCP server
    http: true # Enable HTTP mode for MCP server
  resources:
    limits:
      cpu: "1"
@@ -14,7 +19,10 @@ deployment:
    requests:
      cpu: "0.2"
      memory: "156Mi"
  securityContext: {}
# Set securityContext.runAsUser/runAsGroup if necessary. Values below were taken from https://github.com/k8sgpt-ai/k8sgpt/blob/main/container/Dockerfile
	AnalyzeCmd.Flags().StringVarP(&namespace, "namespace", "n", "", "Namespace to analyze")
// no cache flag
@@ -85,7 +158,7 @@ func init() {
	// explain flag
	AnalyzeCmd.Flags().BoolVarP(&explain, "explain", "e", false, "Explain the problem to me")
	// add flag for backend
	AnalyzeCmd.Flags().StringVarP(&backend, "backend", "b", "openai", "Backend AI provider")
	AnalyzeCmd.Flags().StringVarP(&backend, "backend", "b", "", "Backend AI provider")
	// output as json
	AnalyzeCmd.Flags().StringVarP(&output, "output", "o", "text", "Output format (text, json)")
	// add language options for output
@@ -94,4 +167,14 @@ func init() {
	AnalyzeCmd.Flags().IntVarP(&maxConcurrency, "max-concurrency", "m", 10, "Maximum number of concurrent requests to the Kubernetes API server")
	// kubernetes doc flag
	AnalyzeCmd.Flags().BoolVarP(&withDoc, "with-doc", "d", false, "Give me the official documentation of the involved field")
	// interactive mode flag
	AnalyzeCmd.Flags().BoolVarP(&interactiveMode, "interactive", "i", false, "Enable interactive mode that allows further conversation with LLM about the problem. Works only with --explain flag")
	AnalyzeCmd.Flags().StringVarP(&labelSelector, "selector", "L", "", "Label selector (label query) to filter on, supports '=', '==', and '!='. (e.g. -L key1=value1,key2=value2). Matching objects must satisfy all of the specified label constraints.")
	// print stats
	AnalyzeCmd.Flags().BoolVarP(&withStats, "with-stat", "s", false, "Print analysis stats. This option disables errors display.")
			color.Red("Error: Backend AI cannot be empty and accepted values are '%v'", strings.Join(ai.Backends, ", "))
		if providerIndex != -1 {
			// provider with same name exists, update provider info
			color.Yellow("Provider with same name already exists.")
			os.Exit(1)
		}
		// check if model is not empty
		if model == "" {
			color.Red("Error: Model cannot be empty.")
			model = defaultModel
			color.Yellow(fmt.Sprintf("Warning: model input is empty, will use the default value: %s", defaultModel))
		}
		if temperature > 1.0 || temperature < 0.0 {
			color.Red("Error: temperature ranges from 0 to 1.")
			os.Exit(1)
		}
		if topP > 1.0 || topP < 0.0 {
			color.Red("Error: topP ranges from 0 to 1.")
			os.Exit(1)
		}
		if topK < 1 || topK > 100 {
			color.Red("Error: topK ranges from 1 to 100.")
			os.Exit(1)
		}
@@ -89,11 +127,20 @@ var addCmd = &cobra.Command{
		// create new provider object
		newProvider := ai.AIProvider{
			Name:     backend,
			Model:    model,
			Password: password,
			BaseURL:  baseURL,
			Engine:   engine,
			Name:           backend,
			Model:          model,
			Password:       password,
			BaseURL:        baseURL,
			EndpointName:   endpointName,
			Engine:         engine,
			Temperature:    temperature,
			ProviderRegion: providerRegion,
			ProviderId:     providerId,
			CompartmentId:  compartmentId,
			TopP:           topP,
			TopK:           topK,
			MaxTokens:      maxTokens,
			OrganizationId: organizationId,
		}
		if providerIndex == -1 {
@@ -105,22 +152,37 @@ var addCmd = &cobra.Command{
				os.Exit(1)
			}
			color.Green("%s added to the AI backend provider list", backend)
		} else {
			// provider with same name exists, update provider info
			color.Yellow("Provider with same name already exists.")
		}
	},
}
func init() {
	// add flag for backend
	addCmd.Flags().StringVarP(&backend, "backend", "b", "openai", "Backend AI provider")
	addCmd.Flags().StringVarP(&backend, "backend", "b", defaultBackend, "Backend AI provider")
	// add flag for model
	addCmd.Flags().StringVarP(&model, "model", "m", "gpt-3.5-turbo", "Backend AI model")
	addCmd.Flags().StringVarP(&model, "model", "m", defaultModel, "Backend AI model")
	// add flag for password
	addCmd.Flags().StringVarP(&password, "password", "p", "", "Backend AI password")
	// add flag for url
	addCmd.Flags().StringVarP(&baseURL, "baseurl", "u", "", "URL AI provider, (e.g `http://localhost:8080/v1`)")
	// add flag for endpointName
	addCmd.Flags().StringVarP(&endpointName, "endpointname", "n", "", "Endpoint Name, e.g. `endpoint-xxxxxxxxxxxx` (only for amazonbedrock, amazonsagemaker backends)")
	// add flag for topP
	addCmd.Flags().Float32VarP(&topP, "topp", "", 0.5, "Probability Cutoff: Set a threshold (0.0-1.0) to limit word choices. Higher values add randomness, lower values increase predictability.")
	// add flag for topK
	addCmd.Flags().Int32VarP(&topK, "topk", "c", 50, "Sampling Cutoff: Set a threshold (1-100) to restrict the sampling process to the top K most probable words at each step. Higher values lead to greater variability, lower values increases predictability.")
	// max tokens
	addCmd.Flags().IntVarP(&maxTokens, "maxtokens", "l", 2048, "Specify a maximum output length. Adjust (1-...) to control text length. Higher values produce longer output, lower values limit length")
	// add flag for temperature
	addCmd.Flags().Float32VarP(&temperature, "temperature", "t", 0.7, "The sampling temperature, value ranges between 0 ( output be more deterministic) and 1 (more random)")
	// add flag for azure open ai engine/deployment name
	addCmd.Flags().StringVarP(&engine, "engine", "e", "", "Azure AI deployment name")
	addCmd.Flags().StringVarP(&engine, "engine", "e", "", "Azure AI deployment name (only for azureopenai backend)")
	// add flag for amazonbedrock region name
	addCmd.Flags().StringVarP(&providerRegion, "providerRegion", "r", "", "Provider Region name (only for amazonbedrock, googlevertexai backend)")
	// add flag for vertexAI/WatsonxAI Project ID
	addCmd.Flags().StringVarP(&providerId, "providerId", "i", "", "Provider specific ID for e.g. project (only for googlevertexai/ibmwatsonxai backend)")
	// add flag for OCI Compartment ID
	addCmd.Flags().StringVarP(&compartmentId, "compartmentId", "k", "", "Compartment ID for generative AI model (only for oci backend)")
	// add flag for openai organization
	addCmd.Flags().StringVarP(&organizationId, "organizationId", "o", "", "OpenAI or AzureOpenAI Organization ID (only for openai and azureopenai backend)")
					color.Blue("Organization Id updated successfully")
				}
				configAI.Providers[i].Temperature = temperature
				color.Green("%s updated in the AI backend provider list", backend)
			}
			if !foundBackend {
				color.Red("Error: %s does not exist in configuration file. Please use k8sgpt auth new.", args[0])
				os.Exit(1)
			}
		}
		if !foundBackend {
			color.Red("Error: %s does not exist in configuration file. Please use k8sgpt auth new.", backend)
			os.Exit(1)
		}
		viper.Set("ai", configAI)
@@ -101,6 +111,10 @@ func init() {
	updateCmd.Flags().StringVarP(&password, "password", "p", "", "Update backend AI password")
	// update flag for url
	updateCmd.Flags().StringVarP(&baseURL, "baseurl", "u", "", "Update URL AI provider, (e.g `http://localhost:8080/v1`)")
	// add flag for temperature
	updateCmd.Flags().Float32VarP(&temperature, "temperature", "t", 0.7, "The sampling temperature, value ranges between 0 ( output be more deterministic) and 1 (more random)")
	// update flag for azure open ai engine/deployment name
	updateCmd.Flags().StringVarP(&engine, "engine", "e", "", "Update Azure AI deployment name")
	// update flag for organizationId
	updateCmd.Flags().StringVarP(&organizationId, "organizationId", "o", "", "Update OpenAI or Azure organization Id")
@@ -56,5 +54,4 @@ var activateCmd = &cobra.Command{
func init() {
	IntegrationCmd.AddCommand(activateCmd)
	activateCmd.Flags().BoolVarP(&skipInstall, "no-install", "s", false, "Only activate the integration filter without installing the filter (for example, if that filter plugin is already deployed in cluster, we do not need to re-install it again)")
// AmazonBedRockClient represents the client for interacting with the Amazon Bedrock service.
type AmazonBedRockClient struct {
	nopCloser
	client      BedrockRuntimeAPI
	mgmtClient  BedrockManagementAPI
	model       *bedrock_support.BedrockModel
	temperature float32
	topP        float32
	maxTokens   int
	models      []bedrock_support.BedrockModel
}
// AmazonCompletion BedRock support region list US East (N. Virginia),US West (Oregon),Asia Pacific (Singapore),Asia Pacific (Tokyo),Europe (Frankfurt)
			return fmt.Errorf("AWS credentials are invalid or missing. Please check your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables or AWS config. Details: %v", err)
		}
		return fmt.Errorf("failed to load AWS config for region %s: %w", region, err)
			return "", fmt.Errorf("AWS credentials are invalid or missing. Please check your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables or AWS config. Details: %v", err)
		}
		return "", err
	}
	// Parse the response
	return a.model.Response.ParseResponse(resp.Body)
}
// GetName returns the name of the AmazonBedRockClient.
		output.WriteString(color.YellowString("The stats mode allows for debugging and understanding the time taken by an analysis by displaying the statistics of each analyzer.\n"))
		for _, stat := range a.Stats {
			output.WriteString(fmt.Sprintf("- Analyzer %s took %s \n", color.YellowString(stat.Analyzer), stat.DurationTime))
				Text: fmt.Sprintf("OLMv1 ClusterCatalog: %s has condition of type %s, reason %s: %s", catalogName, catalogCondition.Type, catalogCondition.Reason, catalogCondition.Message),
				Text: fmt.Sprintf("OLMv1 ClusterExtension: %s has condition of type %s, reason %s: %s", extensionName, extensionCondition.Type, extensionCondition.Reason, extensionCondition.Message),
			failures = addExtensionFailure(failures, extension.Name, fmt.Errorf("invalid or missing extension.Spec.Source.Catalog.UpgradeConstraintPolicy (expecting 'SelfCertified' or 'CatalogProvided')"))
		}
		if extension.Spec.Source.SourceType != "Catalog" {
			failures = addExtensionFailure(failures, extension.Name, fmt.Errorf("invalid or missing spec.source.sourceType (expecting 'Catalog')"))
					Text:          fmt.Sprintf("Deployment %s/%s has %d replicas but %d are available", deployment.Namespace, deployment.Name, *deployment.Spec.Replicas, deployment.Status.Replicas),
					KubernetesDoc: doc,
					Sensitive: []common.Sensitive{
						{
							Unmasked: deployment.Namespace,
							Masked:   util.MaskString(deployment.Namespace),
						},
						{
							Unmasked: deployment.Name,
							Masked:   util.MaskString(deployment.Name),
						},
					}})
				failures = append(failures, common.Failure{
					Text:          fmt.Sprintf("Deployment %s/%s has %d replicas in spec but %d replicas in status because status field is not updated yet after scaling and %d replicas are available with status running", deployment.Namespace, deployment.Name, *deployment.Spec.Replicas, deployment.Status.Replicas, deployment.Status.ReadyReplicas),
					KubernetesDoc: doc,
					Sensitive: []common.Sensitive{
						{
							Unmasked: deployment.Namespace,
							Masked:   util.MaskString(deployment.Namespace),
						},
						{
							Unmasked: deployment.Name,
							Masked:   util.MaskString(deployment.Name),
						},
					}})
			} else {
				doc := apiDoc.GetApiDocV2("spec.replicas")
				failures = append(failures, common.Failure{
					Text: fmt.Sprintf("Deployment %s/%s has %d replicas but %d are available with status running", deployment.Namespace, deployment.Name, *deployment.Spec.Replicas, deployment.Status.ReadyReplicas),
			// Handle the expected outcomes based on the test case
			if tt.shouldFail {
				if err == nil {
					t.Error("Expected an error, but got nil")
				}
				if event != nil {
					t.Errorf("Expected nil event, but got event: %s", event.Name)
				}
			} else {
				if err != nil {
					t.Errorf("Expected no error, but got %v", err)
				}
				if event != nil && event.Name != tt.expected.Name {
					t.Errorf("Expected event name %s, got %s", tt.expected.Name, event.Name)
				}
			}
		})
	}
}
Some files were not shown because too many files have changed in this diff