Add CRD analyzer documentation and examples

- Created comprehensive documentation in docs/CRD_ANALYZER.md
- Added example configuration file in examples/crd_analyzer_config.yaml
- Updated README.md to list the new customResourceAnalyzer
- Documentation includes use cases for cert-manager, ArgoCD, Kafka, Prometheus
- Added troubleshooting section and best practices

Co-authored-by: AlexsJones <1235925+AlexsJones@users.noreply.github.com>
This commit is contained in:
copilot-swe-agent[bot]
2026-02-06 07:27:34 +00:00
parent dfdcf9edd2
commit 03bd9a8387
3 changed files with 298 additions and 0 deletions

View File

@@ -279,6 +279,7 @@ you will be able to write your own analyzers.
- [x] OperatorGroup
- [x] InstallPlan
- [x] Subscription
- [x] **customResourceAnalyzer** - Generic analyzer for any CRD (cert-manager, ArgoCD, Kafka, etc.) [Documentation](docs/CRD_ANALYZER.md)
## Examples

252
docs/CRD_ANALYZER.md Normal file
View File

@@ -0,0 +1,252 @@
# Generic CRD Analyzer Configuration Examples
The Generic CRD Analyzer enables K8sGPT to automatically analyze custom resources from any installed CRD in your Kubernetes cluster. This provides observability for operator-managed resources like cert-manager, ArgoCD, Kafka, and more.
## Basic Configuration
The CRD analyzer is configured via the K8sGPT configuration file (typically `~/.config/k8sgpt/k8sgpt.yaml`). Here's a minimal example:
```yaml
crd_analyzer:
enabled: true
```
With this basic configuration, the analyzer will:
- Discover all CRDs installed in your cluster
- Apply generic health checks based on common Kubernetes patterns
- Report issues with resources that have unhealthy status conditions
## Configuration Options
### Complete Example
```yaml
crd_analyzer:
enabled: true
include:
- name: certificates.cert-manager.io
statusPath: ".status.conditions"
readyCondition:
type: "Ready"
expectedStatus: "True"
- name: applications.argoproj.io
statusPath: ".status.health.status"
expectedValue: "Healthy"
- name: kafkas.kafka.strimzi.io
readyCondition:
type: "Ready"
expectedStatus: "True"
exclude:
- name: kafkatopics.kafka.strimzi.io
- name: servicemonitors.monitoring.coreos.com
```
### Configuration Fields
#### `enabled` (boolean)
- **Default**: `false`
- **Description**: Master switch to enable/disable the CRD analyzer
- **Example**: `enabled: true`
#### `include` (array)
- **Description**: List of CRDs with custom health check configurations
- **Fields**:
- `name` (string, required): The full CRD name (e.g., `certificates.cert-manager.io`)
- `statusPath` (string, optional): JSONPath to the status field to check (e.g., `.status.health.status`)
- `readyCondition` (object, optional): Configuration for checking a Ready-style condition
- `type` (string): The condition type to check (e.g., `"Ready"`)
- `expectedStatus` (string): Expected status value (e.g., `"True"`)
- `expectedValue` (string, optional): Expected value at the statusPath (requires `statusPath`)
#### `exclude` (array)
- **Description**: List of CRDs to skip during analysis
- **Fields**:
- `name` (string): The full CRD name to exclude
## Use Cases
### 1. cert-manager Certificate Analysis
Detect certificates that are not ready or have issuance failures:
```yaml
crd_analyzer:
enabled: true
include:
- name: certificates.cert-manager.io
readyCondition:
type: "Ready"
expectedStatus: "True"
```
**Detected Issues:**
- Certificates with `Ready=False`
- Certificate renewal failures
- Invalid certificate configurations
### 2. ArgoCD Application Health
Monitor ArgoCD application sync and health status:
```yaml
crd_analyzer:
enabled: true
include:
- name: applications.argoproj.io
statusPath: ".status.health.status"
expectedValue: "Healthy"
```
**Detected Issues:**
- Applications in `Degraded` state
- Sync failures
- Missing resources
### 3. Kafka Operator Resources
Check Kafka cluster health with Strimzi operator:
```yaml
crd_analyzer:
enabled: true
include:
- name: kafkas.kafka.strimzi.io
readyCondition:
type: "Ready"
expectedStatus: "True"
exclude:
- name: kafkatopics.kafka.strimzi.io # Exclude topics to reduce noise
```
**Detected Issues:**
- Kafka clusters not ready
- Broker failures
- Configuration issues
### 4. Prometheus Operator
Monitor Prometheus instances:
```yaml
crd_analyzer:
enabled: true
include:
- name: prometheuses.monitoring.coreos.com
readyCondition:
type: "Available"
expectedStatus: "True"
```
**Detected Issues:**
- Prometheus instances not available
- Configuration reload failures
- Storage issues
## Generic Health Checks
When a CRD is not explicitly configured in the `include` list, the analyzer applies generic health checks:
### Supported Patterns
1. **status.conditions** - Standard Kubernetes conditions
- Flags `Ready` conditions with status != `"True"`
- Flags any condition type containing "failed" with status = `"True"`
2. **status.phase** - Phase-based resources
- Flags resources with phase = `"Failed"` or `"Error"`
3. **status.health.status** - ArgoCD-style health
- Flags resources with health status != `"Healthy"` (except `"Unknown"`)
4. **status.state** - State-based resources
- Flags resources with state = `"Failed"` or `"Error"`
5. **Deletion with Finalizers** - Stuck resources
- Flags resources with `deletionTimestamp` set but still having finalizers
## Running the Analyzer
### Enable in Configuration
Add the CRD analyzer to your active filters:
```bash
# Add CustomResource filter
k8sgpt filters add CustomResource
# List active filters to verify
k8sgpt filters list
```
### Run Analysis
```bash
# Basic analysis
k8sgpt analyze --explain
# With specific filter
k8sgpt analyze --explain --filter=CustomResource
# In a specific namespace
k8sgpt analyze --explain --filter=CustomResource --namespace=production
```
### Example Output
```
AI Provider: openai
0: CustomResource/Certificate(default/example-cert)
- Error: Condition Ready is False (reason: Failed): Certificate issuance failed
- Details: The certificate 'example-cert' in namespace 'default' failed to issue.
The Let's Encrypt challenge validation failed due to DNS propagation issues.
Recommendation: Check DNS records and retry certificate issuance.
1: CustomResource/Application(argocd/my-app)
- Error: Health status is Degraded
- Details: The ArgoCD application 'my-app' is in a Degraded state.
This typically indicates that deployed resources are not healthy.
Recommendation: Check application logs and pod status.
```
## Best Practices
### 1. Start with Generic Checks
Begin with just `enabled: true` to see what issues are detected across all CRDs.
### 2. Add Specific Configurations Gradually
Add custom configurations for critical CRDs that need specialized health checks.
### 3. Use Exclusions to Reduce Noise
Exclude CRDs that generate false positives or are less critical.
### 4. Combine with Other Analyzers
Use the CRD analyzer alongside built-in analyzers for comprehensive cluster observability.
### 5. Monitor Performance
If you have many CRDs, the analysis may take longer. Use exclusions to optimize.
## Troubleshooting
### Analyzer Not Running
- Verify `enabled: true` is set in configuration
- Check that `CustomResource` is in active filters: `k8sgpt filters list`
- Ensure configuration file is in the correct location
### No Issues Detected
- Verify CRDs are actually installed: `kubectl get crds`
- Check if custom resources exist: `kubectl get <crd-name> --all-namespaces`
- Review generic health check patterns - your CRDs may use different status fields
### Too Many False Positives
- Add specific configurations for problematic CRDs in the `include` section
- Use the `exclude` list to skip noisy CRDs
- Review the status patterns your CRDs use and configure accordingly
### Configuration Not Applied
- Restart K8sGPT after configuration changes
- Verify YAML syntax is correct
- Check K8sGPT logs for configuration parsing errors

View File

@@ -0,0 +1,45 @@
# Example K8sGPT Configuration with CRD Analyzer
# Place this file at ~/.config/k8sgpt/k8sgpt.yaml
# CRD Analyzer Configuration
crd_analyzer:
enabled: true
# Specific CRD configurations with custom health checks
include:
# cert-manager certificates
- name: certificates.cert-manager.io
readyCondition:
type: "Ready"
expectedStatus: "True"
# ArgoCD applications
- name: applications.argoproj.io
statusPath: ".status.health.status"
expectedValue: "Healthy"
# Strimzi Kafka clusters
- name: kafkas.kafka.strimzi.io
readyCondition:
type: "Ready"
expectedStatus: "True"
# Prometheus instances
- name: prometheuses.monitoring.coreos.com
readyCondition:
type: "Available"
expectedStatus: "True"
# CRDs to skip during analysis
exclude:
- name: kafkatopics.kafka.strimzi.io
- name: servicemonitors.monitoring.coreos.com
- name: podmonitors.monitoring.coreos.com
- name: prometheusrules.monitoring.coreos.com
# Other K8sGPT configuration...
# ai:
# providers:
# - name: openai
# model: gpt-4
# # ... other AI config