Commit Graph

189 Commits

Author SHA1 Message Date
Ben Pickard
132c5e7e79 Merge pull request #1490 from tsorya/jkary-fix-status-gc-no-pod-context
Fix thick plugin STATUS and GC handling for plugin-level commands
2026-03-23 15:39:43 -04:00
Ben Pickard
157e72f375 Merge pull request #1478 from pliurh/kubeconfig
k8sclient: use ServerVersion instead of pod list for kubeconfig validation
2026-03-19 09:00:03 -04:00
thomasferrandiz
d801f0f407 Merge pull request #1487 from yingwang-0320/CORENET-6865-rebase
Bump multus-cni to Kube 1.35 and Go to 1.25
2026-03-19 10:12:17 +01:00
Igal Tsoiref
caedfea615 Address PR review nits from pliurh
- Check os.OpenFile error in STATUS/GC test
- Document that k8sArgs may be nil for STATUS/GC in HandleCNIRequest

Made-with: Cursor
2026-03-18 23:21:46 -04:00
Jason Kary
ec08b5fa8b Fix thick plugin STATUS and GC handling for plugin-level commands
STATUS and GC are plugin-level commands with no pod context per the
CNI 1.1.0 spec. The thick plugin daemon incorrectly required
CNI_CONTAINERID, CNI_NETNS, and K8S_POD_NAME/K8S_POD_NAMESPACE for
these commands, causing failures when invoked by kubelet.

Signed-off-by: Jason Kary <jkary@redhat.com>
2026-03-18 23:21:46 -04:00
Peng Liu
ddd00fe48b k8sclient: fix per-node kubeconfig fallback
Validate the per-node kubeconfig when a current certificate is
available and fall back to the bootstrap kubeconfig only when the
per-node config is no longer trusted.

Also rebuild the derived per-node rest.Config from the reloaded
bootstrap config so TLS settings are preserved and refreshed
consistently.

Signed-off-by: Peng Liu <pliu@redhat.com>
2026-03-17 15:06:12 +08:00
Peng Liu
f36f591be9 k8sclient: use ServerVersion instead of pod list for kubeconfig validation
Listing all pods across all namespaces during bootstrap is expensive
in large clusters and unnecessary since the result is discarded.
Use the lightweight /version endpoint to validate connectivity instead.

Signed-off-by: Peng Liu <pliu@redhat.com>
2026-03-17 14:42:53 +08:00
Yun Zhou
1f27f0e331 Sort DeviceIDs in GetPodResourceMap for deterministic ordering
When a namespace uses a primary User-Defined Network (UDN) with a
device-plugin resource (e.g. SR-IOV), OVN-Kubernetes uses the last
device in the list for the primary interface while Multus assigns
earlier devices to cluster-default/secondary interfaces. The kubelet
and checkpoint paths build the list from map iteration, so order was
non-deterministic and the "last" device could differ between callers.
Sorting ensures both Multus and OVN-K8s see the same order so the
last device is consistently the one reserved for the primary UDN.

Signed-off-by: Yun Zhou <yunz@nvidia.com>
2026-03-16 11:46:51 -07:00
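
A minimal sketch of the ordering problem described in this commit, assuming the device IDs live in a set-like map. `sortedDeviceIDs` and the sample PCI addresses are illustrative and are not taken from `GetPodResourceMap`.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedDeviceIDs collects the keys of a set-like map into a slice and sorts
// them, so every caller sees the same order regardless of map iteration order.
func sortedDeviceIDs(devices map[string]struct{}) []string {
	ids := make([]string, 0, len(devices))
	for id := range devices {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	// Hypothetical SR-IOV device IDs.
	devices := map[string]struct{}{
		"0000:03:00.2": {},
		"0000:03:00.0": {},
		"0000:03:00.1": {},
	}
	ids := sortedDeviceIDs(devices)
	fmt.Println("all devices:", ids)
	// After sorting, the "last" device is the same for every consumer.
	fmt.Println("last device:", ids[len(ids)-1])
}
```

Without the sort, two callers ranging over the same map could disagree on which device comes last.
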
Ying Wang
33e7e49a1d fix make test error: informer synchronization timeouts in multus_cni100_test.go
Signed-off-by: Ying Wang <yingwang@rehat.com>
2026-03-13 09:28:48 -04:00
Tim Rozet
56d18efde0 support GC for single-plugin delegates in CmdGC
Support previously existed only for confList.

Signed-off-by: Tim Rozet <trozet@nvidia.com>
2026-02-13 14:17:41 -05:00
Tim Rozet
921191dece Dynamically determine version for empty ADD result
So that we can be compatible with 1.1.0.

Signed-off-by: Tim Rozet <trozet@nvidia.com>
2026-02-13 13:55:00 -05:00
Tim Rozet
e091897b4c Update gateway-result handling for 1.1.0
Signed-off-by: Tim Rozet <trozet@nvidia.com>
2026-02-13 13:48:35 -05:00
Tim Rozet
ea389005a1 Adds support for CNI STATUS
Changes-Include:
 - Add STATUS handling for delegate requests and single-plugin
 - Invoke STATUS for conf/conflist delegates via libcni
 - Preserve CNI error codes/messages through daemon API and shim
 - Add tests for STATUS error propagation

Signed-off-by: Tim Rozet <trozet@nvidia.com>
2026-02-13 13:32:26 -05:00
“yingwang-0320”
fc3053fc6d Bump Multus to Kube 1.34
Signed-off-by: “yingwang-0320” <yingwang@redhat.com>
2025-11-05 01:47:24 -05:00
Thomas Ferrandiz
369722ba7f Fix formatting as required by go vet
2025-10-16 15:01:31 +00:00
Muhammad Adil Ghaffar
f18d96b648 update to go 1.23 and latest k8s version (1.32.5)
Signed-off-by: Muhammad Adil Ghaffar <muhammad.adil.ghaffar@est.tech>
2025-06-24 15:11:16 +03:00
dougbtv
528d4f150c Functionality for Aux CNI Chain using subdirectory based CNI configuration loading.
Removes the test `it fails to execute confListDel given no 'plugins' key`.

This test no longer fails after libcni version 1.2.3.
It probably shouldn't fail during a DEL action as it is; we want the least error-prone path.

The GC test now uses both cni.dev attachment formats.

Uses both attachment formats as per https://github.com/containernetworking/cni/issues/1101 for GC's cni.dev/valid-attachments & cni.dev/attachments
2025-04-15 15:53:00 -04:00
dougbtv
ccfd8f5fea When returning an empty CNI result, it must be properly structured
For a previous fix of returning an empty CNI result when pods are not found, the CNI result wasn't properly structured. This fixes the structuring.
2025-03-25 14:45:04 -04:00
dougbtv
641f6a3b63 handle pod not found in CNI ADD gracefully
sometimes pods get deleted super fast (like jobs or CI) and they come back as not found.

instead of erroring, just return an empty CNI result so things don't blow up.

adds a sentinel errPodNotFound and skips the rest of CmdAdd when we hit it.

shouts to race conditions.
2025-03-24 09:58:38 -04:00
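
The sentinel-error pattern this commit describes can be sketched as follows. Only the name `errPodNotFound` comes from the commit message; the `result` type, `getPod`, `cmdAdd`, and the version string are simplified stand-ins, not the Multus implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// errPodNotFound is the sentinel named in the commit message.
var errPodNotFound = errors.New("pod not found")

// result is a simplified stand-in for a CNI result.
type result struct {
	CNIVersion string
	Interfaces []string
}

// getPod (hypothetical) wraps the sentinel when the pod is unknown.
func getPod(name string, known map[string]bool) error {
	if !known[name] {
		return fmt.Errorf("getting pod %q: %w", name, errPodNotFound)
	}
	return nil
}

// cmdAdd (hypothetical) returns an empty result instead of an error when the
// pod has already been deleted, and skips the rest of the ADD work.
func cmdAdd(name string, known map[string]bool) (*result, error) {
	if err := getPod(name, known); err != nil {
		if errors.Is(err, errPodNotFound) {
			return &result{CNIVersion: "1.0.0"}, nil
		}
		return nil, err
	}
	// Normal path: attach the delegate networks (elided in this sketch).
	return &result{CNIVersion: "1.0.0", Interfaces: []string{"net1"}}, nil
}

func main() {
	known := map[string]bool{"web-0": true}
	for _, name := range []string{"web-0", "job-already-gone"} {
		res, err := cmdAdd(name, known)
		fmt.Printf("%s: interfaces=%v err=%v\n", name, res.Interfaces, err)
	}
}
```

Wrapping with `%w` keeps the context in the message while still letting `errors.Is` match the sentinel.
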
dougbtv
5892d705da Tolerate issues writing network status annotation on CNI ADD.
This change adds toleration for such errors like:

```
failed to [query/update] the pod pod-name-here in out of cluster comm: pod "pod-name-here" not found
```

During CNI ADD. While this change is a trade-off in terms of debuggability for RBAC, it's potentially noisy in scaled clusters when it is working properly.
2025-03-20 14:20:00 -04:00
Tomofumi Hayashi
7eb9673a1a Call GC command with valid attachments from multus cache
This change alters the arguments passed to the CNI GC command.
Previously, Multus simply passed them through from the parent CNI
runtime; however, that may cause unexpected resource deletion if one
CNI plugin is used in both the cluster network and a net-attach-def.
This change generates the valid attachments from the Multus CNI cache
and passes them to the delegate CNI plugin.
2024-12-20 11:28:41 +09:00
Tomofumi Hayashi
a439f91721 Support GC and STATUS command for cluster network
This change adds support for the newer CNI 1.1 commands, GC and
STATUS, for the cluster network.
2024-12-20 11:28:41 +09:00
dougbtv
f186370654 adds context to GetPodAPILiveQuery
2024-12-19 14:41:32 -05:00
dougbtv
fb03b0f754 This makes sure that stale caches never result in NotFound errors.
It was explained to me that informers are almost always more efficient and will work in most cases, but a live lookup is appropriate after a number of failures.

This happens only on the retry portion, so we're still getting the benefits of informers, but, on a retry situation, we don't get a cache miss.

Additionally, changes out use of cache get on this, since it already bails out before it on CNI DEL.
2024-12-19 13:57:57 -05:00
Patryk Matuszak
4ff141c18d Don't wait too long for an answer from API Server
If Multus plugin gets a DEL request, but the API Server is down (e.g.
via 'crictl rmp'), the call takes so long, it actually never finishes.
This prevents CRI-O from deleting the Pods.
2024-12-19 16:13:38 +01:00
Doug Smith
3c33f6f028 Merge pull request #1314 from dougbtv/client-lib-multiple-if-cni-result
Updates to use CreateNetworkStatuses from net-attach-def client, bump to v1.7.1
2024-08-05 07:59:36 -04:00
dougbtv
bc6c8d5c76 Updates to use CreateNetworkStatuses from net-attach-def client for multiple interfaces in CNI results
2024-07-25 16:15:50 -04:00
adrianc
334fdce751 Add signals package
This provides a simple way to handle incoming
OS signals using context.

Signed-off-by: adrianc <adrianc@nvidia.com>
2024-07-18 18:09:22 +03:00
Tomofumi Hayashi
d23856b784 Do not expose APIReadyCheckFunc outside of the package
APIReadyCheckFunc is used only in the api package, so it can be
lowercased to limit its scope to that package. This fix changes its
scope. In addition, the name api.APIReadyCheckFunc is redundant, so it
is renamed. The comment is also changed to follow Go style.
2024-05-25 01:40:12 +09:00
Doug Smith
9f5c0239a8 Merge pull request #1078 from moshe010/dra
add support for Dynamic Resource Allocation
2024-05-23 11:06:17 -04:00
Doug Smith
d9f1c7c6e7 Merge pull request #1243 from adrianchiris/allow_undersocre_in_ifname
Change Validation of interface name
2024-05-23 09:43:12 -04:00
dougbtv
181f56f026 Thick plugin should not wait for API readiness on CNI DEL
This modifies the behavior on CNI DEL for the thick plugin to just check once for API readiness, as opposed to waiting.
2024-05-14 11:23:47 -04:00
Doug Smith
c6a371b6bc Merge pull request #1274 from s1061123/fix/gateway-nil
Fix CNI cache update function to prevent nil access
2024-05-09 23:22:23 +09:00
Tomofumi Hayashi
5fe124932a Fix CNI cache update function to prevent nil access
deleteDefaultGWResult() may create 'routes:null' in the CNI cache file,
which causes a nil pointer access in addDefaultGWCacheBytes().
This change prevents deleteDefaultGWResult() from generating
'routes:null' in the cache file.
2024-05-09 04:03:00 +09:00
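
How a `routes:null` entry can arise in Go is easy to reproduce: a nil slice marshals to JSON `null`, while an empty non-nil slice marshals to `[]`. The sketch below is illustrative only; `route`, `dropDefaultRoutes`, and `encodeCache` are hypothetical and do not mirror the real `deleteDefaultGWResult()`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// route is a simplified stand-in for a cached CNI route entry.
type route struct {
	Dst string `json:"dst"`
	GW  string `json:"gw,omitempty"`
}

// dropDefaultRoutes (hypothetical) removes default routes and always returns
// a non-nil slice, so the encoded cache never contains "routes":null.
func dropDefaultRoutes(routes []route) []route {
	kept := make([]route, 0, len(routes))
	for _, r := range routes {
		if r.Dst == "0.0.0.0/0" || r.Dst == "::/0" {
			continue
		}
		kept = append(kept, r)
	}
	return kept
}

// encodeCache (hypothetical) serializes the routes the way a generic
// map-based cache would.
func encodeCache(routes []route) (string, error) {
	b, err := json.Marshal(map[string]interface{}{"routes": routes})
	return string(b), err
}

func main() {
	var nilRoutes []route // what a naive filter might leave behind
	bad, _ := encodeCache(nilRoutes)
	good, _ := encodeCache(dropDefaultRoutes([]route{{Dst: "0.0.0.0/0", GW: "10.0.0.1"}}))
	fmt.Println("nil slice:  ", bad)
	fmt.Println("empty slice:", good)
}
```

A reader of the cache that expects an array can then index or range over the value without a nil check.
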
Tomofumi Hayashi
541a8032c3 Fix defaultnetworkfile in unit test
Rename conf param 'defaultnetworkfile' to 'readinessindicatorfile'.
2024-05-02 02:30:26 +09:00
Moshe Levi
40378cabd3 add support for Dynamic Resource Allocation
Signed-off-by: Moshe Levi <moshele@nvidia.com>
2024-04-11 19:16:46 +02:00
adrianc
d625d48231 Change Validation of interface name
The interface name should not be limited to DNS-1123 label format.
Instead, validate the interface name, if provided in the pod network
annotation, in a similar manner to iproute2[1].

This allows requesting interface names such as "uplink_p0".

[1]11740815bf/lib/utils.c (L832)

Signed-off-by: adrianc <adrianc@nvidia.com>
2024-03-14 18:05:57 +02:00
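
A rough sketch of an iproute2-style interface name check, written from the general shape of that check (non-empty, shorter than IFNAMSIZ, no '/' and no whitespace). The function name `validIfName` is hypothetical and the exact rules Multus applies may differ.

```go
package main

import (
	"fmt"
	"unicode"
)

// ifNameSize mirrors IFNAMSIZ on Linux (16, including the trailing NUL).
const ifNameSize = 16

// validIfName (hypothetical) accepts any non-empty name shorter than
// IFNAMSIZ that contains neither '/' nor whitespace. Unlike a DNS-1123
// label check, it allows underscores and upper-case letters.
func validIfName(name string) bool {
	if name == "" || len(name) >= ifNameSize {
		return false
	}
	for _, r := range name {
		if r == '/' || unicode.IsSpace(r) {
			return false
		}
	}
	return true
}

func main() {
	for _, name := range []string{"uplink_p0", "net1", "bad/name", "has space", ""} {
		fmt.Printf("%-12q valid=%v\n", name, validIfName(name))
	}
}
```
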
Tomofumi Hayashi
0fd3fa7919 Fix typo
2024-03-14 23:16:06 +09:00
Ilya Maximets
ddc78f1244 server: More concise error messages.
On the CNI request failure, multus-cni prints out cmdArgs.  In all
cases, except for debug printing, this is done with %s and a special
printing function.  However, the handleCNIRequest is an exception for
some reason.  That leads to unintelligible error messages in case
of CNI request failures (severely abridged):

 CmdAdd (shim): CNI request failed with status 400:
 '&{ContainerID:<id> Netns:/var/run/netns/<uuid> IfName:eth0
    Args:<args> Path: StdinData:[125 121 111 117 114 32 97 100 118
    101 114 116 105 115 101 109 101 110 116 32 99 111 117 108 100
    32 98 101 32 104 101 114 101 125 ... another 650 numbers ]}
 ContainerID:"<id>" Netns:"/var/run/netns/<uuid>" IfName:"eth0"
 Args:"<args>" Path:"" ERRORED: error configuring pod ...

printCmdArgs() should be used for this case as well to avoid huge
hardly readable logs.

At the same time, the content of cniCmdArgs is always appended to
the error twice as seen in the example above.  The first time by the
HandleCNIRequest and another time by the handleCNIRequest.  Same for
the HandleDelegateRequest path.

Just removing the prefixing from the lower level handlers while
keeping higher level ones.  The 'ERRORED' part migrated to the higher
level handler functions to preserve the overall look of the error.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2024-02-29 00:38:07 +01:00
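
The logging difference described in this commit can be reproduced with a small stand-in. The `cmdArgs` struct below is a simplified, hypothetical copy of the CNI command arguments, and this `printCmdArgs` only imitates the quoted-field format seen in the log excerpt; it is not the project's function.

```go
package main

import "fmt"

// cmdArgs is a simplified stand-in for the CNI command arguments.
type cmdArgs struct {
	ContainerID string
	Netns       string
	IfName      string
	Args        string
	Path        string
	StdinData   []byte
}

// printCmdArgs (sketch) formats only the short string fields and leaves out
// StdinData, which %+v would print as a long list of byte values.
func printCmdArgs(a *cmdArgs) string {
	return fmt.Sprintf("ContainerID:%q Netns:%q IfName:%q Args:%q Path:%q",
		a.ContainerID, a.Netns, a.IfName, a.Args, a.Path)
}

func main() {
	args := &cmdArgs{
		ContainerID: "abc123",
		Netns:       "/var/run/netns/example",
		IfName:      "eth0",
		Args:        "K8S_POD_NAME=web-0",
		StdinData:   []byte(`{"cniVersion":"1.1.0","name":"example"}`),
	}
	fmt.Printf("with %%+v:         %+v\n", args)
	fmt.Printf("with printCmdArgs: %s\n", printCmdArgs(args))
}
```
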
dougbtv
a1915e1a8e Skips checking for readiness on CNI DEL (and instead warns)
Because deletes should favor a successful path, the readiness check should be skipped for pod removals.

This can cause an issue where there are pods pending deletion, which might impact scheduling of a pod that may be necessary in order to set the readiness indicator.

Adds a new method to check for the readiness indicator alone in order to immediately log a warning.
2024-02-22 09:15:11 -05:00
Doug Smith
ba18cf5ab3 Merge pull request #1214 from s1061123/add-netdef-informer
Add net-attach-def informer for thick plugin
2024-02-15 09:40:57 -05:00
Tomofumi Hayashi
748930239d Add filepath sanity check
2024-02-15 00:29:07 +09:00
Tomofumi Hayashi
a337317533 Reload bootstrap kubeconfig if cert mgr failed to load valid certs
When a user recreates all cluster certs, the multus thick plugin's
previous cert is no longer valid. In that case, we must avoid using
the cert manager's old certs and restart from the bootstrap
kubeconfig. This fix reloads the client config from the bootstrap
kubeconfig if the cert manager's cert cannot be used to load pods.
2024-02-14 00:46:12 +09:00
Dennis Periquet
6c982f3fee supplement log with stringified version of StdinData to enhance debug (#1215)
2024-01-26 01:30:58 +09:00
Tomofumi Hayashi
6ac6fe675f Add net-attach-def informer for thick plugin
This change introduces a net-attach-def informer in multus-daemon
(the thick plugin case). It can reduce the number of API calls made
to get net-attach-defs.
2024-01-20 02:04:21 +09:00
Tomofumi Hayashi
40687759fb Reduce informer memory usage by informer transform (#1203)
This fix reduces multus-daemon memory usage by using the k8s 0.29
informer transform to trim Pod object information that multus does
not need.
2024-01-18 23:32:21 +09:00
Tomofumi Hayashi
a70da3556a Fix a wait to account for the possibility of a not ready unix socket (#1207)
2024-01-11 13:34:37 +09:00
Tomofumi Hayashi
6e4f62f2f2 disable revive's dot-imports in unit test files
2024-01-05 14:32:09 +09:00
Tomofumi Hayashi
197877d113 Adds a wait to account for the possibility of a not ready unix socket
2024-01-05 14:27:31 +09:00
Tomofumi Hayashi
540a887651 Fix to use lumberjack only for logging files
2023-12-07 21:08:17 +09:00