Removes the `It("fails to execute confListDel given no 'plugins' key")` test.
This test no longer fails after libcni version 1.2.3.
It probably shouldn't fail during a DEL action anyway; we want the least error-prone path.
The GC test now uses both cni.dev attachment formats, cni.dev/valid-attachments and cni.dev/attachments, as per https://github.com/containernetworking/cni/issues/1101.
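For reference, the two payload shapes look roughly like this — a sketch based on the spec discussion in the linked issue; the struct and helper below are illustrative, not multus code:

```go
package main

import "fmt"

// gcAttachment mirrors the per-attachment entry shared by both keys.
type gcAttachment struct {
	ContainerID string `json:"containerID"`
	IfName      string `json:"ifname"`
}

// gcArgs sends the newer key alongside the older one for compatibility.
func gcArgs(valid []gcAttachment) map[string]interface{} {
	return map[string]interface{}{
		"cni.dev/valid-attachments": valid,
		"cni.dev/attachments":       valid,
	}
}

func main() {
	fmt.Println(gcArgs([]gcAttachment{{ContainerID: "abc123", IfName: "eth0"}}))
}
```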
Sometimes pods get deleted super fast (like jobs or CI pods) and come back as not found.
Instead of erroring, just return an empty CNI result so things don't blow up.
Adds a sentinel errPodNotFound and skips the rest of CmdAdd when we hit it.
Shout-out to race conditions.
This change adds toleration for errors like:
```
failed to [query/update] the pod pod-name-here in out of cluster comm: pod "pod-name-here" not found
```
during CNI ADD. While tolerating these errors trades away some debuggability (e.g. for RBAC misconfigurations), they are potentially just noise in scaled clusters even when everything is working properly.
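A minimal sketch of the sentinel-error pattern described above (the name errPodNotFound matches the commit; the lookup and wrapping are illustrative):

```go
package main

import (
	"errors"
	"fmt"
)

// errPodNotFound is the sentinel for "the pod vanished mid-ADD".
var errPodNotFound = errors.New("pod not found")

// getPod's real lookup is elided; it wraps the sentinel so callers can
// detect this case with errors.Is regardless of the message text.
func getPod(name string) error {
	return fmt.Errorf("failed to query the pod %s: %w", name, errPodNotFound)
}

func cmdAdd(name string) error {
	if err := getPod(name); err != nil {
		if errors.Is(err, errPodNotFound) {
			// Pod already deleted: return an empty CNI result, skip the rest.
			return nil
		}
		return err
	}
	// ... rest of CmdAdd ...
	return nil
}

func main() { fmt.Println(cmdAdd("pod-name-here")) }
```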
This change alters CNI's GC command arguments. Previously they were
just passed through from the parent CNI runtime; however, that may
cause unexpected resource deletion if one CNI plugin is used in both
the cluster network and a net-attach-def. This change generates the
valid attachments from the multus CNI cache and passes them to the
delegate CNI plugin, as sketched below.
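A sketch of the idea — the cache entry type and helper are hypothetical stand-ins for multus internals:

```go
package gcargs

// cachedAttachment is a hypothetical stand-in for a multus cache entry.
type cachedAttachment struct {
	ContainerID string
	IfName      string
}

// buildValidAttachments derives the GC args from multus' own cache rather
// than forwarding the parent runtime's list, so a plugin shared between
// the cluster network and a net-attach-def only GCs its own attachments.
func buildValidAttachments(cache []cachedAttachment) []map[string]string {
	out := make([]map[string]string, 0, len(cache))
	for _, a := range cache {
		out = append(out, map[string]string{
			"containerID": a.ContainerID,
			"ifname":      a.IfName,
		})
	}
	return out
}
```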
It was explained to me that informers are almost always more efficient and will work in most cases, but a live lookup is appropriate after a number of failures.
The live lookup happens only on the retry path, so we still get the benefits of informers, while on a retry we don't act on a cache miss.
Additionally, this changes our use of the cache get here, since CNI DEL already bails out before reaching it.
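A minimal sketch of that shape using client-go (the retry bookkeeping is illustrative):

```go
package podinfo

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	listers "k8s.io/client-go/listers/core/v1"
)

// getPod prefers the informer cache; only on the final retry does it do
// a live apiserver GET, so a stale cache miss cannot fail the request.
func getPod(ctx context.Context, lister listers.PodLister,
	client kubernetes.Interface, ns, name string, lastRetry bool) (*corev1.Pod, error) {
	if !lastRetry {
		return lister.Pods(ns).Get(name)
	}
	return client.CoreV1().Pods(ns).Get(ctx, name, metav1.GetOptions{})
}
```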
If the Multus plugin gets a DEL request (e.g. via 'crictl rmp') but the
API server is down, the call takes so long that it effectively never
finishes. This prevents CRI-O from deleting the Pods.
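One illustrative way to keep DEL bounded (the actual fix may differ) is to put a deadline on any apiserver access and treat failures as best effort:

```go
package del

import (
	"context"
	"log"
	"time"
)

// cmdDel bounds apiserver access with a deadline so a DEL can always
// complete, even with the apiserver unreachable; failures are logged
// and teardown continues.
func cmdDel(ctx context.Context, query func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(ctx, 10*time.Second)
	defer cancel()
	if err := query(ctx); err != nil {
		log.Printf("warning: apiserver unreachable, continuing DEL: %v", err)
	}
	return nil
}
```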
APIReadyCheckFunc is used only in the api package, hence it can be
unexported to limit its scope to that package. This fix changes its
scope. In addition, the 'API' prefix in api.APIReadyCheckFunc is
redundant, so the name is changed. The comment is updated to fit Go
style, too.
deleteDefaultGWResult() may create 'routes: null' in the CNI cache file,
which causes a nil pointer access in addDefaultGWCacheBytes().
This change prevents deleteDefaultGWResult() from generating
'routes: null' in the cache file.
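The underlying Go behavior: a nil slice marshals to JSON null unless the field is omitted. A minimal demonstration (the cache struct here is illustrative, not the multus one):

```go
package main

import (
	"encoding/json"
	"fmt"
)

type cachedResult struct {
	// With omitempty a nil slice is dropped entirely; without it,
	// a nil slice would serialize as "routes": null, which is what
	// a later reader trips over.
	Routes []string `json:"routes,omitempty"`
}

func main() {
	b, _ := json.Marshal(cachedResult{Routes: nil})
	fmt.Println(string(b)) // prints {}
}
```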
The interface name should not be limited to the DNS-1123 label format.
Instead, validate the interface name, if provided in the pod network
annotation, in a similar manner as iproute2 [1].
This allows requesting interface names such as "uplink_p0".
[1] 11740815bf/lib/utils.c (L832)
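A sketch of iproute2-style validation (not the exact multus implementation): non-empty, at most 15 bytes (IFNAMSIZ-1), not "." or "..", and no '/' or whitespace.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// validIfName mirrors the iproute2-style checks referenced above.
func validIfName(name string) bool {
	if name == "" || len(name) > 15 {
		return false
	}
	if name == "." || name == ".." {
		return false
	}
	if strings.ContainsRune(name, '/') {
		return false
	}
	for _, r := range name {
		if unicode.IsSpace(r) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(validIfName("uplink_p0")) // true
	fmt.Println(validIfName("bad name"))  // false
}
```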
Signed-off-by: adrianc <adrianc@nvidia.com>
On CNI request failure, multus-cni prints out cmdArgs. In all
cases, except for debug printing, this is done with %s and a special
printing function. However, handleCNIRequest is an exception for
some reason. That leads to unintelligible error messages in case
of CNI request failures (severely abridged):
```
CmdAdd (shim): CNI request failed with status 400:
'&{ContainerID:<id> Netns:/var/run/netns/<uuid> IfName:eth0
Args:<args> Path: StdinData:[125 121 111 117 114 32 97 100 118
101 114 116 105 115 101 109 101 110 116 32 99 111 117 108 100
32 98 101 32 104 101 114 101 125 ... another 650 numbers ]}
ContainerID:"<id>" Netns:"/var/run/netns/<uuid>" IfName:"eth0"
Args:"<args>" Path:"" ERRORED: error configuring pod ...
```
printCmdArgs() should be used for this case as well to avoid huge,
hardly readable logs.
At the same time, the content of cniCmdArgs is always appended to
the error twice, as seen in the example above: the first time by
HandleCNIRequest and a second time by handleCNIRequest. The same goes
for the HandleDelegateRequest path.
Remove the prefixing from the lower-level handlers while keeping it
in the higher-level ones. The 'ERRORED' part migrates to the
higher-level handler functions to preserve the overall look of the
error.
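A simplified stand-in for printCmdArgs() showing the difference (the struct and helper here are illustrative):

```go
package main

import "fmt"

type cmdArgs struct {
	ContainerID string
	Netns       string
	IfName      string
	Args        string
	Path        string
	StdinData   []byte // this is what turns %v output into a wall of numbers
}

// printCmdArgs renders the fields readably and skips the raw stdin bytes.
func printCmdArgs(a *cmdArgs) string {
	return fmt.Sprintf("ContainerID:%q Netns:%q IfName:%q Args:%q Path:%q",
		a.ContainerID, a.Netns, a.IfName, a.Args, a.Path)
}

func main() {
	a := &cmdArgs{ContainerID: "id", IfName: "eth0", StdinData: []byte(`{"cniVersion":"1.0.0"}`)}
	fmt.Printf("%v\n", a)        // byte slice printed as decimal numbers
	fmt.Println(printCmdArgs(a)) // readable one-liner
}
```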
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Because deletes should favor a successful path, the readiness check should be skipped for pod removals.
Otherwise, pods pending deletion could pile up, which might impact scheduling of the very pod that is needed to set the readiness indicator.
Adds a new method that checks for the readiness indicator alone, in order to immediately log a warning.
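The shape of the change, as a sketch (function and parameter names are illustrative):

```go
package readiness

import (
	"log"
	"os"
)

// check skips the readiness-indicator requirement on DEL: deletes should
// favor success, so a missing indicator only logs a warning there.
func check(indicatorFile string, isDel bool) error {
	if _, err := os.Stat(indicatorFile); err != nil {
		if isDel {
			log.Printf("warning: readiness indicator missing, continuing DEL: %v", err)
			return nil
		}
		return err
	}
	return nil
}
```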
When a user recreates the whole cluster's certs, the multus thick
plugin's previous cert is no longer valid. In such a case, we need to
avoid using the cert manager's old certs and restart it from the
bootstrap kubeconfig. This fix reloads the client config from the
bootstrap kubeconfig if the cert manager's cert fails to load.
This change stops updating the status in CNI's DEL command.
There are two reasons:
1. cmd DEL is invoked only at pod deletion, hence Kubernetes does not
   guarantee the pod still exists; it may already be deleted, so this
   API call may fail.
2. In the StatefulSet pod-recreation case, there may be a race
   condition when updating the status during cmd DEL.
In the StatefulSet case, a pod with the same name, e.g. stateful-0, is
deleted and then created again. If the old pod's CNI DEL command has
not finished before the new pod's creation, the SetStatus function
fails due to a pod UID mismatch.
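To make the race concrete, an illustrative UID guard at status-update time (not the actual multus code):

```go
package status

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
)

// setStatus shows the failure mode: the DEL handler still holds the old
// pod's UID, but the apiserver now returns the recreated stateful-0, so
// the update is rejected. Skipping status updates on DEL avoids racing
// against the new pod entirely.
func setStatus(current *corev1.Pod, uidAtAddTime types.UID) error {
	if current.UID != uidAtAddTime {
		return fmt.Errorf("pod %s UID mismatch: have %s, want %s",
			current.Name, current.UID, uidAtAddTime)
	}
	// ... update the network-status annotation here ...
	return nil
}
```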
This change introduces certDuration as a parameter to customize the
cert duration. In addition, the environment variable for the node name
is aligned with its other usages.
We used to run chroot in the multus main process when calling another
CNI plugin binary, and we used a mutex to lock access to pod files.
But this causes performance issues under heavy
CNI_ADD/CNI_DEL load.
With this patch, we do the chroot in the child processes instead, so
file operations in the main process are not affected by chroot.
This change requires the multus thick plugin pod to mount the CNI bin
directory at the same path in the container as on the host.
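The mechanism, as a minimal sketch (paths are illustrative): os/exec lets the child process chroot before exec via SysProcAttr, so only the delegate invocation sees the host root.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

func main() {
	// Run the delegate CNI binary chrooted into the host filesystem
	// from the child process only; the multus main process is untouched.
	// Linux-only, and requires CAP_SYS_CHROOT/root.
	cmd := exec.Command("/opt/cni/bin/bridge")
	cmd.SysProcAttr = &syscall.SysProcAttr{Chroot: "/hostroot"}
	out, err := cmd.CombinedOutput()
	fmt.Println(string(out), err)
}
```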
Signed-off-by: Peng Liu <pliu@redhat.com>
This change introduces per-node certificates for multus pods.
Once a multus pod is launched, the specified bootstrap kubeconfig is
used for initial access; multus then sends a CSR to the kube API to
obtain its own certs for kube API access. Once the request is
approved, the multus pod uses the generated certs for kube access.
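A rough sketch of the CSR submission with client-go (the object name and signer choice are illustrative; waiting for approval is elided):

```go
package certs

import (
	"context"

	certsv1 "k8s.io/api/certificates/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// requestCert submits a CSR for this node's multus pod using the
// bootstrap-kubeconfig client.
func requestCert(ctx context.Context, bootstrapClient kubernetes.Interface,
	nodeName string, csrPEM []byte) error {
	csr := &certsv1.CertificateSigningRequest{
		ObjectMeta: metav1.ObjectMeta{Name: "multus-" + nodeName},
		Spec: certsv1.CertificateSigningRequestSpec{
			Request:    csrPEM,
			SignerName: "kubernetes.io/kube-apiserver-client",
			Usages:     []certsv1.KeyUsage{certsv1.UsageClientAuth},
		},
	}
	_, err := bootstrapClient.CertificatesV1().
		CertificateSigningRequests().Create(ctx, csr, metav1.CreateOptions{})
	return err
}
```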
For whatever reason calling os.Stat() on the readiness indicator file
from CmdAdd()/CmdDel() when multus is running in server mode and is
containerized often returns "file not found", which triggers the
polling behavior of GetReadinessIndicatorFile(). This greatly delays
CNI operations that should be pretty quick. Even if an exponential
backoff is used, os.Stat() can still return "file not found"
multiple times, even though the file clearly exists.
But it turns out we don't need to check the readiness file in server
mode when running with MultusConfigFile == "auto". In this mode the
server starts the ConfigManager, which (a) waits until the file exists,
(b) watches the readiness file with fsnotify, and (c) exits the daemon
immediately if the file is deleted or moved.
This means we can assume that while the daemon is running and the
server is handling CNI requests that the readiness file exists;
otherwise the daemon would have exited. Thus CmdAdd/CmdDel don't
need to run a lot of possibly failing os.Stat() calls in the CNI
hot paths.
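The resulting guard reduces to something like this (names illustrative):

```go
package server

// needsReadinessCheck: in server mode with MultusConfigFile == "auto",
// the ConfigManager guarantees the readiness file exists while the
// daemon is running, so the per-request os.Stat() can be skipped.
func needsReadinessCheck(serverMode bool, multusConfigFile string) bool {
	return !(serverMode && multusConfigFile == "auto")
}
```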
Signed-off-by: Dan Williams <dcbw@redhat.com>
The test was comparing the same configuration to itself, since
nothing in the changed CNI configuration is used in the written
multus configuration.
Instead make sure the updated CNI config contains something
that will be reflected in the written multus configuration,
and while we're there use a more robust way to wait for the
config to be written via gomega.Eventually().
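The waiting pattern looks roughly like this (file path and expected content are illustrative):

```go
package config_test

import (
	"os"

	. "github.com/onsi/gomega"
)

// waitForConfig polls until the written multus config contains content
// that reflects the updated CNI config, instead of checking once.
func waitForConfig(multusConfPath, expected string) {
	Eventually(func() string {
		b, _ := os.ReadFile(multusConfPath)
		return string(b)
	}, "5s", "100ms").Should(ContainSubstring(expected))
}
```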
Signed-off-by: Dan Williams <dcbw@redhat.com>
Simplify setup by moving the post-creation operations like
GenerateConfig() and PersistMultusConfig() into a new Start() function
that also begins watching the configuration directory. This better
encapsulates the manager functionality in the object.
We can also get rid of the done channel passed to the config
manager and just use the existing WaitGroup to determine when to
exit the daemon main().
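A sketch of the resulting shape (stubs stand in for the real methods; actual signatures may differ):

```go
package config

import "sync"

type Manager struct{}

func (m *Manager) GenerateConfig() error      { return nil } // stub
func (m *Manager) PersistMultusConfig() error { return nil } // stub
func (m *Manager) watchConfigDir()            {}             // stub watch loop

// Start runs the post-creation steps and begins watching the config
// directory on a goroutine tracked by the caller's WaitGroup, so that
// main() can simply wg.Wait() instead of draining a done channel.
func (m *Manager) Start(wg *sync.WaitGroup) error {
	if err := m.GenerateConfig(); err != nil {
		return err
	}
	if err := m.PersistMultusConfig(); err != nil {
		return err
	}
	wg.Add(1)
	go func() {
		defer wg.Done()
		m.watchConfigDir()
	}()
	return nil
}
```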
Signed-off-by: Dan Williams <dcbw@redhat.com>
A couple of the setup variables for NewManager*() are already in the
multus config that it gets passed, so use those instead of passing
them explicitly.
Signed-off-by: Dan Williams <dcbw@redhat.com>
When running in server mode we can use a shared informer to listen for
Pod events from the apiserver, and grab pod info from that cache rather
than doing direct apiserver requests each time.
This reduces apiserver load and retry latency, since multus can poll
the local cache more frequently than it should do direct apiserver
requests.
Oddly, static pods don't show up in the informer cache by the timeout
and require a direct apiserver request. Since static pods are not
common and are typically long-running, it should not be a big issue to
fall back to direct apiserver access for them.
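A minimal sketch of the informer-backed lookup with the direct-apiserver fallback (clientset construction elided; names illustrative):

```go
package podcache

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
)

func run(ctx context.Context, clientset kubernetes.Interface) {
	// Shared informer keeps a local pod cache fed by apiserver events,
	// so repeated lookups don't hit the apiserver directly.
	factory := informers.NewSharedInformerFactory(clientset, 1*time.Minute)
	podLister := factory.Core().V1().Pods().Lister()
	factory.Start(ctx.Done())
	factory.WaitForCacheSync(ctx.Done())

	pod, err := podLister.Pods("default").Get("mypod")
	if err != nil {
		// Static pods may not appear in the cache in time; fall back
		// to a direct apiserver request.
		pod, err = clientset.CoreV1().Pods("default").Get(ctx, "mypod", metav1.GetOptions{})
	}
	fmt.Println(pod, err)
}
```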
Signed-off-by: Dan Williams <dcbw@redhat.com>