Commit Graph

922 Commits

Author SHA1 Message Date
Doug Smith
ad81dbf50f
Merge pull request #1236 from igsilya/failure-logs
server: More concise error messages.
2024-03-14 10:03:51 -04:00
dependabot[bot]
7ad0dd287a
Bump google.golang.org/protobuf from 1.31.0 to 1.33.0
Bumps google.golang.org/protobuf from 1.31.0 to 1.33.0.

---
updated-dependencies:
- dependency-name: google.golang.org/protobuf
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-13 23:19:51 +00:00
Tomofumi Hayashi
b09350cf1a
Merge pull request #1086 from mengzhuo/master
Delete .travis.yaml
2024-03-01 00:12:11 +09:00
Ilya Maximets
ddc78f1244 server: More concise error messages.
On the CNI request failure, multus-cni prints out cmdArgs.  In all
cases, except for debug printing, this is done with %s and a special
printing function.  However, the handleCNIRequest is an exception for
some reason.  That leads to unintelligible error messages in case
of CNI request failures (severely abridged):

 CmdAdd (shim): CNI request failed with status 400:
 '&{ContainerID:<id> Netns:/var/run/netns/<uuid> IfName:eth0
    Args:<args> Path: StdinData:[125 121 111 117 114 32 97 100 118
    101 114 116 105 115 101 109 101 110 116 32 99 111 117 108 100
    32 98 101 32 104 101 114 101 125 ... another 650 numbers ]}
 ContainerID:"<id>" Netns:"/var/run/netns/<uuid>" IfName:"eth0"
 Args:"<args>" Path:"" ERRORED: error configuring pod ...

printCmdArgs() should be used for this case as well to avoid huge,
barely readable logs.

At the same time, the content of cniCmdArgs is always appended to
the error twice as seen in the example above.  The first time by the
HandleCNIRequest and another time by the handleCNIRequest.  Same for
the HandleDelegateRequest path.

Remove the prefixing from the lower-level handlers while keeping it
in the higher-level ones.  The 'ERRORED' part moves to the higher-level
handler functions to preserve the overall look of the error.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
2024-02-29 00:38:07 +01:00
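The formatting problem described in this commit can be reproduced with a minimal Go sketch. The `cmdArgs` struct and the `printArgs` helper below are hypothetical stand-ins whose field names follow the log excerpt above; they are not the project's actual types or its `printCmdArgs()` implementation.

```go
package main

import "fmt"

// cmdArgs is a hypothetical stand-in for the CNI argument struct; the field
// names follow the log excerpt in the commit message, not the real type.
type cmdArgs struct {
	ContainerID string
	Netns       string
	IfName      string
	Args        string
	Path        string
	StdinData   []byte
}

// printArgs is an illustrative helper (an assumption, not the project's
// printCmdArgs) that formats the readable fields and leaves out the raw
// StdinData bytes.
func printArgs(a *cmdArgs) string {
	return fmt.Sprintf("ContainerID:%q Netns:%q IfName:%q Args:%q Path:%q",
		a.ContainerID, a.Netns, a.IfName, a.Args, a.Path)
}

func main() {
	a := &cmdArgs{
		ContainerID: "abc",
		Netns:       "/var/run/netns/x",
		IfName:      "eth0",
		StdinData:   []byte(`{"name":"example"}`),
	}
	// %+v on the struct renders the []byte field as a list of decimal numbers.
	fmt.Printf("verbose: %+v\n", a)
	// The helper keeps the message to one short, readable line.
	fmt.Printf("concise: %s\n", printArgs(a))
}
```

Formatting a `[]byte` field with `%v` or `%+v` yields one decimal number per byte, which is why a multi-hundred-byte config turns into the long number list shown in the message.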
Tomofumi Hayashi
5f0b4cdc6b
Merge pull request #1235 from dougbtv/remove-readiness-check-on-del
Skips checking for readiness on CNI DEL
2024-02-22 23:55:21 +09:00
dougbtv
a1915e1a8e Skips checking for readiness on CNI DEL (and instead warns)
Because deletes should favor a successful path, the readiness check should be skipped for pod removals.

Otherwise, pods with pending deletes can block the scheduling of a pod that may be necessary to set the readiness indicator.

Adds a new method to check for the readiness indicator alone in order to immediately log a warning.
2024-02-22 09:15:11 -05:00
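A rough sketch of the "check once and warn" idea from this commit, using only the Go standard library. The function name `readinessFilePresent` and the path are assumptions for illustration; the commit does not name the method it adds.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// readinessFilePresent is a hypothetical helper: it performs a single
// os.Stat and never waits or retries. An empty path is treated as
// "no indicator configured", which counts as ready.
func readinessFilePresent(path string) (bool, error) {
	if path == "" {
		return true, nil
	}
	_, err := os.Stat(path)
	if errors.Is(err, os.ErrNotExist) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	path := "/nonexistent/readiness-indicator" // hypothetical path
	ready, err := readinessFilePresent(path)
	if err != nil || !ready {
		// On DEL, only warn and carry on instead of blocking the delete.
		fmt.Printf("warning: readiness indicator %q not found (err=%v); continuing with DEL\n", path, err)
	}
}
```

The point of the design is that a delete never blocks on readiness: a missing indicator only produces a log line.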
Patryk Matuszak
53a68c35ff
Recreate configs only if base files changed (#1234) 2024-02-21 02:24:15 +09:00
Doug Smith
ca5a4c9aa9
Merge pull request #1230 from s1061123/update-kind
Update kind e2e
2024-02-15 13:25:27 -05:00
Tomofumi Hayashi
03fcb34abe Update kind e2e 2024-02-16 02:30:17 +09:00
Doug Smith
ba18cf5ab3
Merge pull request #1214 from s1061123/add-netdef-informer
Add net-attach-def informer for thick plugin
2024-02-15 09:40:57 -05:00
Doug Smith
b271fbf84d
Merge pull request #1229 from s1061123/fix/filepath
Add filepath sanity check
2024-02-14 10:48:12 -05:00
Tomofumi Hayashi
748930239d Add filepath sanity check 2024-02-15 00:29:07 +09:00
Doug Smith
c550826675
Merge pull request #1228 from s1061123/fix/reload-kubeconfig-if-failed
Reload bootstrap kubeconfig if cert mgr failed to load valid certs
2024-02-13 10:49:49 -05:00
Tomofumi Hayashi
a337317533 Reload bootstrap kubeconfig if cert mgr failed to load valid certs
When a user recreates all cluster certs, the multus thick plugin's
previous cert is no longer valid. In that case, we need to stop
using the cert manager's old certs and restart from the bootstrap
kubeconfig. This fix reloads the client config from the bootstrap
kubeconfig if the cert manager fails to load valid certs.
2024-02-14 00:46:12 +09:00
Tomofumi Hayashi
8e5060b9a7
Opt out to mount service account token (#1219) 2024-02-01 17:33:59 +09:00
Dennis Periquet
6c982f3fee
supplement log with stringified version of StdinData to enhance debug (#1215) 2024-01-26 01:30:58 +09:00
Doug Smith
1071115e90
Merge pull request #1217 from s1061123/add-sleep-thin
Add additional sleep in thick entrypoint
2024-01-25 11:10:57 -05:00
Tomofumi Hayashi
493d421cf7
Update github actions (#1216) 2024-01-26 00:46:09 +09:00
Tomofumi Hayashi
24b2d55c84 Add additional sleep in thick entrypoint 2024-01-26 00:45:47 +09:00
Tomofumi Hayashi
6812ce0ed6
Update e2e related tools (#1212) 2024-01-24 22:39:54 +09:00
Tomofumi Hayashi
6ac6fe675f Add net-attach-def informer for thick plugin
This change introduces a net-attach-def informer in multus-daemon
(the thick plugin case). It can reduce the API calls needed to get
net-attach-def objects.
2024-01-20 02:04:21 +09:00
Fish-pro
3477c9c827
fix(quick start)-You do not need to clone the repository and directly deliver the installation file (#1210)
Signed-off-by: Zechun Chen <zechun.chen@daocloud.io>
2024-01-18 23:45:57 +09:00
Lionel Jouin
36ba3039ae
Add watch permission to thick e2e template (#1208)
As described in #1171, the watch function is required in the clusterrole
for the thick Multus version, otherwise "Failed to watch *v1.Pod" would
be returned.
2024-01-18 23:45:40 +09:00
Tomofumi Hayashi
40687759fb
Reduce informer memory usage by informer transform (#1203)
This fix reduces multus-daemon memory usage by using the k8s 0.29
informer transform to trim Pod object information that multus does not need.
2024-01-18 23:32:21 +09:00
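To illustrate the transform idea without pulling in Kubernetes dependencies, the sketch below uses a hypothetical `podLike` struct. `trimPod` has the same shape as client-go's `cache.TransformFunc`, `func(interface{}) (interface{}, error)`; which fields the actual change trims is not stated here, so the choice below is an assumption.

```go
package main

import "fmt"

// podLike is a hypothetical, much smaller stand-in for a Kubernetes Pod.
type podLike struct {
	Name          string
	Namespace     string
	UID           string
	Annotations   map[string]string
	ManagedFields []string // bulky metadata the consumer never reads
	Spec          string   // placeholder for the large pod spec
}

// trimPod mirrors the signature of a transform function that runs on each
// object before it is stored in the informer cache. Dropping ManagedFields
// and Spec is an illustrative choice only.
func trimPod(obj interface{}) (interface{}, error) {
	p, ok := obj.(*podLike)
	if !ok {
		// Pass through anything that is not the expected type.
		return obj, nil
	}
	p.ManagedFields = nil
	p.Spec = ""
	return p, nil
}

func main() {
	in := &podLike{
		Name:          "web-0",
		Namespace:     "default",
		UID:           "uid-1",
		ManagedFields: []string{"kubelet", "kube-controller-manager"},
		Spec:          "large spec payload",
	}
	out, err := trimPod(in)
	fmt.Printf("%+v %v\n", out, err)
}
```

Because the transform runs before objects are cached, the trimmed fields are never held in memory for the lifetime of the informer.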
Tomofumi Hayashi
a70da3556a
Fix a wait to account for the possibility of a not ready unix socket (#1207) 2024-01-11 13:34:37 +09:00
Doug Smith
003fbd5785
Merge pull request #1202 from s1061123/add-timeout
Add timeout
2024-01-05 08:04:02 -05:00
Tomofumi Hayashi
6e4f62f2f2 disable revive's dot-imports in unit test files 2024-01-05 14:32:09 +09:00
Tomofumi Hayashi
197877d113 Adds a wait to account for the possibility of a not ready unix socket 2024-01-05 14:27:31 +09:00
Doug Smith
ab7d64e96f
Refactors the configuration options document reference (#1180) 2024-01-04 23:54:56 +09:00
dependabot[bot]
acfbd42719
Bump google.golang.org/grpc from 1.53.0 to 1.56.3 (#1182)
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.53.0 to 1.56.3.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.53.0...v1.56.3)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-12-11 14:48:54 +09:00
Doug Smith
c76db9c7a0
Merge pull request #1194 from s1061123/fix-logging
Fix to use lumberjack only for logging files
2023-12-07 09:14:30 -05:00
Tomofumi Hayashi
540a887651 Fix to use lumberjack only for logging files 2023-12-07 21:08:17 +09:00
Tomofumi Hayashi
d97514f841 Ignore dot-imports error message only for go test files 2023-12-07 20:56:36 +09:00
Moshe Levi
e4404b2645
fix e2e test ModuleNotFoundError: No module named 'pkg_resources' (#1189)
Signed-off-by: Moshe Levi <moshele@nvidia.com>
Signed-off-by: Tomofumi Hayashi <tohayash@redhat.com>
2023-12-07 20:51:02 +09:00
Jonatan
a373a2286d
Deployments: Add watch permission to thick example (#1171)
The ClusterRole was missing the watch permission on pods, which resulted in Multus throwing this error message every few seconds:

Failed to watch *v1.Pod: unknown (get pods)
2023-12-04 20:28:18 +09:00
dependabot[bot]
e2e8cfb677
Bump golang.org/x/net from 0.8.0 to 0.17.0 (#1176)
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.8.0 to 0.17.0.
- [Commits](https://github.com/golang/net/compare/v0.8.0...v0.17.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-17 14:21:43 +09:00
Doug Smith
b710020f7b
Merge pull request #1173 from s1061123/remove-status-set-del
Suppress status unset in cmdDel
2023-10-04 11:50:16 -04:00
Tomofumi Hayashi
46fe38e2c5 Suppress status unset in cmdDel
This change stops updating the status in CNI's DEL command.
There are two reasons:

1. cmd DEL is invoked only at pod deletion, so k8s does not
guarantee that the pod still exists; it may already be deleted.
Hence this API call may fail.

2. When a stateful set's pod is recreated, updating the status in
cmd DEL can race with the new pod.
In the stateful set case, a pod with the same name, e.g. stateful-0,
is deleted and then created again. If the old Pod's CNI DEL command
has not finished before the new Pod is created, the SetStatus function
fails due to a pod UID mismatch.
2023-10-04 23:28:26 +09:00
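The race in the second reason can be pictured with a small Go sketch. The `podRef` type and this `setStatus` function are hypothetical and only model a status update that looks a pod up by name but is guarded by UID; they are not the project's SetStatus.

```go
package main

import "fmt"

// podRef is a hypothetical stand-in for the identity of a pod.
type podRef struct {
	Name string
	UID  string
}

// setStatus sketches a status update that refuses to touch a pod whose UID
// differs from the one the request was issued for. Name and signature are
// illustrative only.
func setStatus(live podRef, requestUID string) error {
	if live.UID != requestUID {
		return fmt.Errorf("pod %s: UID mismatch (live %s, request %s)",
			live.Name, live.UID, requestUID)
	}
	return nil
}

func main() {
	// stateful-0 was deleted and recreated: same name, new UID.
	live := podRef{Name: "stateful-0", UID: "uid-new"}

	// The old pod's DEL is still running and carries the old UID.
	fmt.Println("late DEL:", setStatus(live, "uid-old"))

	// A request issued for the new pod matches and succeeds.
	fmt.Println("new ADD: ", setStatus(live, "uid-new"))
}
```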
Doug Smith
d7e391e006
Merge pull request #1167 from s1061123/fix-params
Per node certificates: Add duration parameter
2023-09-26 14:14:03 -04:00
Tomofumi Hayashi
6a0c905347 Fix per node cert feature
This change introduces certDuration as a parameter to customize
the cert duration. In addition, the environment variable for the
node name now matches its other usages.
2023-09-27 00:54:32 +09:00
Peng Liu
4d69fed8ad
Fix incorrect mount volume name in the thick plugin manifest (#1166)
Signed-off-by: Peng Liu <pliu@redhat.com>
2023-09-25 22:31:02 +09:00
Peng Liu
1dd4edded2
Move chroot from multus main process to its child processes (#1161)
We used to run chroot in multus main process when calling other CNI
plugin binary. We also use a mutex to lock the access to pod files.
But this causes performance issues when facing heavy
CNI_ADD/CNI_DEL requests.

With this patch, we do chroot in the child processes instead. So
file operations in the main process will not be affected by chroot.

This change requires the multus thick plugin pod to mount the CNI bin
directory at the same path as on the container host.

Signed-off-by: Peng Liu <pliu@redhat.com>
2023-09-22 17:08:57 +09:00
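One way to chroot only the child process in Go is the `Chroot` field of `syscall.SysProcAttr` (available on Linux and other Unix-like targets). The sketch below only builds the command and does not start it, since chroot needs privileges. The helper name and both paths are assumptions, and this is not necessarily how the patch implements it.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// childInChroot builds (but does not start) a command whose child process
// calls chroot(hostRoot) after fork and before exec. The parent process
// keeps its own root directory. hostRoot and pluginPath are hypothetical.
func childInChroot(hostRoot, pluginPath string, args ...string) *exec.Cmd {
	cmd := exec.Command(pluginPath, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{Chroot: hostRoot}
	return cmd
}

func main() {
	cmd := childInChroot("/hostroot", "/opt/cni/bin/bridge")
	// Starting the command would require root; here we only inspect it.
	fmt.Println(cmd.Path, cmd.SysProcAttr.Chroot)
}
```

Because the binary path is resolved inside the new root at exec time, the plugin has to be reachable at the same path there, which is consistent with the mount requirement mentioned in the commit message.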
Doug Smith
857d070679
Merge pull request #1159 from s1061123/per-node-cert
Add per-node-certification support
2023-09-18 12:16:03 -04:00
Tomofumi Hayashi
e5d19fff6b Add per-node-certification support
This change introduces per-node certification for multus pods.
Once a multus pod is launched, the specified bootstrap kubeconfig
is used for initial access; multus then sends a CSR to the
kube API to get its own certs for kube API access. Once the CSR is
accepted, the multus pod uses the generated certs for kube access.
2023-09-19 00:38:29 +09:00
Doug Smith
acfdc64991
Merge pull request #1158 from s1061123/bump-ver
Bump golang and k8s API version
2023-09-17 13:18:16 -04:00
Tomofumi Hayashi
f8afd78120 Bump golang and k8s API version 2023-09-18 01:40:44 +09:00
Doug Smith
ddb977f4b9
Merge pull request #1154 from dcbw/shared-informer
Performance and efficiency improvements in daemon/server mode
2023-09-15 09:56:52 -04:00
Dan Williams
d9c06e99d1 server: don't set CNI config readinessindicatorfile when using ConfigManager
For whatever reason calling os.Stat() on the readiness indicator file
from CmdAdd()/CmdDel() when multus is running in server mode and is
containerized often returns "file not found", which triggers the
polling behavior of GetReadinessIndicatorFile(). This greatly delays
CNI operations that should be pretty quick. Even if an exponential
backoff is used, os.Stat() can still return "file not found"
multiple times, even though the file clearly exists.

But it turns out we don't need to check the readiness file in server
mode when running with MultusConfigFile == "auto". In this mode the
server starts the ConfigManager which (a) waits until the file exists,
(b) watches the readiness file with fsnotify, and (c) exits the daemon
immediately if the file is deleted or moved.

This means we can assume that while the daemon is running and the
server is handling CNI requests that the readiness file exists;
otherwise the daemon would have exited. Thus CmdAdd/CmdDel don't
need to run a lot of possibly failing os.Stat() calls in the CNI
hot paths.

Signed-off-by: Dan Williams <dcbw@redhat.com>
2023-09-14 08:58:19 -05:00
Dan Williams
b0df7dd5e3 server/config: use filepath.Join()
Signed-off-by: Dan Williams <dcbw@redhat.com>
2023-09-14 08:58:19 -05:00
Dan Williams
fb4f4aa4c1 server/config: un-export some functions no longer used outside the module
Signed-off-by: Dan Williams <dcbw@redhat.com>
2023-09-14 08:58:19 -05:00