Develop -> main (#544)

* Add support of listening to multiple netns (#418)

* multiple netns listen - initial commit

* multiple netns listen - actual work

* remove redundant log line

* map /proc of host to tapper

* changing kubernetes provider again after big conflict

* revert node-sass version back to 5.0.0

* Rename host_source to hostSource

Co-authored-by: gadotroee <55343099+gadotroee@users.noreply.github.com>

* PR fixes - adding comment + typos + naming conventions

* go fmt + making procfs read only

* setns back to the original value after packet source initialized

Co-authored-by: gadotroee <55343099+gadotroee@users.noreply.github.com>

* TRA-3842 daemon acceptance tests (#429)

* Update tap_test.go and testsUtils.go

* Update tap_test.go

* Update testsUtils.go

* Update tap_test.go and testsUtils.go

* Update tap_test.go and testsUtils.go

* Update testsUtils.go

* Update tap_test.go

* gofmt

* TRA-3913 support mizu via expose service (#440)

* Update README.md, tapRunner.go, and 4 more files...

* Update testsUtils.go

* Update proxy.go

* Update README.md, testsUtils.go, and 3 more files...

* Update testsUtils.go and provider.go

* fix readme titles (#442)

* Auto close inactive issues  (#441)

* Migrate from SQLite to Basenine and introduce a new filtering syntax (#279)

* Fix the OOMKilled error by calling `debug.FreeOSMemory` periodically

* Remove `MAX_NUMBER_OF_GOROUTINES` environment variable

* Change the line

* Increase the default value of `TCP_STREAM_CHANNEL_TIMEOUT_MS` to `10000`

* Write the client and integrate to the new real-time database

* Refactor the WebSocket implementaiton for `/ws`

* Adapt the UI to the new filtering system

* Fix the rest of the issues in the UI

* Increase the buffer of the scanner

* Implement accessing single records

* Increase the buffer of another scanner

* Populate `Request` and `Response` fields of `MizuEntry`

* Add syntax highlighting for the query

* Add database to `Dockerfile`

* Fix some issues

* Update the `realtime_dbms` Git module commit hash

* Upgrade Gin version and print the query string

* Revert "Upgrade Gin version and print the query string"

This reverts commit aa09f904ee.

* Use WebSocket's itself to query instead of the query string

* Fix some errors related to conversion to HAR

* Fix the issues caused by the latest merge

* Fix the build error

* Fix PR validation GitHub workflow

* Replace the git submodule with latest Basenine version `0.1.0`

Remove `realtime_client.go` and use the official client library `github.com/up9inc/basenine/client/go` instead.

* Move Basenine host and port constants to `shared` module

* Reliably execute and wait for Basenine to become available

* Upgrade Basenine version

* Properly close WebSocket and data channel

* Fix the issues caused by the recent merge commit

* Clean up the TypeScript code

* Update `.gitignore`

* Limit the database size

* Add `Macros` method signature to `Dissector` interface and set the macros provided by the protocol extensions

* Run `go mod tidy` on `agent`

* Upgrade `github.com/up9inc/basenine/client/go` version

* Implement a mechanism to update the query using click events in the UI and use it for protocol macros

* Update the query on click to timestamps

* Fix some issues in the WebSocket and channel handling

* Update the query on clicks to status code

* Update the query on clicks to method, path and service

* Update the query on clicks to is outgoing, source and destination ports

* Add an API endpoint to validate the query against syntax errors

* Move the query background color state into `TrafficPage`

* Fix the logic in `setQuery`

* Display a toast message in case of a syntax error in the query

* Remove a call to `fmt.Printf`

* Upgrade Basenine version to `0.1.3`

* Fix an issue related to getting `MAX_ENTRIES_DB_BYTES` environment variable

* Have the `path` key in request details, in HTTP

* Rearrange the HTTP headers for the querying

* Do the same thing for `cookies` and `queryString`

* Update the query on click to table elements

Add the selectors for `TABLE` type representations in HTTP extension.

* Update the query on click to `bodySize` and `elapsedTime` in `EntryTitle`

* Add the selectors for `TABLE` type representations in AMQP extension

* Add the selectors for `TABLE` type representations in Kafka extension

* Add the selectors for `TABLE` type representations in Redis extension

* Define a struct in `tap/api.go` for the section representation data

* Add the selectors for `BODY` type representations

* Add `request.path` to the HTTP request details

* Change the summary string's field name from `path` to `summary`

* Introduce `queryable` CSS class for queryable UI elements and underline them on hover

* Instead of `N requests` at the bottom, make it `Displaying N results (queried X/Y)` and live update the values

Upgrade Basenine version to `0.2.0`.

* Verify the sha256sum of Basenine executable inside `Dockerfile`

* Pass the start time to web UI through WebSocket and always show the `EntriesList` footer

* Pipe the `stderr` of Basenine as well

* Fix the layout issues related to `CodeEditor` in the UI

* Use the correct `shasum` command in `Dockerfile`

* Upgrade Basenine version to `0.2.1`

* Limit the height of `CodeEditor` container

* Remove `Paused` enum `ConnectionStatus` in UI

* Fix the issue caused by the recent merge

* Add the filtering guide (cheatsheet)

* Update open cheatsheet button's title

* Update cheatsheet content

* Remove the old SQLite code, adapt the `--analyze` related code to Basenine

* Change the method signature of `NewEntry`

* Change the method signature of `Represent`

* Introduce `HTTPPair` field in `MizuEntry` specific to HTTP

* Remove `Entry`, `EntryId` and `EstimatedSizeBytes` fields from `MizuEntry`

Also remove the `getEstimatedEntrySizeBytes` method.

* Remove `gorm.io/gorm` dependency

* Remove unused `sensitiveDataFiltering` folder

* Increase the left margin of open cheatsheet button

* Add `overflow: auto` to the cheatsheet `Modal`

* Fix `GetEntry` method

* Fix the macro for gRPC

* Fix an interface conversion in case of AMQP

* Fix two more interface conversion errors in AMQP

* Make the `syncEntriesImpl` method blocking

* Fix a grammar mistake in the cheatsheet

* Adapt to the changes in the recent merge commit

* Improve the cheatsheet text

* Always display the timestamp in `en-US`

* Upgrade Basenine version to `0.2.2`

* Fix the order of closing Basenine connections and channels

* Don't close the Basenine channels at all

* Upgrade Basenine version to `0.2.3`

* Set the initial filter to `rlimit(100)`

* Make Basenine persistent

* Upgrade Basenine version to `0.2.4`

* Update `debug.Dockerfile`

* Fix a failing test

* Upgrade Basenine version to `0.2.5`

* Revert "Do not show play icon when disconnected (#428)"

This reverts commit 8af2e562f8.

* Upgrade Basenine version to `0.2.6`

* Make all non-informative things informative

* Make `100` a constant

* Use `===` in JavaScript no matter what

* Remove a forgotten `console.log`

* Add a comment and update the `query` in `syncEntriesImpl`

* Don't call `panic` in `GetEntry`

* Replace `panic` calls in `startBasenineServer` with `logger.Log.Panicf`

* Remove unnecessary `\n` characters in the logs

* Remove the `Reconnect` button (#444)

* Upgrade `github.com/up9inc/basenine/client/go` version (#446)

* Fix the `Analysis` button's style into its original state (#447)

* Fix the `Analysis` button's style into its original state

* Fix the MUI button style into its original state

* Fix the acceptance tests after the merger of #279 (#443)

* Enable acceptance tests

* Fix the acceptance tests

* Move `--headless` from `getDefaultCommandArgs` to `getDefaultTapCommandArgs`

* Fix rest of the failing acceptance tests

* Revert "Enable acceptance tests"

This reverts commit 3f919e865a.

* Revert "Revert "Enable acceptance tests""

This reverts commit c0bfe54b70.

* Ignore `--headless` in `mizu view`

* Make all non-informative things informative

* Remove `github.com/stretchr/testify` dependency from the acceptance tests

* Move the helper methods `waitTimeout` and `checkDBHasEntries` from `tap_test.go` to `testsUtils.go`

* Split `checkDBHasEntries` method into `getDBEntries` and `assertEntriesAtLeast` methods

* Revert "Revert "Revert "Enable acceptance tests"""

This reverts commit c13342671c.

* Revert "Revert "Revert "Revert "Enable acceptance tests""""

This reverts commit 0f8c436926.

* Make `getDBEntries` and `checkEntriesAtLeast` methods return errors instead

* Revert "Revert "Revert "Revert "Revert "Enable acceptance tests"""""

This reverts commit 643fdde009.

* Send the message into this WebSocket connection instead of all (#449)

* Fix the CSS issues in the cheatsheet modal (#448)

* Fix the CSS issues in the cheatsheet modal

* Change the Sass variable names

* moved headless to root config, use headless in view (#450)

* extend cleanup timeout to solve context timeout problem in dump logs (#453)

* Add link to exposing mizu wiki page in README (#455)

* changed logger debug mode to log level (#456)

* fixed acceptance test go sum (#458)

* Ignore `SNYK-JS-JSONSCHEMA-1920922` (#462)

Dependency tree:
`node-sass@5.0.0 > node-gyp@7.1.2 > request@2.88.2 > http-signature@1.2.0 > jsprim@1.4.1 > json-schema@0.2.3`

`node-sass` should fix it first.

* Optimize UI entry feed performance (#452)

* Optimize the React code for feeding the entries

By building `EntryItem` only once and updating the `entries` state on meta query messages.

* Upgrade `react-scrollable-feed-virtualized` version from `1.4.3` to `1.4.8`

* Fix the `isSelected` state

* Set the query text before deciding the background to prevent lags while typing

* Upgrade Basenine version from `0.2.6` to `0.2.7`

* Set the query background color only if the query is same after the HTTP request and use `useEffect` instead

* Upgrade Basenine version from `0.2.7` to `0.2.8`

* Use `CancelToken` of `axios` instead of trying to check the query state

* Turn `updateQuery` function into a state hook

* Update the macro for `http`

* Do the `source.cancel()` call in `axios.CancelToken`

* Reduce client-side logging

* Upgrade Basenine version from `0.2.8` to `0.2.9` (#465)

Fixes `limit` helper being not finished because of lack of meta updates.

* Set `response.bodySize` to `0` if it's negative (#466)

* Prevent `elapsedTime` to be negative (#467)

Also fix the `elapsedTime` for Redis.

* changes log format to be more readable (#463)

* Stop reduction of user agent header (#468)

* remove newline in logs, fixed logs time format (#469)

* TRA-3903 better health endpoint for daemon mode (#471)

* Update main.go, status_controller.go, and 2 more files...

* Update status_controller.go and mizuTapperSyncer.go

* fixed redact acceptance test (#472)

* Return `404` instead of `500` if the entry could not be found and display a toast message (#464)

* TRA-3903 add flag to disable pvc creation for daemon mode (#474)

* Update tapRunner.go and tapConfig.go

* Update tapConfig.go

* Revert "Update tapConfig.go"

This reverts commit 5c7c02c4ab.

* TRA-3903 - display targetted pods before waiting for all daemon resources to be created (#475)

* WIP

* Update tapRunner.go

* Update tapRunner.go

* Update the UI screenshots (#476)

* Update the UI screenshots

* Update `mizu-ui.png`

* TRA-3903 fix daemon mode in permission restricted configs (#473)

* Update tapRunner.go, permissions-all-namespaces-daemon.yaml, and 2 more files...

* Update tapRunner.go

* Update tapRunner.go and permissions-ns-daemon.yaml

* Update tapRunner.go

* Update tapRunner.go

* Update tapRunner.go

* TRA-3903 minor daemon mode refactor (#479)

* Update common.go and tapRunner.go

* Update common.go

* Don't omit the key-value pair if the value is `false` in `EntryTableSection` (#478)

* Sync entries in batches just as before (using `uploadIntervalSec` parameter) (#477)

* Sync entries in batches just as before (using `uploadIntervalSec` parameter)

* Replace `lastTimeSynced` value with `time.Time{}`

Since it will be overwritten by the very first iteration.

* Clear `focusedEntryId` state in case of a filter is applied (#482)

* Prevent the crash on client-side in case of `text` being undefined in `FancyTextDisplay` (#481)

* Prevent the crash on client-side in case of `text` being undefined in `FancyTextDisplay`

* Use `String(text)` instead

* Refactor watch pods to allow reusing watch wrapper (#470)

Currently shared/kubernetes/watch.go:FilteredWatch only watches pods.
This PR makes it reusable for other types of resources.
This is done in preparation for watching k8s events.

* Show the source and destination IP in the entry feed (#485)

* Upgrade Basenine version from `0.2.9` to `0.2.10` (#484)

* Upgrade Basenine version from `0.2.9` to `0.2.10`

Fixes the issues in `limit` and `rlimit` helpers that occur when they are on the left operand of a binary expression.

* Upgrade the client hash to latest

* Remove unnecessary `tcpdump` dependency from `Dockerfile` (#491)

* Ignore gob files (#488)

* Ignore gob files

* Remove `*.db` from `.gitignore`

* Update README (#486)

* Add token validity check (#483)

* Add support to auto discover envoy processes (#459)

* discover envoy pids using cluster ips

* add istio flag to cli + rename mtls flag to istio

* add istio.md to docs

* Fixing typos

* Fix minor typos and grammer in docs

Co-authored-by: Nimrod Gilboa Markevich <nimrod@up9.com>

* Improving daemon documentation (#457)

* Some changes to the doc (#494)

* Warn pods not starting (#493)

Print warning event related to mizu k8s resources.
In non-daemon print to CLI. In Daemon print to API-Server logs.

* Remove `tap/tester/` directory (#489)

Co-authored-by: gadotroee <55343099+gadotroee@users.noreply.github.com>

* Disable IPv4 defragmentation and support IPv6 (#487)

* Remove the extra negation on `nodefrag` flag's value

* Support IPv4 fragmentation and IPv6 at the same time

* Re-enable `nodefrag` flag

* Make the `gRPC` and `HTTP/2` distinction (#492)

* Remove the extra negation on `nodefrag` flag's value

* Support IPv4 fragmentation and IPv6 at the same time

* Set `Method` and `StatusCode` fields correctly for `HTTP/2`

* Replace unnecessary `grpc` naming with `http2`

* Make the `gRPC` and `HTTP/2` distinction

* Fix the macros of `http` extension

* Fix the macros of other protocol extensions

* Update the method signature of `Represent`

* Fix the `HTTP/2` support

* Fix some minor issues

* Upgrade Basenine version from `0.2.10` to `0.2.11`

Sorts macros before expanding them and prioritize the long macros.

* Don't regex split the gRPC method name

* Re-enable `nodefrag` flag

* Remove `SetHostname` method in HTTP extension (#496)

* Remove prevPodPhase (#497)

prevPodPhase does not take into account the fact that there may be more
than one tapper pod. Therefore it is not clear what its value
represents. It is only used in a debug print. It is not worth the effort
to fix for that one debug print.

Co-authored-by: gadotroee <55343099+gadotroee@users.noreply.github.com>

* minor logging changes (#499)

Co-authored-by: gadotroee <55343099+gadotroee@users.noreply.github.com>

* Use one channel for events instead of three (#495)

Use one channel for events instead of three separate channels by event type

* Add response body to the error in case of failure (#503)

* add response body to the error in case of failure

* fix typo + make inline condition

* Remove local dev instruction from readme (#507)

* Rename `URL` field to `Target URI` in the UI to prevent confusion (#509)

* Add HTTP2 Over Cleartext (H2C) support (#510)

* Add HTTP2 Over Cleartext (H2C) support

* Remove a parameter which is a remnant of debugging

* Hide `Encoding` field if it's `undefined` or empty in the UI (#511)

* Show the `EntryItem` as `EntrySummary` in `EntryDetailed` (#506)

* Fix the selected entry behavior by propagating the `focusedEntryId` through WebSocket (before #452) TRA-3983 (#513)

* Revert the select entry behavior into its original state RACING! (before #452) [TRA-3983 alternative 3]

* Remove the remaining `forceSelect`(s)

* Add a missing `focusedEntryId` prop

* Fix the race condition

* Propagate the `focusedEntryId` through WebSocket to prevent racing

* Handle unexpected socket close and replace the default `rlimit(100)` filter with `leftOff(-1)` filter (#508)

* Handle unexpected socket close and replace the default `rlimit(100)` filter with `leftOff(-1)` filter

* Rename `dontClear` parameter to `resetEntriesBuffer` and remove negation

* Add `Queryable` component to show a green add circle icon for the queryable UI elements (#512)

* Add `Queryable` component to show a green circle and use it in `EntryViewLine`

* Refactor `Queryable` component

* Use the `Queryable` component `EntryDetailed`

* Use the `Queryable` component `Summary`

* Instead of passing the style to `Queryable`, pass the children components directly

* Make `useTooltip = true` by default in `Queryable`

* Refactor a lot of styling to achieve using `Queryable` in `Protocol` component

* Migrate the last queryable elements in `EntryListItem` to `Queryable` component

* Fix some of the styling issues

* Make horizontal `Protocol` `Queryable` too

* Remove unnecessary child constants

* Revert some of the changes in 2a93f365f5

* Fix rest of the styling issues

* Fix one more styling issue

* Update the screenshots and text in the cheatsheet according to the change

* Use `let` not `var`

* Add missing dependencies to the React hook

* Bring back `GetEntries` HTTP endpoint (#515)

* Bring back `GetEntries` HTTP endpoint

* Upgrade Basenine version from `0.2.12` to `0.2.13`

* Accept negative `leftOff` value

* Remove `max`es from the validations

* Make `timeoutMs` optional

* Update the route comment

* Add `EntriesResponse` struct

* Disable telemetry by env var MIZU_DISABLE_TELEMTRY (#517)

* Replace `privileged` with specific CAPABILITIES requests  (#514)

* Fix the styling of `Queryable` under `StatusCode` and `Summary` components (#519)

* Fix the CSS issue in `Queryable` inside `EntryViewLine` (#521)

* TRA-4017 Bring back `getOldEntries` method using fetch API and always start streaming from now (#518)

* Bring back `getOldEntries` method using fetch API

* Determine no more data on top based on `leftOff` value

* Remove `entriesBuffer` state

* Always open WebSocket with some `leftOff` value

* Rename `leftOff` state to `leftOffBottom`

* Don't set the `focusedEntryId` through WebSocket if the WebSocket is closed

* Call `setQueriedCurrent` with addition

* Close WebSocket upon reaching to top

* Open WebSocket upon snapping to bottom

* Close the WebSocket on snap broken event instead

* Set queried current value to zero upon filter submit

* Upgrade `react-scrollable-feed-virtualized` version and use `scrollToIndex` function

* Change the footer text format

* Improve no more data top logic

* Fix `closeWebSocket()` call logic in `onSnapBrokenEvent` and handle `data.meta` being `null` in `getOldEntries`

* Fix the issues around fetching old records

* Clean up `EntriesList.module.sass`

* Decrement initial `leftOffTop` value by `2`

* Fix the order of `incomingEntries` in `getOldEntries`

* Request `leftOffTop - 1` from `fetchEntries`

* Limit the front-end total entries fetched through WebSocket count to `10000`

* Lose the UI performance gain that's provided by #452

* Revert "Fix the selected entry behavior by propagating the `focusedEntryId` through WebSocket (before #452) TRA-3983 (#513)"

This reverts commit 873f252544.

* Fix the issues caused by 09371f141f

* Upgrade Basenine version from `0.2.13` to `0.2.14`

* Upgrade Basenine version from `0.2.14` to `0.2.15`

* Fix the condition of "Fetch old records" button visibility

* Upgrade Basenine version from `0.2.15` to `0.2.16` and fix the UI code related to fetching old records

* Make `newEntries` constant

* Add type switch for `Base` field of `MizuEntry` (#520)

* Disable version check for devs (#522)

* Report the platform in telemtry (#523)

Co-authored-by: Igor Gov <igor.govorov1@gmail.com>

* Include milliseconds information into the timestamps in the UI (#524)

* Include milliseconds information into the timestamps in the UI

* Upgrade Basenine version from `0.2.16` to `0.2.17`

* Increase the `width` of timestamp

* Fix the CSS issues in queryable vertical protocol element (#526)

* Remove unnecessary fields and split `service` into `src.name` and `dst.name` (#525)

* Remove unnecessary fields and split `service` into `src.name` and `dst.name`

* Don't fall back to IP address but instead display `[Unresolved]` text

* Fix the CSS issues in the plus icon position and replace the separator `->` text with `SwapHorizIcon`

* make description of mizu config options public (#527)

* Fix the glitch (#529)

* Fix the glitch

* Bring back the functionality to "Fetch old records" and "Snap to bottom" buttons

* Fix the CSS issue in `Queryable` component for `src.name` field on heading mode (#530)

* API server stores tappers status (#531)

* Decreased API server boot time (#536)

* Change the connection status text and the toggle connection behavior (#534)

* Update the "Started listening at" timestamp and `queriedTotal` state based on database truncation (#533)

* Send pod info to tapper (#532)

* Alert on acceptance tests failure (#537)

* Fix health tapper status count (#538)

* Fix: acceptance tests (#539)

* Fix a JavaScript error in case of `null` attribute and an interface conversion error in the API server (#540)

* Bringing back the pod watch api server events to make acceptance test more stable (#541)

* TRA-4060 fix proxying error (#542)

* TRA-4062 remove duplicate target pod print (#543)

* Report pods "isTapped" to FE (#535)

* Fix acceptance tests (after pods status request change) (#545)

Co-authored-by: David Levanon <dvdlevanon@gmail.com>
Co-authored-by: gadotroee <55343099+gadotroee@users.noreply.github.com>
Co-authored-by: RamiBerm <54766858+RamiBerm@users.noreply.github.com>
Co-authored-by: M. Mert Yıldıran <mehmet@up9.com>
Co-authored-by: RoyUP9 <87927115+RoyUP9@users.noreply.github.com>
Co-authored-by: Nimrod Gilboa Markevich <59927337+nimrod-up9@users.noreply.github.com>
Co-authored-by: Nimrod Gilboa Markevich <nimrod@up9.com>
Co-authored-by: Alon Girmonsky <1990761+alongir@users.noreply.github.com>
Co-authored-by: Igor Gov <igor.govorov1@gmail.com>
Co-authored-by: Alex Haiut <alex@up9.com>
This commit is contained in:
Igor Gov
2021-12-19 15:28:01 +02:00
committed by GitHub
parent 4badaadcc1
commit 72f4753620
149 changed files with 6806 additions and 3516 deletions

91
docs/CONFIGURATION.md Normal file
View File

@@ -0,0 +1,91 @@
![Mizu: The API Traffic Viewer for Kubernetes](../assets/mizu-logo.svg)
# Configuration options for Mizu
Mizu has many configuration options and flags that affect its behavior. Their values can be modified via command-line interface or via configuration file.
The list below covers most useful configuration options.
### Config file
Mizu behaviour can be modified via YAML configuration file located at `$HOME/.mizu/config.yaml`.
Default values for the file can be viewed via `mizu config` command.
### Applying config options via command line
To apply any configuration option via command line, use `--set` following by config option name and value, like in the following example:
```
mizu tap --set tap.dry-run=true
```
Please make sure to use full option name (`tap.dry-run` as opposed to `dry-run` only), incl. section (`tap`, in this example)
## General section
* `agent-image` - full path to Mizu container image, in format `full.path.to/your/image:tag`. Default value is set at compilation time to `gcr.io/up9-docker-hub/mizu/<branch>:<version>`
* `dump-logs` - if set to `true`, saves log files for all Mizu components (tapper, api-server, CLI) in a zip file under `$HOME/.mizu`. Default value is `false`
* `image-pull-policy` - container image pull policy for Kubernetes, default value `Always`. Other accepted values are `Never` or `IfNotExist`. Please mind the implications when changing this.
* `kube-config-path` - path to alternative kubeconfig file to use for all interactions with Kubernetes cluster. By default - `$HOME/.kubeconfig`
* `mizu-resources-namespace` - Kubernetes namespace where all Mizu-related resources are created. Default value `mizu`
* `telemetry` - report anonymous usage statistics. Default value `true`
## section `tap`
* `namespaces` - list of namespace names, in which pods are tapped. Default value is empty, meaning only pods in the current namespace are tapped. Typically supplied as command line options.
* `all-namespaces` - special flag indicating whether Mizu should search and tap pods, matching the regex, in all namespaces. Default is `false`. Please use with caution, tapping too many pods can affect resource consumption.
* `daemon` - instructs Mizu whether to run daemon mode (where CLI command exits after launch, and tapper & api-server pods in Kubernetes continue to run without controlling CLI). Typically supplied as command-line option `--daemon`. Default valie is `false`
* `dry-run` - if true, Mizu will print list of pods matching the supplied (or default) regex and exit without actually tapping the traffic. Default value is `false`. Typically supplied as command-line option `--dry-run`
* `proxy-host` - IP address on which proxy to Mizu API service is launched; should be accessible at `proxy-host:gui-port`. Default value is `127.0.0.1`
* `gui-port` - port on which Mizu GUI is accessible, default value is `8899` (stands for `8899/tcp`)
* `regex` - regular expression used to match pods to tap, when no regex is given in the command line; default value is `.*`, which means `mizu tap` with no additional arguments is runnining as `mizu tap .*` (i.e. tap all pods found in current workspace)
* `no-redact` - instructs Mizu whether to redact certain sensitive fields in the collected traffic. Default value is `false`, i.e. Mizu will replace sentitive data values with *REDACTED* placeholder.
* `ignored-user-agents` - array of strings, describing HTTP *User-Agent* header values to be ignored. Useful to ignore Kubernetes healthcheck and other similar noisy periodic probes. Default value is empty.
* `max-entries-db-size` - maximal size of traffic stored locally in the `mizu-api-server` pod. When this size is reached, older traffic is overwritten with new entries. Default value is `200MB`
### section `tap.api-server-resources`
Kubernetes request and limit values for the `mizu-api-server` pod.
Parameters and their default values are same as used natively in Kubernetes pods:
```
cpu-limit: 750m
memory-limit: 1Gi
cpu-requests: 50m
memory-requests: 50Mi
```
### section `tap.tapper-resources`
Kubernetes request and limit values for the `mizu-tapper` pods (launched via daemonset).
Parameters and their default values are same as used natively in Kubernetes pods:
```
cpu-limit: 750m
memory-limit: 1Gi
cpu-requests: 50m
memory-requests: 50Mi
```
--
* `analsys` - enables advanced analysis of collected traffic in the UP9 coud platform. Default value is `false`
* `upload-interval` - in the *analysis* mode, push traffic to UP9 cloud every `upload-interval` seconds. Default value is `10` seconds
* `ask-upload-confirmation` - request user confirmation when uploading tapped data to UP9 cloud
## section `version`
* `debug`- print additional version and build information when `mizu version` command is invoked. Default value is `false`.

80
docs/DAEMON_MODE.md Normal file
View File

@@ -0,0 +1,80 @@
# Mizu daemon mode
Mizu can be run detached from the cli using the daemon flag: `mizu tap --daemon`. This type of mizu instance will run
indefinitely in the cluster.
Please note that daemon mode requires you to have RBAC creation permissions, see the [permissions](PERMISSIONS.md)
doc for more details.
```bash
$ mizu tap "^ca.*" --daemon
Mizu will store up to 200MB of traffic, old traffic will be cleared once the limit is reached.
Tapping pods in namespaces "sock-shop"
Waiting for mizu to be ready... (may take a few minutes)
+carts-66c77f5fbb-fq65r
+catalogue-5f4cb7cf5-7zrmn
..
```
## Stop mizu daemon
To stop the detached mizu instance and clean all cluster side resources, run `mizu clean`
```bash
$ mizu clean # mizu will continue running in cluster until clean is executed
Removing mizu resources
```
## Expose mizu web app
Mizu could be exposed at a later stage in any of the following ways:
### Using mizu view command
In a machine that can access both the cluster and a browser, you can run `mizu view` command which creates a proxy.
Besides that, all the regular ways to expose k8s pods are valid.
```bash
$ mizu view
Establishing connection to k8s cluster...
Mizu is available at http://localhost:8899
^C
..
```
### Port Forward
```bash
$ kubectl port-forward -n mizu deployment/mizu-api-server 8899:8899
```
### NodePort
```bash
$ kubectl expose -n mizu deployment mizu-api-server --name mizu-node-port --type NodePort --port 80 --target-port 8899
```
Mizu's IP is the IP of any node (get the IP with `kubectl get nodes -o wide`) and the port is the target port of the new
service (`kubectl get services -n mizu mizu-node-port`). Note that this method will expose Mizu to public access if your
nodes are public.
### LoadBalancer
```bash
$ kubectl expose deployment -n mizu --port 80 --target-port 8899 mizu-api-server --type=LoadBalancer --name=mizu-lb
service/mizu-lb exposed
..
$ kubectl get services -n mizu
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
mizu-api-server ClusterIP 10.107.200.100 <none> 80/TCP 5m5s
mizu-lb LoadBalancer 10.107.200.101 34.77.120.116 80:30141/TCP 76s
```
Note that `LoadBalancer` services only work on supported clusters (usually cloud providers) and might incur extra costs
If you changed the `mizu-resources-namespace` value, make sure the `-n mizu` flag of the `kubectl expose` command is
changed to the value of `mizu-resources-namespace`
mizu will now be available both by running `mizu view` or by accessing the `EXTERNAL-IP` of the `mizu-lb` service
through your browser.

46
docs/ISTIO.md Normal file
View File

@@ -0,0 +1,46 @@
![Mizu: The API Traffic Viewer for Kubernetes](../assets/mizu-logo.svg)
# Istio mutual tls (mtls) with Mizu
This document describe how Mizu tapper handles workloads configured with mtls, making the internal traffic between services in a cluster to be encrypted.
Besides Istio there are other service meshes that implement mtls. However, as of now Istio is the most used one, and this is why we are focusing on it.
In order to create an Istio setup for development, follow those steps:
1. Deploy a sample application to a Kubernetes cluster, the sample application needs to make internal service to service calls
2. SSH to one of the nodes, and run `tcpdump`
3. Make sure you see the internal service to service calls in a plain text
4. Deploy Istio to the cluster - make sure it is attached to all pods of the sample application, and that it is configured with mtls (default)
5. Run `tcpdump` again, make sure you don't see the internal service to service calls in a plain text
## The connection between Istio and Envoy
In order to implement its service mesh capabilities, [Istio](https://istio.io) use an [Envoy](https://www.envoyproxy.io) sidecar in front of every pod in the cluster. The Envoy is responsible for the mtls communication, and that's why we are focusing on Envoy proxy.
In the future we might see more players in that field, then we'll have to either add support for each of them or go with a unified eBPF solution.
## Network namespaces
A [linux network namespace](https://man7.org/linux/man-pages/man7/network_namespaces.7.html) is an isolation that limit the process view of the network. In the container world it used to isolate one container from another. In the Kubernetes world it used to isolate a pod from another. That means that two containers running on the same pod share the same network namespace. A container can reach a container in the same pod by accessing `localhost`.
An Envoy proxy configured with mtls receives the inbound traffic directed to the pod, decrypts it and sends it via `localhost` to the target container.
## Tapping mtls traffic
In order for Mizu to be able to see the decrypted traffic it needs to listen on the same network namespace of the target pod. Multiple threads of the same process can have different network namespaces.
[gopacket](https://github.com/google/gopacket) uses [libpacp](https://github.com/the-tcpdump-group/libpcap) by default for capturing the traffic. Libpacap doesn't support network namespaces and we can't ask it to listen to traffic on a different namespace. However, we can change the network namespace of the calling thread and then start libpcap to see the traffic on a different namespace.
## Finding the network namespace of a running process
The network namespace of a running process can be found in `/proc/PID/ns/net` link. Once we have this link, we can ask Linux to change the network namespace of a thread to this one.
This mean that Mizu needs to have access to the `/proc` (procfs) of the running node.
## Finding the network namespace of a running pod
In order for Mizu to be able to listen to mtls traffic, it needs to get the PIDs of the the running pods, filter them according to the user filters and then start listen to their internal network namespace traffic.
There is no official way in Kubernetes to get from pod to PID. The CRI implementation purposefully doesn't force a pod to be a processes on the host. It can be a Virtual Machine as well like [Kata containers](https://katacontainers.io)
While we can provide a solution for various CRIs (like Docker, Containerd and CRI-O) it's better to have a unified solution. In order to achieve that, Mizu scans all the processes in the host, and finds the Envoy processes using their `/proc/PID/exe` link.
Once Mizu detects an Envoy process, it need to check whether this specific Envoy process is relevant according the user filters. The user filters are a list of `CLUSTER_IPS`. The tapper gets them via the `TapOpts.FilterAuthorities` list.
Istio sends an `INSTANCE_IP` environment variable to every Envoy proxy process. By examining the Envoy process's environment variables we can see whether it's relevant or not. Examining a process environment variables is done by reading the `/proc/PID/envion` file.
## Edge cases
The method we use to find Envoy processes and correlate them to the cluster IPs may be inaccurate in certain situations. If, for example, a user runs an Envoy process manually, and set its `INSTANCE_IP` environment variable to one of the `CLUSTER_IPS` the tapper gets, then Mizu will capture traffic for it.