Merge pull request #1764 from mtrmac/pgzip-update

Pgzip update
Miloslav Trmač 2022-09-30 20:36:22 +02:00 committed by GitHub
commit ff2a361a0a
GPG Key ID: 4AEE18F83AFDEB23
8 changed files with 1351 additions and 12 deletions

@@ -245,7 +245,7 @@ test-unit-local: bin/skopeo
 	$(GO) test $(MOD_VENDOR) -tags "$(BUILDTAGS)" $$($(GO) list $(MOD_VENDOR) -tags "$(BUILDTAGS)" -e ./... | grep -v '^github\.com/containers/skopeo/\(integration\|vendor/.*\)$$')
 vendor:
-	$(GO) mod tidy -compat=1.17
+	$(GO) mod tidy
 	$(GO) mod vendor
 	$(GO) mod verify

2
go.mod
@@ -52,7 +52,7 @@ require (
 	github.com/inconshreveable/mousetrap v1.0.0 // indirect
 	github.com/json-iterator/go v1.1.12 // indirect
 	github.com/klauspost/compress v1.15.11 // indirect
-	github.com/klauspost/pgzip v1.2.5 // indirect
+	github.com/klauspost/pgzip v1.2.6-0.20220930104621-17e8dac29df8 // indirect
 	github.com/kr/pretty v0.2.1 // indirect
 	github.com/kr/text v0.2.0 // indirect
 	github.com/letsencrypt/boulder v0.0.0-20220723181115-27de4befb95e // indirect

1324
go.sum

File diff suppressed because it is too large

@@ -1,3 +1,7 @@
+arch:
+- amd64
+- ppc64le
 language: go
 os:

@@ -1,4 +1,4 @@
-MIT License
+The MIT License (MIT)
 Copyright (c) 2014 Klaus Post
@@ -19,3 +19,4 @@ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 SOFTWARE.

@@ -104,13 +104,12 @@ Content is [Matt Mahoneys 10GB corpus](http://mattmahoney.net/dc/10gb.html). Com
 Compressor | MB/sec | speedup | size | size overhead (lower=better)
 ------------|----------|---------|------|---------
-[gzip](http://golang.org/pkg/compress/gzip) (golang) | 15.44MB/s (1 thread) | 1.0x | 4781329307 | 0%
-[gzip](http://github.com/klauspost/compress/gzip) (klauspost) | 135.04MB/s (1 thread) | 8.74x | 4894858258 | +2.37%
-[pgzip](https://github.com/klauspost/pgzip) (klauspost) | 1573.23MB/s| 101.9x | 4902285651 | +2.53%
-[bgzf](https://godoc.org/github.com/biogo/hts/bgzf) (biogo) | 361.40MB/s | 23.4x | 4869686090 | +1.85%
-[pargzip](https://godoc.org/github.com/golang/build/pargzip) (builder) | 306.01MB/s | 19.8x | 4786890417 | +0.12%
+[gzip](http://golang.org/pkg/compress/gzip) (golang) | 16.91MB/s (1 thread) | 1.0x | 4781329307 | 0%
+[gzip](http://github.com/klauspost/compress/gzip) (klauspost) | 127.10MB/s (1 thread) | 7.52x | 4885366806 | +2.17%
+[pgzip](https://github.com/klauspost/pgzip) (klauspost) | 2085.35MB/s| 123.34x | 4886132566 | +2.19%
+[pargzip](https://godoc.org/github.com/golang/build/pargzip) (builder) | 334.04MB/s | 19.76x | 4786890417 | +0.12%
-pgzip also contains a [linear time compression](https://github.com/klauspost/compress#linear-time-compression-huffman-only) mode, that will allow compression at ~250MB per core per second, independent of the content.
+pgzip also contains a [huffman only compression](https://github.com/klauspost/compress#linear-time-compression-huffman-only) mode, that will allow compression at ~450MB per core per second, largely independent of the content.
 See the [complete sheet](https://docs.google.com/spreadsheets/d/1nuNE2nPfuINCZJRMt6wFWhKpToF95I47XjSsc-1rbPQ/edit?usp=sharing) for different content types and compression settings.
@@ -123,7 +122,7 @@ In the example above, the numbers are as follows on a 4 CPU machine:
 Decompressor | Time | Speedup
 -------------|------|--------
 [gzip](http://golang.org/pkg/compress/gzip) (golang) | 1m28.85s | 0%
-[pgzip](https://github.com/klauspost/pgzip) (golang) | 43.48s | 104%
+[pgzip](https://github.com/klauspost/pgzip) (klauspost) | 43.48s | 104%
 But wait, since gzip decompression is inherently singlethreaded (aside from CRC calculation) how can it be more than 100% faster? Because pgzip due to its design also acts as a buffer. When using unbuffered gzip, you are also waiting for io when you are decompressing. If the gzip decoder can keep up, it will always have data ready for your reader, and you will not be waiting for input to the gzip decompressor to complete.

@@ -513,6 +513,19 @@ func (z *Reader) Read(p []byte) (n int, err error) {
 func (z *Reader) WriteTo(w io.Writer) (n int64, err error) {
 	total := int64(0)
+	avail := z.current[z.roff:]
+	if len(avail) != 0 {
+		n, err := w.Write(avail)
+		if n != len(avail) {
+			return total, io.ErrShortWrite
+		}
+		total += int64(n)
+		if err != nil {
+			return total, err
+		}
+		z.blockPool <- z.current
+		z.current = nil
+	}
 	for {
 		if z.err != nil {
 			return total, z.err

2
vendor/modules.txt vendored
@@ -287,7 +287,7 @@ github.com/klauspost/compress/internal/cpuinfo
 github.com/klauspost/compress/internal/snapref
 github.com/klauspost/compress/zstd
 github.com/klauspost/compress/zstd/internal/xxhash
-# github.com/klauspost/pgzip v1.2.5
+# github.com/klauspost/pgzip v1.2.6-0.20220930104621-17e8dac29df8
 ## explicit
 github.com/klauspost/pgzip
 # github.com/kr/pretty v0.2.1