vendor.conf,vendor: vndr update for containers/image
Signed-off-by: Erik Hollensbe <github@hollensbe.org>
137 vendor/github.com/vbatts/tar-split/README.md (generated, vendored, normal file)
@@ -0,0 +1,137 @@
# tar-split

[Build Status](https://travis-ci.org/vbatts/tar-split)

Pristinely disassembling a tar archive, and stashing needed raw bytes and offsets to reassemble a validating original archive.

## Docs

Code API for libraries provided by `tar-split`:

* https://godoc.org/github.com/vbatts/tar-split/tar/asm
* https://godoc.org/github.com/vbatts/tar-split/tar/storage
* https://godoc.org/github.com/vbatts/tar-split/archive/tar

## Install

The command line utility is installable via:

```bash
go get github.com/vbatts/tar-split/cmd/tar-split
```

## Usage

For cli usage, see its [README.md](cmd/tar-split/README.md).
For the library, see the [docs](#docs).
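
For orientation, here is a minimal round-trip sketch using the `tar/asm` and `tar/storage` packages; the input file name and error handling are illustrative, not prescribed by this README:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"

	"github.com/vbatts/tar-split/tar/asm"
	"github.com/vbatts/tar-split/tar/storage"
)

func main() {
	f, err := os.Open("tar-split.tar") // any tar archive (illustrative name)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Disassemble: record headers/offsets as JSON metadata, stash payloads in memory.
	var metadata bytes.Buffer
	files := storage.NewBufferFileGetPutter()
	rdr, err := asm.NewInputTarStream(f, storage.NewJSONPacker(&metadata), files)
	if err != nil {
		panic(err)
	}
	// The returned reader is the original tar byte-stream; it must be drained
	// for the packing to complete.
	if _, err := io.Copy(io.Discard, rdr); err != nil {
		panic(err)
	}

	// Reassemble: replay the metadata against the stashed payloads.
	out := asm.NewOutputTarStream(files, storage.NewJSONUnpacker(&metadata))
	defer out.Close()
	n, err := io.Copy(io.Discard, out) // or write to a file and compare checksums
	if err != nil {
		panic(err)
	}
	fmt.Printf("reassembled %d bytes\n", n)
}
```

The reader returned by `NewInputTarStream` is the original byte-stream, so hashing it while draining gives you a checksum to validate the reassembled output against.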

## Demo

### Basic disassembly and assembly

This demonstrates the `tar-split` command and how to assemble a tar archive from the `tar-data.json.gz`.

[youtube video of basic command demo](https://youtu.be/vh5wyjIOBtc)

### Docker layer preservation

This demonstrates the tar-split integration for docker-1.8, providing consistent tar archives for the image layer content.

[youtube video of docker layer checksums](https://youtu.be/tV_Dia8E8xw)

## Caveat

Eventually this should detect tar archives for which precise reassembly is not possible.

For example, stored sparse files that have "holes" in them will be read as a
contiguous file, though the archive contents may be recorded in sparse format.
Therefore, when adding the file payload to a reassembled tar, to achieve
identical output, the file payload would need to be precisely re-sparsified. This
is not something I seek to fix immediately, but would rather have an alert that
precise reassembly is not possible.
(see more at http://www.gnu.org/software/tar/manual/html_node/Sparse-Formats.html)

Another caveat: while tar archives support having multiple file entries for the
same path, we will not support this feature. If there is more than one entry
with the same path, expect an error (like `ErrDuplicatePath`) or a resulting tar
stream that does not validate against your original checksum/signature.

## Contract

Do not break the API of stdlib `archive/tar` in our fork (ideally find an upstream mergeable solution).

## Std Version

The version of golang stdlib `archive/tar` is from go1.6.
It is minimally extended to expose the raw bytes of the TAR, rather than just the marshalled headers and file stream.
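
As a hedged illustration of what "expose the raw bytes" means, the sketch below walks an archive with the fork's raw-bytes accounting enabled; `RawAccounting` and `RawBytes()` are the fork's additions, and the file name is illustrative:

```go
package main

import (
	"fmt"
	"io"
	"os"

	tar "github.com/vbatts/tar-split/archive/tar"
)

func main() {
	f, err := os.Open("tar-split.tar") // any tar archive (illustrative name)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	tr := tar.NewReader(f)
	tr.RawAccounting = true // keep the raw header/padding bytes around
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
		// Raw bytes consumed since the last call: the 512-byte header blocks
		// (plus any padding from the previous entry), not the file payload.
		fmt.Printf("%s: %d raw header bytes\n", hdr.Name, len(tr.RawBytes()))
		if _, err := io.Copy(io.Discard, tr); err != nil { // skip the payload
			panic(err)
		}
	}
}
```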

## Design

See the [design](concept/DESIGN.md).

## Stored Metadata

Since the raw bytes of the headers and padding are stored, you may be wondering
what the size implications are. The headers are at least 512 bytes per
file (sometimes more), at least 1024 null bytes on the end, and then various
padding. This makes for roughly linear growth in the stored metadata with a
naive storage implementation.

First we'll get an archive to work with. For repeatability, we'll make an
archive from what you've just cloned:

```bash
git archive --format=tar -o tar-split.tar HEAD .
```

```bash
$ go get github.com/vbatts/tar-split/cmd/tar-split
$ tar-split checksize ./tar-split.tar
inspecting "tar-split.tar" (size 210k)
-- number of files: 50
-- size of metadata uncompressed: 53k
-- size of gzip compressed metadata: 3k
```

So assuming you've managed the extraction of the archive yourself, for reuse of
the file payloads from a relative path, the only additional storage
implication is as little as 3kb.
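
The same accounting can be sketched with the library directly, assuming the JSON packer and discard FilePutter from `tar/storage` (archive name illustrative, not a prescribed workflow):

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"os"

	"github.com/vbatts/tar-split/tar/asm"
	"github.com/vbatts/tar-split/tar/storage"
)

func main() {
	f, err := os.Open("tar-split.tar")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	var packed bytes.Buffer // gzip-compressed JSON metadata ends up here
	gz := gzip.NewWriter(&packed)
	// Discard the file payloads; only header/offset metadata is recorded.
	rdr, err := asm.NewInputTarStream(f, storage.NewJSONPacker(gz), storage.NewDiscardFilePutter())
	if err != nil {
		panic(err)
	}
	if _, err := io.Copy(io.Discard, rdr); err != nil { // drain to drive the packer
		panic(err)
	}
	if err := gz.Close(); err != nil {
		panic(err)
	}
	fmt.Printf("gzip compressed metadata: %dk\n", packed.Len()/1024)
}
```

Because only header segments and per-file checksums go into the packer, the compressed metadata stays small regardless of the payload sizes.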

But let's look at a larger archive, with many files.

```bash
$ ls -sh ./d.tar
1.4G ./d.tar
$ tar-split checksize ~/d.tar
inspecting "/home/vbatts/d.tar" (size 1420749k)
-- number of files: 38718
-- size of metadata uncompressed: 43261k
-- size of gzip compressed metadata: 2251k
```

Here, an archive with 38,718 files has a compressed metadata footprint of about 2mb.

Rolling the null bytes at the end of the archive into the total, we can assume a
rough bytes-per-file rate for the storage implications.

| uncompressed | compressed |
| :----------: | :--------: |
| ~ 1kb per/file | 0.06kb per/file |

## What's Next?

* More implementations of storage Packer and Unpacker
* More implementations of FileGetter and FilePutter
* It would be interesting to have an assembler stream that implements `io.Seeker`

## License

See [LICENSE](LICENSE)
44 vendor/github.com/vbatts/tar-split/tar/asm/README.md (generated, vendored, normal file)
@@ -0,0 +1,44 @@
asm
===

This library is for the assembly and disassembly of tar archives, facilitated by
`github.com/vbatts/tar-split/tar/storage`.

Concerns
--------

For completely safe assembly/disassembly, there will need to be a Content
Addressable Storage (CAS) directory that maps to a checksum in the
`storage.Entry` of `storage.FileType`.

This is due to the fact that tar archives _can_ allow multiple records for the
same path, but the last one effectively wins, even if the prior records had a
different payload.

In this way, when assembling an archive from relative paths, if the archive has
multiple entries for the same path, then all payloads read in from a relative
path would be identical.
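
To make the relative-path case concrete, here is a hedged sketch of reassembly with a path-based FileGetter; the `tar-data.json.gz` metadata and the extracted `./layer` tree are assumed to already exist, and the names are illustrative:

```go
package main

import (
	"compress/gzip"
	"io"
	"os"

	"github.com/vbatts/tar-split/tar/asm"
	"github.com/vbatts/tar-split/tar/storage"
)

func main() {
	mf, err := os.Open("tar-data.json.gz")
	if err != nil {
		panic(err)
	}
	defer mf.Close()
	gz, err := gzip.NewReader(mf)
	if err != nil {
		panic(err)
	}
	defer gz.Close()

	// File payloads are read back from the extracted tree, relative to "./layer".
	getter := storage.NewPathFileGetter("./layer")
	out, err := os.Create("rebuilt.tar")
	if err != nil {
		panic(err)
	}
	defer out.Close()

	ts := asm.NewOutputTarStream(getter, storage.NewJSONUnpacker(gz))
	defer ts.Close()
	if _, err := io.Copy(out, ts); err != nil {
		panic(err)
	}
}
```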

Thoughts
--------

Have a look-aside directory or storage. This way, when a clobbering record is
encountered in the tar stream, the payload of the prior/existing file is
stored to the CAS. The clobbering record's file payload can then be
extracted, but we'll have preserved the payload needed to reassemble a precise
tar archive.

    clobbered/path/to/file.[0-N]

*alternatively*

We could just _not_ support tar streams that have clobbering file paths.
Appending records to the archive is not incredibly common, and doesn't happen
by default for most implementations. Not supporting them wouldn't be a
security concern either: if it did occur, we would reassemble an archive
that doesn't validate its signature/checksum, so it shouldn't be trusted anyway.

Otherwise, this will allow us to defer support for appended files as a FUTURE FEATURE.
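
If we take the "don't support clobbering paths" route, the duplicate is expected to surface at pack time. A small sketch of that expectation, assuming the JSON packer rejects a repeated `FileType` path with `storage.ErrDuplicatePath` (entry values are illustrative):

```go
package main

import (
	"bytes"
	"errors"
	"fmt"

	"github.com/vbatts/tar-split/tar/storage"
)

func main() {
	var buf bytes.Buffer
	p := storage.NewJSONPacker(&buf)

	// First entry for a path is accepted.
	if _, err := p.AddEntry(storage.Entry{Type: storage.FileType, Name: "etc/passwd", Size: 5}); err != nil {
		panic(err)
	}
	// A second FileType entry for the same path is expected to be rejected.
	_, err := p.AddEntry(storage.Entry{Type: storage.FileType, Name: "etc/passwd", Size: 7})
	fmt.Println(errors.Is(err, storage.ErrDuplicatePath)) // expected: true
}
```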