When building for 386, we got the following build error:
registry/storage/driver/s3-aws/s3.go:312:99: cannot use
maxChunkSize (untyped int constant 5368709120) as int value
in argument to getParameterAsInteger (overflows)
This is because s3_64bit.go is selected for the 386 build. Adjust the
build tag matching in s3_32bit.go and s3_64bit.go to fix this issue.
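The overflow itself can be illustrated without build tags. The sketch below is not the approach taken in this commit (which selects per-architecture files via build constraints); it is a hypothetical runtime clamp, and the names `maxChunkSizeLimit` and `maxChunkSize` are assumptions made for illustration only.

```go
package main

import "math"

// maxChunkSizeLimit is 5 GiB. It is typed as int64 so that the constant
// itself compiles on 32-bit targets, where it does not fit into an int.
const maxChunkSizeLimit int64 = 5 << 30

// maxChunkSize (hypothetical helper) returns the limit as an int, clamped
// to the largest value the platform's int can represent.
func maxChunkSize() int {
	// Copy into a variable so the conversion below is a runtime
	// conversion, not a constant expression that overflows at compile time.
	limit := maxChunkSizeLimit
	if limit > int64(math.MaxInt) {
		return math.MaxInt
	}
	return int(limit)
}
```

On a 64-bit target this returns the full 5 GiB limit; on 386 it would return the largest 32-bit int instead of failing to compile.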
Signed-off-by: Chen Qi <Qi.Chen@windriver.com>
**Explanation:**
1. **Temporary File Creation:** The content is first written to a temporary file (appending `.tmp` to the original path). This ensures that the original file remains intact until the write is complete.
2. **Write to Temporary File:** Using the existing `Writer` in truncating mode (append set to `false`), the content is written to the temporary file. If the write fails, the temporary file is closed and deleted.
3. **Commit and Rename:** After successfully writing to the temporary file, it is committed. Then, the temporary file is atomically renamed to the target path using `Move`, which is handled by the filesystem's rename operation (atomic on most systems).
4. **Cleanup on Failure:** If any step fails, the temporary file is cleaned up to avoid leaving orphaned files.
Signed-off-by: Oded Porat <onporat@gmail.com>
To address the issue where a failed write operation results in an empty file, we can use a temporary file for non-append writes. This ensures that the original file is only replaced once the new content is fully written and committed.
**Key Changes:**
1. **Temporary File Handling:**
- For non-append writes, a temporary file is created in the same directory as the target file.
- All write operations are performed on the temporary file first.
2. **Atomic Commit:**
- The temporary file is only renamed to the target path during `Commit()`, ensuring atomic replacement.
- If `Commit()` fails, the temporary file is cleaned up.
3. **Error Handling:**
- `Cancel()` properly removes temporary files if the operation is aborted.
- `Close()` is made idempotent to handle multiple calls safely.
4. **Data Integrity:**
- Directory sync after rename ensures metadata persistence.
- Proper file flushing and syncing before rename operations.
Signed-off-by: Oded Porat <onporat@gmail.com>
It can now return a client using default Azure credentials.
Updated docs to include information on Azure Workload Identity.
Signed-off-by: Lucas Melchior <lucasmelchior@flywheel.io>
fix anchor link in docs
Signed-off-by: Lucas Melchior <lucasmelchior@flywheel.io>
Unfortunately YAML struck us hard in this one.
It interprets "off" as a boolean value, so setting loglevel to off sets
it to false.
This commit makes sure we set the loglevel to off if the param is
unmarshalled into false and if it's not a string.
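
A minimal sketch of the idea, assuming the YAML decoder has already produced either a string or a bool for the key. `normalizeLoglevel` is a hypothetical helper and not the function changed in this commit.

```go
package main

import "fmt"

// normalizeLoglevel (hypothetical helper) maps the decoded YAML value for
// the loglevel key back to a level name. An unquoted `off` in YAML arrives
// here as the boolean false, so false is treated as "off".
func normalizeLoglevel(v interface{}) (string, error) {
	switch val := v.(type) {
	case string:
		return val, nil
	case bool:
		if !val {
			return "off", nil
		}
		return "", fmt.Errorf("invalid loglevel %v", val)
	default:
		return "", fmt.Errorf("invalid loglevel %v", v)
	}
}
```

With this, `normalizeLoglevel(false)` yields `"off"`, while a quoted level such as `"debug"` passes through unchanged.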
Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
* Make copy poll max retry a global driver max retry
* Get support for etags in Azure
* Fix storage driver tests
* Fix auth mess and update docs
* Refactor Azure client and enable Azure storage tests
We use Azurite for integration testing which requires TLS,
so we had to figure out how to skip TLS verification when running tests locally:
this required updating testsuites Driver and constructor due to TestRedirectURL
sending GET and HEAD requests to remote storage which in this case is Azurite.
Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
Apparently you can upload 0-size content without GCS reporting any errors
back to you.
This is something a lot of our users experienced and reported. See here
for at least one example:
github.com/distribution/distribution/issues/3018
This sets the MD5 sum on the uploaded content which should rectify
things according to the docs:
https://pkg.go.dev/cloud.google.com/go/storage#ObjectAttrs
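
As an illustration only, the checksum can be computed with the standard library as below. `contentMD5` is a hypothetical helper; per the linked docs, assigning the result to the `MD5` field of a GCS object writer before writing makes the service reject an upload whose data does not match.

```go
package main

import "crypto/md5"

// contentMD5 (hypothetical helper) returns the raw 16-byte MD5 digest of
// content, which is the form the ObjectAttrs MD5 field expects.
func contentMD5(content []byte) []byte {
	sum := md5.Sum(content)
	return sum[:]
}
```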
Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
Some S3 compatible object storage systems like R2 require that all
multipart chunks are the same size. This was mostly true before, except
the final chunk was larger than the requested chunk size which causes
uploads to fail.
In addition, the two byte slices have been replaced with a single
*bytes.Buffer and the surrounding code simplified significantly.
Fixes: #3873
Signed-off-by: Thomas Way <thomas@6f.io>
When a directory is empty, the S3 API lists it with a trailing slash.
This causes the path to be appended twice to the walkInfo slice, causing
purge uploads path transformations to panic when the `_uploads` directory
is empty.
This adds a check for file paths ending with a slash and does not append
those as regular files to the walkInfo slice.
Fixes #4358
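
The check can be sketched as a simple filter over listed keys. `filterFiles` is a hypothetical helper used for illustration; the real driver performs this check while building the walkInfo slice.

```go
package main

import "strings"

// filterFiles (hypothetical helper) drops keys ending in a slash, which is
// how S3 lists an empty directory, and keeps only regular file keys.
func filterFiles(keys []string) []string {
	files := make([]string, 0, len(keys))
	for _, k := range keys {
		if strings.HasSuffix(k, "/") {
			continue
		}
		files = append(files, k)
	}
	return files
}
```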
Signed-off-by: Flavian Missi <fmissi@redhat.com>
The latest golangci-lint spits out some govet issues.
This commit fixes them. We are also bumping the linter version.
Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
https://github.com/distribution/distribution/pull/4146 introduced a new
rewrite storage middleware but did not update the init logging
message. This commit fixes that.
Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
Stat always calls ListObjects when stat-ing an S3 key.
Unfortunately ListObjects is not a free call - both in terms of egress
and actual AWS costs (likely because of the egress).
This changes the behaviour of Stat such that we always attempt the
HeadObject call first and only ever fall through to ListObjects if the
HeadObject returns an AWS API error.
Note that the official docs mention that the only error returned by
HEAD is NoSuchKey; experiments show that this is demonstrably wrong and
the AWS docs are simply outdated at the time of this commit.
HeadObject actually returns the following errors:
* NotFound: if the queried key does not exist
* NotFound: if the queried key contains subkeys i.e. it's a prefix
* BucketRegionError: if the bucket does not exist
* Forbidden: if the Head operation is not allowed via IAM/ACLs
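
The head-first flow can be sketched against a stand-in interface. Everything below (`objectStore`, `memStore`, `fileInfo`, `stat`, `errNotFound`) is hypothetical and does not use the AWS SDK; for simplicity the sketch falls back to listing on any HEAD error, whereas the commit falls through only on AWS API errors.

```go
package main

import (
	"errors"
	"strings"
)

var errNotFound = errors.New("not found")

// objectStore is a hypothetical stand-in for the S3 client calls.
type objectStore interface {
	Head(key string) (size int64, err error)
	ListPrefix(prefix string) (keys []string, err error)
}

type fileInfo struct {
	Path  string
	Size  int64
	IsDir bool
}

// stat tries the cheap HEAD call first and lists only when HEAD fails,
// which is the case for prefixes ("directories") and missing keys.
func stat(s objectStore, key string) (fileInfo, error) {
	if size, err := s.Head(key); err == nil {
		return fileInfo{Path: key, Size: size}, nil
	}
	keys, err := s.ListPrefix(key + "/")
	if err != nil {
		return fileInfo{}, err
	}
	if len(keys) == 0 {
		return fileInfo{}, errNotFound
	}
	return fileInfo{Path: key, IsDir: true}, nil
}

// memStore is an in-memory fake used to exercise stat.
type memStore map[string]int64

func (m memStore) Head(key string) (int64, error) {
	size, ok := m[key]
	if !ok {
		return 0, errNotFound
	}
	return size, nil
}

func (m memStore) ListPrefix(prefix string) ([]string, error) {
	var keys []string
	for k := range m {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	return keys, nil
}
```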
Co-authored-by: Cory Snider <corhere@gmail.com>
Co-authored-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
Defining an interface on the implementer side is generally not best
practice in Go code. There is no code in the distribution module which
consumes a ManifestBuilder value so there is no need to define the
interface in the distribution module. Export the concrete
ManifestBuilder types and modify the constructors to return concrete
values.
Co-authored-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: Cory Snider <csnider@mirantis.com>
This allows rewriting 'URLFor' of the storage driver to use a specific
host/trim the base path.
It is different from the 'redirect' middleware, as it still calls the
storage driver URLFor.
For example, with the Azure storage provider, this allows transforming the
SAS Azure Blob Storage URL into the URL compatible with Azure Front
Door.
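
A minimal sketch of such a rewrite using only `net/url`. `rewriteURL` and the host names in the usage note are assumptions for illustration; the actual middleware and its option names are defined in the code, not here.

```go
package main

import (
	"net/url"
	"strings"
)

// rewriteURL (hypothetical helper) replaces the host of a URL returned by
// the storage driver and trims a leading base path, leaving the query
// string (for example a SAS token) untouched.
func rewriteURL(raw, host, trimPrefix string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if host != "" {
		u.Host = host
	}
	if trimPrefix != "" {
		u.Path = strings.TrimPrefix(u.Path, trimPrefix)
		// Keep the path absolute after trimming.
		if !strings.HasPrefix(u.Path, "/") {
			u.Path = "/" + u.Path
		}
	}
	return u.String(), nil
}
```

For example, rewriting `https://acct.blob.core.windows.net/container/blobs/x?sig=abc` with host `cdn.example.com` and prefix `/container` gives `https://cdn.example.com/blobs/x?sig=abc`.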
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
The garbage collector should remove unused layer link files.
P.S. This was originally contributed by @m-masataka, now I would like to take it over.
Thanks to @m-masataka for the efforts with PR https://github.com/distribution/distribution/pull/2288
Signed-off-by: Liang Zheng <zhengliang0901@gmail.com>
Huge help from @milosgajdos who figured out how to do the entire
marshalling/unmarshalling for the configs
Signed-off-by: Anders Ingemann <aim@orbit.online>
Enable configuration options that can selectively disable validation
that dependencies exist within the registry before the image index
is uploaded.
This enables sparse indexes, where a registry holds a manifest index that
could be signed (so the digest must not change) but does not hold every
referenced image in the index. The use case for this is when a registry
mirror does not need to mirror all platforms, but does need to maintain
the digests of all manifests either because they are signed or because
they are pulled by digest.
The registry administrator can also select specific image architectures
that must exist in the registry, enabling a registry operator to select
only the platforms they care about and ensure all image indexes uploaded
to the registry are valid for those platforms.
Signed-off-by: James Hewitt <james.hewitt@uk.ibm.com>
With the current logic we only verify the region and return if it's
empty; we were not validating the regionEndpoint parameter.
Signed-off-by: Ankur Kothiwal <ankur.kothiwal@cern.com>
Harbor uses distribution for its registry component (harbor-registry).
The harbor GC will call into the registry to delete the manifest, which in turn
then does a lookup for all tags that reference the deleted manifest.
To find the tag references, the registry will iterate every tag in the repository
and read its link file to check if it matches the deleted manifest (i.e. to see
if it uses the same sha256 digest). So, the more tags in the repository, the worse the
performance will be (as there will be more s3 API calls occurring for the tag
directory lookups and tag file reads).
Therefore, we can use concurrent lookup and untag to optimize performance as described in https://github.com/goharbor/harbor/issues/12948.
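
A bounded-concurrency lookup of this kind can be sketched as below. `lookupTags` and its `readLink` callback are hypothetical; the callback stands in for reading a tag's link file from the storage backend.

```go
package main

import (
	"sort"
	"sync"
)

// lookupTags (hypothetical helper) reads the link of every tag with at most
// `concurrency` reads in flight and returns the tags pointing at digest.
func lookupTags(tags []string, concurrency int, readLink func(tag string) (string, error), digest string) ([]string, error) {
	if concurrency < 1 {
		concurrency = 1
	}
	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		matched  []string
		firstErr error
	)
	sem := make(chan struct{}, concurrency) // limits in-flight reads
	for _, tag := range tags {
		wg.Add(1)
		sem <- struct{}{}
		go func(tag string) {
			defer wg.Done()
			defer func() { <-sem }()
			d, err := readLink(tag)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				if firstErr == nil {
					firstErr = err
				}
				return
			}
			if d == digest {
				matched = append(matched, tag)
			}
		}(tag)
	}
	wg.Wait()
	if firstErr != nil {
		return nil, firstErr
	}
	sort.Strings(matched) // goroutine completion order is not deterministic
	return matched, nil
}
```

The semaphore channel caps the number of simultaneous backend calls, so a repository with many tags does not open an unbounded number of S3 requests.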
P.S. This optimization was originally contributed by @Antiarchitect, now I would like to take it over.
Thanks to @Antiarchitect for the efforts with PR https://github.com/distribution/distribution/pull/3890.
Signed-off-by: Liang Zheng <zhengliang0901@gmail.com>
It is reasonable to ignore the error that the manifest tag path does not exist when querying
all tags of the specified repository during GC.
Signed-off-by: Liang Zheng <zhengliang0901@gmail.com>
Currently, the `forcepathstyle` parameter for the s3 storage driver is
considered only if the `regionendpoint` parameter is set. Since setting
a region endpoint explicitly is discouraged with AWS s3, it is not clear
how to enforce path style URLs with AWS s3.
This also means that the default value (true) only applies if a region
endpoint is configured.
This change makes sure we always forward the `forcepathstyle` parameter
to the aws-sdk if present in the config. This is a breaking change where
a `regionendpoint` is configured but no explicit `forcepathstyle` value
is set.
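
One way to express "forward the value only if present" is to return both the parsed value and whether it was set. This is a sketch under assumptions: `forcePathStyle` is a hypothetical helper, and what the driver does when the key is absent is not specified here.

```go
package main

import (
	"fmt"
	"strconv"
)

// forcePathStyle (hypothetical helper) reads the `forcepathstyle` parameter
// independently of `regionendpoint`. set reports whether the key was
// present, so the caller can forward the value only in that case.
func forcePathStyle(params map[string]interface{}) (value, set bool, err error) {
	raw, ok := params["forcepathstyle"]
	if !ok || raw == nil {
		return false, false, nil
	}
	switch v := raw.(type) {
	case bool:
		return v, true, nil
	case string:
		b, perr := strconv.ParseBool(v)
		if perr != nil {
			return false, false, fmt.Errorf("invalid forcepathstyle %q: %w", v, perr)
		}
		return b, true, nil
	default:
		return false, false, fmt.Errorf("forcepathstyle must be a bool or string, got %T", raw)
	}
}
```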
Signed-off-by: Benjamin Schanzel <benjamin.schanzel@bmw.de>