- **Description:** This PR adds functionality to pass in in-memory bytes
as a source to `AzureAIDocumentIntelligenceLoader`.
- **Issue:** I needed the functionality, so I added it.
- **Dependencies:** NA
- **Twitter handle:** @akseljoonas if this is a big enough change :)
---------
Co-authored-by: Aksel Joonas Reedi <aksel@klippa.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
# OCR-based PDF loader
This implements [Zerox](https://github.com/getomni-ai/zerox) PDF
document loader.
Zerox utilizes simple but very powerful (even though slower and more
costly) approach to parsing PDF documents: it converts PDF to series of
images and passes it to a vision model requesting the contents in
markdown.
It is especially suitable for complex PDFs that are not parsed well by
other alternatives.
## Example use:
```python
from langchain_community.document_loaders.pdf import ZeroxPDFLoader
os.environ["OPENAI_API_KEY"] = "" ## your-api-key
model = "gpt-4o-mini" ## openai model
pdf_url = "https://assets.ctfassets.net/f1df9zr7wr1a/soP1fjvG1Wu66HJhu3FBS/034d6ca48edb119ae77dec5ce01a8612/OpenAI_Sacra_Teardown.pdf"
loader = ZeroxPDFLoader(file_path=pdf_url, model=model)
docs = loader.load()
```
The Zerox library supports wide range of provides/models. See Zerox
documentation for details.
- **Dependencies:** `zerox`
- **Twitter handle:** @martintriska1
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Erick Friis <erickfriis@gmail.com>
## What this PR does?
### Currently `O365BaseLoader` (and consequently both derived loaders)
are limited to `pdf`, `doc`, `docx` files.
- **Solution: here we introduce _handlers_ attribute that allows for
custom handlers to be passed in. This is done in _dict_ form:**
**Example:**
```python
from langchain_community.document_loaders.parsers.documentloader_adapter import DocumentLoaderAsParser
# PR for DocumentLoaderAsParser here: https://github.com/langchain-ai/langchain/pull/27749
from langchain_community.document_loaders.excel import UnstructuredExcelLoader
xlsx_parser = DocumentLoaderAsParser(UnstructuredExcelLoader, mode="paged")
# create dictionary mapping file types to handlers (parsers)
handlers = {
"doc": MsWordParser()
"pdf": PDFMinerParser()
"txt": TextParser()
"xlsx": xlsx_parser
}
loader = SharePointLoader(document_library_id="...",
handlers=handlers # pass handlers to SharePointLoader
)
documents = loader.load()
# works the same in OneDriveLoader
loader = OneDriveLoader(document_library_id="...",
handlers=handlers
)
```
This dictionary is then passed to `MimeTypeBasedParser` same as in the
[current
implementation](5a2cfb49e0/libs/community/langchain_community/document_loaders/parsers/registry.py (L13)).
### Currently `SharePointLoader` and `OneDriveLoader` are separate
loaders that both inherit from `O365BaseLoader`
However both of these implement the same functionality. The only
differences are:
- `SharePointLoader` requires argument `document_library_id` whereas
`OneDriveLoader` requires `drive_id`. These are just different names for
the same thing.
- `SharePointLoader` implements significantly more features.
- **Solution: `OneDriveLoader` is replaced with an empty shell just
renaming `drive_id` to `document_library_id` and inheriting from
`SharePointLoader`**
**Dependencies:** None
**Twitter handle:** @martintriska1
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
The metadata["source"] value for the web paths was being set to
temporary path (/tmp).
Fixed it by creating a new variable self.original_file_path, which will
store the original path.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core, etc. is
being modified. Use "docs: ..." for purely docs changes, "templates:
..." for template changes, "infra: ..." for CI changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
**Description**
This PR introduces the proxies parameter to the RecursiveUrlLoader
class, allowing the user to specify proxy servers for requests. This
update enables crawling through proxy servers, providing enhanced
flexibility for network configurations.
The key changes include:
1.Added an optional proxies parameter to the constructor (__init__).
2.Updated the documentation to explain the proxies parameter usage with
an example.
3.Modified the _get_child_links_recursive method to pass the proxies
parameter to the requests.get function.
**Sample Usage**
```python
from bs4 import BeautifulSoup as Soup
from langchain_community.document_loaders.recursive_url_loader import RecursiveUrlLoader
proxies = {
"http": "http://localhost:1080",
"https": "http://localhost:1080",
}
url = "https://python.langchain.com/docs/concepts/#langchain-expression-language-lcel"
loader = RecursiveUrlLoader(
url=url, max_depth=1, extractor=lambda x: Soup(x, "html.parser").text,proxies=proxies
)
docs = loader.load()
```
---------
Co-authored-by: root <root@thb>
**Description**:
This PR add support of clob/blob data type for oracle document loader,
clob/blob can only be read by oracledb package when connection is open,
so reformat code to process data before connection closes.
**Dependencies**:
oracledb package same as before. pip install oracledb
Co-authored-by: Erick Friis <erick@langchain.dev>
This PR updates the Firecrawl Document Loader to use the recently
released V1 API of Firecrawl.
**Key Updates:**
**Firecrawl V1 Integration:** Updated the document loader to leverage
the new Firecrawl V1 API for improved performance, reliability, and
developer experience.
**Map Functionality Added:** Introduced the map mode for more flexible
document loading options.
These updates enhance the integration and provide access to the latest
features of Firecrawl.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- **Description:** This pull request addresses the validation error in
`SettingsConfigDict` due to extra fields in the `.env` file. The issue
is prevalent across multiple Langchain modules. This fix ensures that
extra fields in the `.env` file are ignored, preventing validation
errors.
**Changes include:**
- Applied fixes to modules using `SettingsConfigDict`.
- **Issue:** NA, similar
https://github.com/langchain-ai/langchain/issues/26850
- **Dependencies:** NA
- **Description:** The flag is named `anonymize_snippets`. When set to
true, the Pebblo server will anonymize snippets by redacting all
personally identifiable information (PII) from the snippets going into
VectorDB and the generated reports
- **Issue:** NA
- **Dependencies:** NA
- **docs**: Updated
- **Description:** Added PebbloTextLoader for loading text in
PebbloSafeLoader.
- Since PebbloSafeLoader wraps document loaders, this new loader enables
direct loading of text into Documents using PebbloSafeLoader.
- **Issue:** NA
- **Dependencies:** NA
- [x] **Tests**: Added/Updated tests
Page content sometimes is empty when PyMuPDF can not find text on pages.
For example, this can happen when the text of the PDF is not copyable
"by hand". Then an OCR solution is need - which is not integrated here.
This warning should accurately warn the user that some pages are lost
during this process.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
Fixes#26212: replaced the raw string with backslashes. Alternative:
raw-stringif the full docstring.
---------
Co-authored-by: Erick Friis <erickfriis@gmail.com>
### Description:
This pull request significantly enhances the MongodbLoader class in the
LangChain community package by adding robust metadata customization and
improved field extraction capabilities. The updated class now allows
users to specify additional metadata fields through the metadata_names
parameter, enabling the extraction of both top-level and deeply nested
document attributes as metadata. This flexibility is crucial for users
who need to include detailed contextual information without altering the
database schema.
Moreover, the include_db_collection_in_metadata flag offers optional
inclusion of database and collection names in the metadata, allowing for
even greater customization depending on the user's needs.
The loader's field extraction logic has been refined to handle missing
or nested fields more gracefully. It now employs a safe access mechanism
that avoids the KeyError previously encountered when a specified nested
field was absent in a document. This update ensures that the loader can
handle diverse and complex data structures without failure, making it
more resilient and user-friendly.
### Issue:
This pull request addresses a critical issue where the MongodbLoader
class in the LangChain community package could throw a KeyError when
attempting to access nested fields that may not exist in some documents.
The previous implementation did not handle the absence of specified
nested fields gracefully, leading to runtime errors and interruptions in
data processing workflows.
This enhancement ensures robust error handling by safely accessing
nested document fields, using default values for missing data, thus
preventing KeyError and ensuring smoother operation across various data
structures in MongoDB. This improvement is crucial for users working
with diverse and complex data sets, ensuring the loader can adapt to
documents with varying structures without failing.
### Dependencies:
Requires motor for asynchronous MongoDB interaction.
### Twitter handle:
N/A
### Add tests and docs
Tests: Unit tests have been added to verify that the metadata inclusion
toggle works as expected and that the field extraction correctly handles
nested fields.
Docs: An example notebook demonstrating the use of the enhanced
MongodbLoader is included in the docs/docs/integrations directory. This
notebook includes setup instructions, example usage, and outputs.
(Here is the notebook link : [colab
link](https://colab.research.google.com/drive/1tp7nyUnzZa3dxEFF4Kc3KS7ACuNF6jzH?usp=sharing))
Lint and test
Before submitting, I ran make format, make lint, and make test as per
the contribution guidelines. All tests pass, and the code style adheres
to the LangChain standards.
```python
import unittest
from unittest.mock import patch, MagicMock
import asyncio
from langchain_community.document_loaders.mongodb import MongodbLoader
class TestMongodbLoader(unittest.TestCase):
def setUp(self):
"""Setup the MongodbLoader test environment by mocking the motor client
and database collection interactions."""
# Mocking the AsyncIOMotorClient
self.mock_client = MagicMock()
self.mock_db = MagicMock()
self.mock_collection = MagicMock()
self.mock_client.get_database.return_value = self.mock_db
self.mock_db.get_collection.return_value = self.mock_collection
# Initialize the MongodbLoader with test data
self.loader = MongodbLoader(
connection_string="mongodb://localhost:27017",
db_name="testdb",
collection_name="testcol"
)
@patch('langchain_community.document_loaders.mongodb.AsyncIOMotorClient', return_value=MagicMock())
def test_constructor(self, mock_motor_client):
"""Test if the constructor properly initializes with the correct database and collection names."""
loader = MongodbLoader(
connection_string="mongodb://localhost:27017",
db_name="testdb",
collection_name="testcol"
)
self.assertEqual(loader.db_name, "testdb")
self.assertEqual(loader.collection_name, "testcol")
def test_aload(self):
"""Test the aload method to ensure it correctly queries and processes documents."""
# Setup mock data and responses for the database operations
self.mock_collection.count_documents.return_value = asyncio.Future()
self.mock_collection.count_documents.return_value.set_result(1)
self.mock_collection.find.return_value = [
{"_id": "1", "content": "Test document content"}
]
# Run the aload method and check responses
loop = asyncio.get_event_loop()
results = loop.run_until_complete(self.loader.aload())
self.assertEqual(len(results), 1)
self.assertEqual(results[0].page_content, "Test document content")
def test_construct_projection(self):
"""Verify that the projection dictionary is constructed correctly based on field names."""
self.loader.field_names = ['content', 'author']
self.loader.metadata_names = ['timestamp']
expected_projection = {'content': 1, 'author': 1, 'timestamp': 1}
projection = self.loader._construct_projection()
self.assertEqual(projection, expected_projection)
if __name__ == '__main__':
unittest.main()
```
### Additional Example for Documentation
Sample Data:
```json
[
{
"_id": "1",
"title": "Artificial Intelligence in Medicine",
"content": "AI is transforming the medical industry by providing personalized medicine solutions.",
"author": {
"name": "John Doe",
"email": "john.doe@example.com"
},
"tags": ["AI", "Healthcare", "Innovation"]
},
{
"_id": "2",
"title": "Data Science in Sports",
"content": "Data science provides insights into player performance and strategic planning in sports.",
"author": {
"name": "Jane Smith",
"email": "jane.smith@example.com"
},
"tags": ["Data Science", "Sports", "Analytics"]
}
]
```
Example Code:
```python
loader = MongodbLoader(
connection_string="mongodb://localhost:27017",
db_name="example_db",
collection_name="articles",
filter_criteria={"tags": "AI"},
field_names=["title", "content"],
metadata_names=["author.name", "author.email"],
include_db_collection_in_metadata=True
)
documents = loader.load()
for doc in documents:
print("Page Content:", doc.page_content)
print("Metadata:", doc.metadata)
```
Expected Output:
```
Page Content: Artificial Intelligence in Medicine AI is transforming the medical industry by providing personalized medicine solutions.
Metadata: {'author_name': 'John Doe', 'author_email': 'john.doe@example.com', 'database': 'example_db', 'collection': 'articles'}
```
Thank you.
---
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
---------
Co-authored-by: ccurme <chester.curme@gmail.com>
**Description:**
Adding a new option to the CSVLoader that allows us to implicitly
specify the columns that are used for generating the Document content.
Currently these are implicitly set as "all fields not part of the
metadata_columns".
In some cases however it is useful to have a field both as a metadata
and as part of the document content.
- **Description:** Added `ref` query parameter so data is not loaded
only from the default branch but any branch passed
---------
Co-authored-by: Osama Mehdi <mehdi@hm.edu>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** Added langchain version while calling discover API
during both ingestion and retrieval
- **Issue:** NA
- **Dependencies:** NA
- **Tests:** NA
- **Docs** NA
---------
Co-authored-by: dristy.cd <dristy@clouddefense.io>
- **Description:** Updating source path and file path in Pebblo safe
loader for SharePoint apps during loading
- **Issue:** NA
- **Dependencies:** NA
- **Tests:** NA
- **Docs** NA
---------
Co-authored-by: dristy.cd <dristy@clouddefense.io>
This PR adds tiny improvements to the `GithubFileLoader` document loader
and its code sample, addressing the following issues:
1. Currently, the `file_extension` argument of `GithubFileLoader` does
not change its behavior at all.
1. The `GithubFileLoader` sample code in
`docs/docs/integrations/document_loaders/github.ipynb` does not work as
it stands.
The respective solutions I propose are the following:
1. Remove `file_extension` argument from `GithubFileLoader`.
1. Specify the branch as `master` (not the default `main`) and rename
`documents` as `document`.
---------
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
**Refactor PebbloSafeLoader**
- Created `APIWrapper` and moved API logic into it.
- Moved helper functions to the utility file.
- Created smaller functions and methods for better readability.
- Properly read environment variables.
- Removed unused code.
**Issue:** NA
**Dependencies:** NA
**tests**: Updated
- **Description:** Updating metadata for sharepoint loader with full
path i.e., webUrl
- **Issue:** NA
- **Dependencies:** NA
- **Tests:** NA
- **Docs** NA
Co-authored-by: dristy.cd <dristy@clouddefense.io>
Co-authored-by: ccurme <chester.curme@gmail.com>
- **Description:** The following
[line](fd546196ef/libs/community/langchain_community/document_loaders/parsers/audio.py (L117))
in `OpenAIWhisperParser` returns a text object for some odd reason
despite the official documentation saying it should return `Transcript`
Instance which should have the text attribute. But for the example given
in the issue and even when I tried running on my own, I was directly
getting the text. The small PR accounts for that.
- **Issue:** : #25218
I was able to replicate the error even without the GenericLoader as
shown below and the issue was with `OpenAIWhisperParser`
```python
parser = OpenAIWhisperParser(api_key="sk-fxxxxxxxxx",
response_format="srt",
temperature=0)
list(parser.lazy_parse(Blob.from_path('path_to_file.m4a')))
```
**Description:** This minor PR aims to add `llm_extraction` to Firecrawl
loader. This feature is supported on API and PythonSDK, but the
langchain loader omits adding this to the response.
**Twitter handle:** [scalable_pizza](https://x.com/scalablepizza)
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Upgrade to using a literal for specifying the extra which is the
recommended approach in pydantic 2.
This works correctly also in pydantic v1.
```python
from pydantic.v1 import BaseModel
class Foo(BaseModel, extra="forbid"):
x: int
Foo(x=5, y=1)
```
And
```python
from pydantic.v1 import BaseModel
class Foo(BaseModel):
x: int
class Config:
extra = "forbid"
Foo(x=5, y=1)
```
## Enum -> literal using grit pattern:
```
engine marzano(0.1)
language python
or {
`extra=Extra.allow` => `extra="allow"`,
`extra=Extra.forbid` => `extra="forbid"`,
`extra=Extra.ignore` => `extra="ignore"`
}
```
Resorted attributes in config and removed doc-string in case we will
need to deal with going back and forth between pydantic v1 and v2 during
the 0.3 release. (This will reduce merge conflicts.)
## Sort attributes in Config:
```
engine marzano(0.1)
language python
function sort($values) js {
return $values.text.split(',').sort().join("\n");
}
class_definition($name, $body) as $C where {
$name <: `Config`,
$body <: block($statements),
$values = [],
$statements <: some bubble($values) assignment() as $A where {
$values += $A
},
$body => sort($values),
}
```
- community: Allow authorization to Confluence with bearer token
- **Description:** Allow authorization to Confluence with [Personal
Access
Token](https://confluence.atlassian.com/enterprise/using-personal-access-tokens-1026032365.html)
by checking for the keys `['client_id', token: ['access_token',
'token_type']]`
- **Issue:**
Currently the following error occurs when using an personal access token
for authorization.
```python
loader = ConfluenceLoader(
url=os.getenv('CONFLUENCE_URL'),
oauth2={
'token': {"access_token": os.getenv("CONFLUENCE_ACCESS_TOKEN"), "token_type": "bearer"},
'client_id': 'client_id',
},
page_ids=['12345678'],
)
```
```
ValueError: Error(s) while validating input: ["You have either omitted require keys or added extra keys to the oauth2 dictionary. key values should be `['access_token', 'access_token_secret', 'consumer_key', 'key_cert']`"]
```
With this PR the loader runs as expected.
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
- **Description:** This PR makes the AthenaLoader profile_name optional
and fixes the type hint which says the type is `str` but it should be
`str` or `None` as None is handled in the loader init. This is a minor
problem but it just confused me when I was using the Athena Loader to
why we had to use a Profile, as I want that for local but not
production.
- **Issue:** #24957
- **Dependencies:** None.
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
**Description:** This PR fixes a KeyError in NotionDBLoader when the
"name" key is missing in the "people" property.
**Issue:** Fixes#24223
**Dependencies:** None
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
This PR adds annotations in comunity package.
Annotations are only strictly needed in subclasses of BaseModel for
pydantic 2 compatibility.
This PR adds some unnecessary annotations, but they're not bad to have
regardless for documentation pages.
- **Title:** [PebbloSafeLoader] Implement content-size-based batching in
the classification flow(loader/doc API)
- **Description:**
- Implemented content-size-based batching in the loader/doc API, set to
100KB with no external configuration option, intentionally hard-coded to
prevent timeouts.
- Remove unused field(pb_id) from doc_metadata
- **Issue:** NA
- **Dependencies:** NA
- **Add tests and docs:** Updated
community:Add support for specifying document_loaders.firecrawl api url.
Add support for specifying document_loaders.firecrawl api url.
This is mainly to support the
[self-hosting](https://github.com/mendableai/firecrawl/blob/main/SELF_HOST.md)
option firecrawl provides. Eg. now I can specify localhost:....
The corresponding firecrawl class already provides functionality to pass
the argument. See here:
4c9d62f6d3/apps/python-sdk/firecrawl/firecrawl.py (L29)
---------
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Added [ScrapingAnt](https://scrapingant.com/) Web Loader integration.
ScrapingAnt is a web scraping API that allows extracting web page data
into accessible and well-formatted markdown.
Description: Added ScrapingAnt web loader for retrieving web page data
as markdown
Dependencies: scrapingant-client
Twitter: @WeRunTheWorld3
---------
Co-authored-by: Oleg Kulyk <oleg@scrapingant.com>