mirror of
https://github.com/hwchase17/langchain.git
synced 2026-01-29 21:30:18 +00:00
…integrations/document_loaders/ All document loaders section
Thank you for contributing to LangChain!
- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, core, etc. is being
modified. Use "docs: ..." for purely docs changes, "infra: ..." for CI
changes.
- Example: "core: add foobar LLM"
- [x] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, eyurtsev, ccurme, vbarda, hwchase17.
200 lines
5.2 KiB
Plaintext
200 lines
5.2 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "e8712110",
|
|
"metadata": {},
|
|
"source": [
|
|
"Model2Vec is a technique to turn any sentence transformer into a really small static model\n",
|
|
"[model2vec](https://github.com/MinishLab/model2vec) can be used to generate embeddings."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "266dd424",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Setup\n",
|
|
"\n",
|
|
"```bash\n",
|
|
"pip install -U langchain-community\n",
|
|
"```\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "78ab91a6",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Instantiation"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "d06e7719",
|
|
"metadata": {},
|
|
"source": [
|
|
"Ensure that `model2vec` is installed\n",
|
|
"\n",
|
|
"```bash\n",
|
|
"pip install -U model2vec\n",
|
|
"```"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "f8ea1ed5",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Indexing and Retrieval"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"id": "d25dc22d-b656-46c6-a42d-eace958590cd",
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2023-05-24T15:13:17.176956Z",
|
|
"start_time": "2023-05-24T15:13:15.399076Z"
|
|
},
|
|
"execution": {
|
|
"iopub.execute_input": "2024-03-29T15:39:19.252281Z",
|
|
"iopub.status.busy": "2024-03-29T15:39:19.252101Z",
|
|
"iopub.status.idle": "2024-03-29T15:39:19.339106Z",
|
|
"shell.execute_reply": "2024-03-29T15:39:19.338614Z",
|
|
"shell.execute_reply.started": "2024-03-29T15:39:19.252260Z"
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain_community.embeddings import Model2vecEmbeddings"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"id": "8397b91f-a1f9-4be6-a699-fedaada7c37a",
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2023-05-24T15:13:17.193751Z",
|
|
"start_time": "2023-05-24T15:13:17.182053Z"
|
|
},
|
|
"execution": {
|
|
"iopub.execute_input": "2024-03-29T15:39:19.901573Z",
|
|
"iopub.status.busy": "2024-03-29T15:39:19.900935Z",
|
|
"iopub.status.idle": "2024-03-29T15:39:19.906540Z",
|
|
"shell.execute_reply": "2024-03-29T15:39:19.905345Z",
|
|
"shell.execute_reply.started": "2024-03-29T15:39:19.901529Z"
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"embeddings = Model2vecEmbeddings(\"minishlab/potion-base-8M\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"id": "abcf98b7-424c-4691-a1cd-862c3d53be11",
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2023-05-24T15:13:17.844903Z",
|
|
"start_time": "2023-05-24T15:13:17.198751Z"
|
|
},
|
|
"execution": {
|
|
"iopub.execute_input": "2024-03-29T15:39:20.434581Z",
|
|
"iopub.status.busy": "2024-03-29T15:39:20.433117Z",
|
|
"iopub.status.idle": "2024-03-29T15:39:22.178650Z",
|
|
"shell.execute_reply": "2024-03-29T15:39:22.176058Z",
|
|
"shell.execute_reply.started": "2024-03-29T15:39:20.434501Z"
|
|
},
|
|
"scrolled": true
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"query_text = \"This is a test query.\"\n",
|
|
"query_result = embeddings.embed_query(query_text)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"id": "98897454-b280-4ee1-bbb9-2c6c15342f87",
|
|
"metadata": {
|
|
"ExecuteTime": {
|
|
"end_time": "2023-05-24T15:13:18.605339Z",
|
|
"start_time": "2023-05-24T15:13:17.845906Z"
|
|
},
|
|
"execution": {
|
|
"iopub.execute_input": "2024-03-29T15:39:28.164009Z",
|
|
"iopub.status.busy": "2024-03-29T15:39:28.161759Z",
|
|
"iopub.status.idle": "2024-03-29T15:39:30.217232Z",
|
|
"shell.execute_reply": "2024-03-29T15:39:30.215348Z",
|
|
"shell.execute_reply.started": "2024-03-29T15:39:28.163876Z"
|
|
},
|
|
"scrolled": true
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"document_text = \"This is a test document.\"\n",
|
|
"document_result = embeddings.embed_documents([document_text])"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "11bac134",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Direct Usage\n",
|
|
"\n",
|
|
"Here's how you would directly make use of `model2vec`\n",
|
|
"\n",
|
|
"```python\n",
|
|
"from model2vec import StaticModel\n",
|
|
"\n",
|
|
"# Load a model from the HuggingFace hub (in this case the potion-base-8M model)\n",
|
|
"model = StaticModel.from_pretrained(\"minishlab/potion-base-8M\")\n",
|
|
"\n",
|
|
"# Make embeddings\n",
|
|
"embeddings = model.encode([\"It's dangerous to go alone!\", \"It's a secret to everybody.\"])\n",
|
|
"\n",
|
|
"# Make sequences of token embeddings\n",
|
|
"token_embeddings = model.encode_as_sequence([\"It's dangerous to go alone!\", \"It's a secret to everybody.\"])\n",
|
|
"```"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "d81e21aa",
|
|
"metadata": {},
|
|
"source": [
|
|
"## API Reference\n",
|
|
"\n",
|
|
"For more information check out the model2vec github [repo](https://github.com/MinishLab/model2vec)"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.11.3"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
}
|