{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Baichuan Text Embeddings\n", "\n", "As of January 25, 2024, BaichuanTextEmbeddings ranks #1 on the C-MTEB (Chinese Massive Text Embedding Benchmark) leaderboard.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Leaderboard (see the Overall -> Chinese section): https://huggingface.co/spaces/mteb/leaderboard" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Official website: https://platform.baichuan-ai.com/docs/text-Embedding\n", "\n", "An API key is required to use this embedding model. You can get one by registering at https://platform.baichuan-ai.com/docs/text-Embedding." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "BaichuanTextEmbeddings supports a 512-token context window and produces 1024-dimensional vectors." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that BaichuanTextEmbeddings currently supports only Chinese text. Multi-language support is coming soon." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from langchain_community.embeddings import BaichuanTextEmbeddings\n", "\n", "# Pass the API key directly (replace the placeholder with your own key)\n", "embeddings = BaichuanTextEmbeddings(baichuan_api_key=\"sk-*\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, you can set the API key through the `BAICHUAN_API_KEY` environment variable and omit the `baichuan_api_key` argument:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "os.environ[\"BAICHUAN_API_KEY\"] = \"YOUR_API_KEY\"\n", "\n", "# The key is now read from the environment variable\n", "embeddings = BaichuanTextEmbeddings()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "text_1 = \"今天天气不错\"  # \"The weather is nice today\"\n", "text_2 = \"今天阳光很好\"  # \"It is very sunny today\"\n", "\n", "# embed_query returns a single embedding vector (a list of floats)\n", "query_result = embeddings.embed_query(text_1)\n", "query_result" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# embed_documents returns one embedding vector per input text\n", "doc_result = embeddings.embed_documents([text_1, text_2])\n", "doc_result" ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 2 }