Commit Graph

1693 Commits

Author SHA1 Message Date
mhavey
1bbb64d956 community[minor], langchian[minor]: Add Neptune Rdf graph and chain (#16650)
**Description**: This PR adds a chain for Amazon Neptune graph database
RDF format. It complements the existing Neptune Cypher chain. The PR
also includes a Neptune RDF graph class to connect to, introspect, and
query a Neptune RDF graph database from the chain. A sample notebook is
provided under docs that demonstrates the overall effect: invoking the
chain to make natural language queries against Neptune using an LLM.

**Issue**: This is a new feature
 
**Dependencies**: The RDF graph class depends on the AWS boto3 library
if using IAM authentication to connect to the Neptune database.

---------

Co-authored-by: Piyush Jain <piyushjain@duck.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-12 21:30:20 -08:00
Michael Feil
e1cfd0f3e7 community[patch]: infinity embeddings update incorrect default url (#16759)
The default url has always been incorrect (7797 instead 7997). Here is a
update to the correct url.
2024-02-12 20:05:08 -08:00
Andreas Motl
1fdd9bd980 community/SQLDatabase: Generalize and trim software tests (#16659)
- **Description:** Improve test cases for `SQLDatabase` adapter
component, see
[suggestion](https://github.com/langchain-ai/langchain/pull/16655#pullrequestreview-1846749474).
  - **Depends on:** GH-16655
  - **Addressed to:** @baskaryan, @cbornet, @eyurtsev

_Remark: This PR is stacked upon GH-16655, so that one will need to go
in first._

Edit: Thank you for bringing in GH-17191, @eyurtsev. This is a little
aftermath, improving/streamlining the corresponding test cases.
2024-02-12 22:58:34 -05:00
Theo / Taeyoon Kang
1987f905ed core[patch]: Support .yml extension for YAML (#16783)
- **Description:**

[AS-IS] When dealing with a yaml file, the extension must be .yaml.  

[TO-BE] In the absence of extension length constraints in the OS, the
extension of the YAML file is yaml, but control over the yml extension
must still be made.

It's as if it's an error because it's a .jpg extension in jpeg support.

  - **Issue:** - 

  - **Dependencies:**
no dependencies required for this change,
2024-02-12 19:57:20 -08:00
Kapil Sachdeva
cd00a87db7 community[patch] - in FAISS vector store, support passing custom DocStore implementation when using from_xxx methods (#16801)
- **Description:** The from__xx methods of FAISS class have hardcoded
InMemoryStore implementation and thereby not let users pass a custom
DocStore implementation,
  - **Issue:** no referenced issue,
  - **Dependencies:** none,
  - **Twitter handle:** ksachdeva
2024-02-12 19:51:55 -08:00
Chris
f9f5626ca4 community[patch]: Fix github search issues and PRs PaginatedList has no len() error (#16806)
**Description:** 
Bugfix: Langchain_community's GitHub Api wrapper throws a TypeError when
searching for issues and/or PRs (the `search_issues_and_prs` method).
This is because PyGithub's PageinatedList type does not support the
len() method. See https://github.com/PyGithub/PyGithub/issues/1476

![image](https://github.com/langchain-ai/langchain/assets/8849021/57390b11-ed41-4f48-ba50-f3028610789c)
  **Dependencies:** None 
  **Twitter handle**: @ChrisKeoghNZ
  
I haven't registered an issue as it would take me longer to fill the
template out than to make the fix, but I'm happy to if that's deemed
essential.

I've added a simple integration test to cover this as there were no
existing unit tests and it was going to be tricky to set them up.

Co-authored-by: Chris Keogh <chris.keogh@xero.com>
2024-02-12 19:50:59 -08:00
morgana
722aae4fd1 community: add delete method to rocksetdb vectorstore to support recordmanager (#17030)
- **Description:** This adds a delete method so that rocksetdb can be
used with `RecordManager`.
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** `@_morgan_adams_`

---------

Co-authored-by: Rockset API Bot <admin@rockset.io>
2024-02-12 19:50:20 -08:00
yin1991
c454dc36fc community[proxy]: Enhancement/add proxy support playwrighturlloader 16751 (#16822)
- **Description:** Enhancement/add proxy support playwrighturlloader
16751
- **Issue:** [Enhancement: Add Proxy Support to PlaywrightURLLoader
Class](https://github.com/langchain-ai/langchain/issues/16751)
  - **Dependencies:** 
  - **Twitter handle:** @ootR77013489

---------

Co-authored-by: root <root@ip-172-31-46-160.ap-southeast-1.compute.internal>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-12 19:48:29 -08:00
Lingzhen Chen
30af711c34 community[patch]: update AzureSearch class to work with azure-search-documents=11.4.0 (#15659)
- **Description:** Updates
`libs/community/langchain_community/vectorstores/azuresearch.py` to
support the stable version `azure-search-documents=11.4.0`
- **Issue:** https://github.com/langchain-ai/langchain/issues/14534,
https://github.com/langchain-ai/langchain/issues/15039,
https://github.com/langchain-ai/langchain/issues/15355
  - **Dependencies:** azure-search-documents>=11.4.0

---------

Co-authored-by: Clément Tamines <Skar0@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-12 19:23:35 -08:00
Robby
e135dc70c3 community[patch]: Invoke callback prior to yielding token (#17348)
**Description:** Invoke callback prior to yielding token in stream
method for Ollama.
**Issue:** [Callback for on_llm_new_token should be invoked before the
token is yielded by the model
#16913](https://github.com/langchain-ai/langchain/issues/16913)

Co-authored-by: Robby <h0rv@users.noreply.github.com>
2024-02-12 19:22:55 -08:00
Christophe Bornet
ab025507bc community[patch]: Add async methods to VectorStoreQATool (#16949) 2024-02-12 19:19:50 -08:00
Christophe Bornet
fb7552bfcf Add async methods to InMemoryCache (#17425)
Add async methods to InMemoryCache
2024-02-12 22:02:38 -05:00
yin1991
37ef6ac113 community[patch]: Add Pagination to GitHubIssuesLoader for Efficient GitHub Issues Retrieval (#16934)
- **Description:** Add Pagination to GitHubIssuesLoader for Efficient
GitHub Issues Retrieval
- **Issue:** [the issue # it fixes if
applicable,](https://github.com/langchain-ai/langchain/issues/16864)

---------

Co-authored-by: root <root@ip-172-31-46-160.ap-southeast-1.compute.internal>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-12 18:30:36 -08:00
Robby
ece4b43a81 community[patch]: doc loaders mypy fixes (#17368)
**Description:** Fixed `type: ignore`'s for mypy for some
document_loaders.
**Issue:** [Remove "type: ignore" comments #17048
](https://github.com/langchain-ai/langchain/issues/17048)

---------

Co-authored-by: Robby <h0rv@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-02-12 16:51:06 -08:00
Robby
0653aa469a community[patch]: Invoke callback prior to yielding token (#17346)
**Description:** Invoke callback prior to yielding token in stream
method for watsonx.
**Issue:** [Callback for on_llm_new_token should be invoked before the
token is yielded by the model
#16913](https://github.com/langchain-ai/langchain/issues/16913)

Co-authored-by: Robby <h0rv@users.noreply.github.com>
2024-02-12 16:36:33 -08:00
Bagatur
f7e453971d community[patch]: remove print (#17435) 2024-02-12 15:21:38 -08:00
Spencer Kelly
54fa78c887 community[patch]: fixed vector similarity filtering (#16967)
**Description:** changed filtering so that failed filter doesn't add
document to results. Currently filtering is entirely broken and all
documents are returned whether or not they pass the filter.

fixes issue introduced in
https://github.com/langchain-ai/langchain/pull/16190
2024-02-12 14:52:57 -08:00
Abhijeeth Padarthi
584b647b96 community[minor]: AWS Athena Document Loader (#15625)
- **Description:** Adds the document loader for [AWS
Athena](https://aws.amazon.com/athena/), a serverless and interactive
analytics service.
  - **Dependencies:** Added boto3 as a dependency
2024-02-12 12:53:40 -08:00
david-tempelmann
93da18b667 community[minor]: Add mmr and similarity_score_threshold retrieval to DatabricksVectorSearch (#16829)
- **Description:** This PR adds support for `search_types="mmr"` and
`search_type="similarity_score_threshold"` to retrievers using
`DatabricksVectorSearch`,
  - **Issue:** 
  - **Dependencies:**
  - **Twitter handle:**

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-12 12:51:37 -08:00
Massimiliano Pronesti
3894b4d9a5 community: add gpt-4-turbo and gpt-4-0125 costs (#17349)
Ref: https://openai.com/pricing
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2024-02-11 21:24:24 -08:00
Erick Friis
3a2eb6e12b infra: add print rule to ruff (#16221)
Added noqa for existing prints. Can slowly remove / will prevent more
being intro'd
2024-02-09 16:13:30 -08:00
Jael Gu
c07c0da01a community[patch]: Fix Milvus add texts when ids=None (#17021)
- **Description:** Fix Milvus add texts when ids=None (auto_id=True)

Signed-off-by: Jael Gu <mengjia.gu@zilliz.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-09 18:48:37 -05:00
Quang Hoa
54c1fb3f25 community[patch]: Make some functions work with Milvus (#10695)
**Description**
Make some functions work with Milvus:
1. get_ids: Get primary keys by field in the metadata
2. delete: Delete one or more entities by ids
3. upsert: Update/Insert one or more entities

**Issue**
None
**Dependencies**
None
**Tag maintainer:**
@hwchase17 
**Twitter handle:**
None

---------

Co-authored-by: HoaNQ9 <hoanq.1811@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-02-09 15:21:31 -08:00
kYLe
c9999557bf community[patch]: Modify LLMs/Anyscale work with OpenAI API v1 (#14206)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
- **Description:** 
1. Modify LLMs/Anyscale to work with OAI v1
2. Get rid of openai_ prefixed variables in Chat_model/ChatAnyscale
3. Modify `anyscale_api_base` to `anyscale_base_url` to follow OAI name
convention (reverted)

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2024-02-09 15:11:18 -08:00
Charlie Marsh
24c0bab57b infra, multiple: Upgrade configuration for Ruff v0.2.0 (#16905)
## Summary

This PR upgrades LangChain's Ruff configuration in preparation for
Ruff's v0.2.0 release. (The changes are compatible with Ruff v0.1.5,
which LangChain uses today.) Specifically, we're now warning when
linter-only options are specified under `[tool.ruff]` instead of
`[tool.ruff.lint]`.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-09 14:28:02 -08:00
Leonid Ganeline
932c52c333 community[patch]: docstrings (#16810)
- added missed docstrings
- formated docstrings to the consistent form
2024-02-09 12:48:57 -08:00
Kononov Pavel
15bc201967 langchain_community: Fix typo bug (#17324)
Problem from #17095

This error wasn't in the v1.4.0
2024-02-09 11:27:33 -05:00
Bagatur
65e97c9b53 infra: mv SQLDatabase tests to community (#17276) 2024-02-08 17:05:43 -08:00
Bagatur
02ef9164b5 langchain[patch]: expose cohere rerank score, add parent doc param (#16887) 2024-02-08 16:07:18 -08:00
Bagatur
35c1bf339d infra: rm boto3, gcaip from pyproject (#17270) 2024-02-08 15:28:22 -08:00
Alex
de5e96b5f9 community[patch]: updated openai prices in mapping (#17009)
- **Description:** there are january prices update for chatgpt
[blog](https://openai.com/blog/new-embedding-models-and-api-updates),
also there are updates on their website on page
[pricing](https://openai.com/pricing)
- **Issue:** N/A

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-08 14:43:44 -08:00
Armin Stepanyan
641efcf41c community: add runtime kwargs to HuggingFacePipeline (#17005)
This PR enables changing the behaviour of huggingface pipeline between
different calls. For example, before this PR there's no way of changing
maximum generation length between different invocations of the chain.
This is desirable in cases, such as when we want to scale the maximum
output size depending on a dynamic prompt size.

Usage example:

```python
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
hf = HuggingFacePipeline(pipeline=pipe)

hf("Say foo:", pipeline_kwargs={"max_new_tokens": 42})
```

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-08 13:58:31 -08:00
Scott Nath
a32798abd7 community: Add you.com utility, update you retriever integration docs (#17014)
<!-- Thank you for contributing to LangChain!

Please title your PR "<package>: <description>", where <package> is
whichever of langchain, community, core, experimental, etc. is being
modified.

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes if applicable,
  - **Dependencies:** any dependencies required for this change,
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` from the root
of the package you've modified to check this locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc: https://python.langchain.com/docs/contributing/

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

- **Description: changes to you.com files** 
    - general cleanup
- adds community/utilities/you.py, moving bulk of code from retriever ->
utility
    - removes `snippet` as endpoint
    - adds `news` as endpoint
    - adds more tests

<s>**Description: update community MAKE file** 
    - adds `integration_tests`
    - adds `coverage`</s>

- **Issue:** the issue # it fixes if applicable,
- [For New Contributors: Update Integration
Documentation](https://github.com/langchain-ai/langchain/issues/15664#issuecomment-1920099868)
- **Dependencies:** n/a
- **Twitter handle:** @scottnath
- **Mastodon handle:** scottnath@mastodon.social

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-08 13:47:50 -08:00
Liang Zhang
7306600e2f community[patch]: Support SerDe transform functions in Databricks LLM (#16752)
**Description:** Databricks LLM does not support SerDe the
transform_input_fn and transform_output_fn. After saving and loading,
the LLM will be broken. This PR serialize these functions into a hex
string using pickle, and saving the hex string in the yaml file. Using
pickle to serialize a function can be flaky, but this is a simple
workaround that unblocks many use cases. If more sophisticated SerDe is
needed, we can improve it later.

Test:
Added a simple unit test.
I did manual test on Databricks and it works well.
The saved yaml looks like:
```
llm:
      _type: databricks
      cluster_driver_port: null
      cluster_id: null
      databricks_uri: databricks
      endpoint_name: databricks-mixtral-8x7b-instruct
      extra_params: {}
      host: e2-dogfood.staging.cloud.databricks.com
      max_tokens: null
      model_kwargs: null
      n: 1
      stop: null
      task: null
      temperature: 0.0
      transform_input_fn: 80049520000000000000008c085f5f6d61696e5f5f948c0f7472616e73666f726d5f696e7075749493942e
      transform_output_fn: null
```

@baskaryan

```python
from langchain_community.embeddings import DatabricksEmbeddings
from langchain_community.llms import Databricks
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
import mlflow

embeddings = DatabricksEmbeddings(endpoint="databricks-bge-large-en")

def transform_input(**request):
  request["messages"] = [
    {
      "role": "user",
      "content": request["prompt"]
    }
  ]
  del request["prompt"]
  return request

llm = Databricks(endpoint_name="databricks-mixtral-8x7b-instruct", transform_input_fn=transform_input)

persist_dir = "faiss_databricks_embedding"

# Create the vector db, persist the db to a local fs folder
loader = TextLoader("state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
db = FAISS.from_documents(docs, embeddings)
db.save_local(persist_dir)

def load_retriever(persist_directory):
    embeddings = DatabricksEmbeddings(endpoint="databricks-bge-large-en")
    vectorstore = FAISS.load_local(persist_directory, embeddings)
    return vectorstore.as_retriever()

retriever = load_retriever(persist_dir)
retrievalQA = RetrievalQA.from_llm(llm=llm, retriever=retriever)
with mlflow.start_run() as run:
    logged_model = mlflow.langchain.log_model(
        retrievalQA,
        artifact_path="retrieval_qa",
        loader_fn=load_retriever,
        persist_dir=persist_dir,
    )

# Load the retrievalQA chain
loaded_model = mlflow.pyfunc.load_model(logged_model.model_uri)
print(loaded_model.predict([{"query": "What did the president say about Ketanji Brown Jackson"}]))

```
2024-02-08 13:09:50 -08:00
cjpark-data
ce22e10c4b community[patch]: Fix KeyError 'embedding' (MongoDBAtlasVectorSearch) (#17178)
- **Description:**
Embedding field name was hard-coded named "embedding".
So I suggest that change `res["embedding"]` into
`res[self._embedding_key]`.
  - **Issue:** #17177,
- **Twitter handle:**
[@bagcheoljun17](https://twitter.com/bagcheoljun17)
2024-02-08 12:06:42 -08:00
Neli Hateva
9bb5157a3d langchain[patch], community[patch]: Fixes in the Ontotext GraphDB Graph and QA Chain (#17239)
- **Description:** Fixes in the Ontotext GraphDB Graph and QA Chain
related to the error handling in case of invalid SPARQL queries, for
which `prepareQuery` doesn't throw an exception, but the server returns
400 and the query is indeed invalid
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Twitter handle:** @OntotextGraphDB
2024-02-08 12:05:43 -08:00
ByeongUk Choi
b88329e9a5 community[patch]: Implement Unique ID Enforcement in FAISS (#17244)
**Description:**
Implemented unique ID validation in the FAISS component to ensure all
document IDs are distinct. This update resolves issues related to
non-unique IDs, such as inconsistent behavior during deletion processes.
2024-02-08 12:03:33 -08:00
Bassem Yacoube
4e3ed7f043 community[patch]: octoai embeddings bug fix (#17216)
fixes a bug in octoa_embeddings provider
2024-02-07 22:25:52 -05:00
Eugene Yurtsev
780e84ae79 community[minor]: SQLDatabase Add fetch mode cursor, query parameters, query by selectable, expose execution options, and documentation (#17191)
- **Description:** Improve `SQLDatabase` adapter component to promote
code re-use, see
[suggestion](https://github.com/langchain-ai/langchain/pull/16246#pullrequestreview-1846590962).
  - **Needed by:** GH-16246
  - **Addressed to:** @baskaryan, @cbornet 

## Details
- Add `cursor` fetch mode
- Accept SQL query parameters
- Accept both `str` and SQLAlchemy selectables as query expression
- Expose `execution_options`
- Documentation page (notebook) about `SQLDatabase` [^1]
See [About
SQLDatabase](https://github.com/langchain-ai/langchain/blob/c1c7b763/docs/docs/integrations/tools/sql_database.ipynb).

[^1]: Apparently there hasn't been any yet?

---------

Co-authored-by: Andreas Motl <andreas.motl@crate.io>
2024-02-07 22:23:43 -05:00
Tomaz Bratanic
7e4b676d53 community[patch]: Better error propagation for neo4jgraph (#17190)
There are other errors that could happen when refreshing the schema, so
we want to propagate specific errors for more clarity
2024-02-07 22:16:14 -05:00
Luiz Ferreira
34d2daffb3 community[patch]: Fix chat openai unit test (#17124)
- **Description:** 
Actually the test named `test_openai_apredict` isn't testing the
apredict method from ChatOpenAI.
  - **Twitter handle:**
  https://twitter.com/OAlmofadas
2024-02-07 22:08:26 -05:00
Dmitry Kankalovich
f92738a6f6 langchain[minor], community[minor], core[minor]: Async Cache support and AsyncRedisCache (#15817)
* This PR adds async methods to the LLM cache. 
* Adds an implementation using Redis called AsyncRedisCache.
* Adds a docker compose file at the /docker to help spin up docker
* Updates redis tests to use a context manager so flushing always happens by default
2024-02-07 22:06:09 -05:00
Bagatur
6f1403b9b6 community[patch]: Release 0.0.19 (#17207)
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-02-07 15:37:01 -08:00
Bagatur
af74301ab9 core[patch], community[patch]: link extraction continue on failure (#17200) 2024-02-07 14:15:30 -08:00
Erick Friis
22b6a03a28 infra: read min versions (#17135) 2024-02-06 16:05:11 -08:00
Bagatur
226f376d59 community[patch]: Release 0.0.18 (#17129)
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-02-06 13:40:00 -08:00
Frank
ef082c77b1 community[minor]: add github file loader to load any github file content b… (#15305)
### Description
support load any github file content based on file extension.  

Why not use [git
loader](https://python.langchain.com/docs/integrations/document_loaders/git#load-existing-repository-from-disk)
?
git loader clones the whole repo even only interested part of files,
that's too heavy. This GithubFileLoader only downloads that you are
interested files.

### Twitter handle
my twitter: @shufanhaotop

---------

Co-authored-by: Hao Fan <h_fan@apple.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-06 09:42:33 -08:00
Ryan Kraus
f027696b5f community: Added new Utility runnables for NVIDIA Riva. (#15966)
**Please tag this issue with `nvidia_genai`**

- **Description:** Added new Runnables for integration NVIDIA Riva into
LCEL chains for Automatic Speech Recognition (ASR) and Text To Speech
(TTS).
- **Issue:** N/A
- **Dependencies:** To use these runnables, the NVIDIA Riva client
libraries are required. It they are not installed, an error will be
raised instructing how to install them. The Runnables can be safely
imported without the riva client libraries.
- **Twitter handle:** N/A

All of the Riva Runnables are inside a single folder in the Utilities
module. In this folder are four files:
- common.py - Contains all code that is common to both TTS and ASR
- stream.py - Contains a class representing an audio stream that allows
the end user to put data into the stream like a queue.
- asr.py - Contains the RivaASR runnable
- tts.py - Contains the RivaTTS runnable

The following Python function is an example of creating a chain that
makes use of both of these Runnables:

```python
def create(
    config: Configuration,
    audio_encoding: RivaAudioEncoding,
    sample_rate: int,
    audio_channels: int = 1,
) -> Runnable[ASRInputType, TTSOutputType]:
    """Create a new instance of the chain."""
    _LOGGER.info("Instantiating the chain.")

    # create the riva asr client
    riva_asr = RivaASR(
        url=str(config.riva_asr.service.url),
        ssl_cert=config.riva_asr.service.ssl_cert,
        encoding=audio_encoding,
        audio_channel_count=audio_channels,
        sample_rate_hertz=sample_rate,
        profanity_filter=config.riva_asr.profanity_filter,
        enable_automatic_punctuation=config.riva_asr.enable_automatic_punctuation,
        language_code=config.riva_asr.language_code,
    )

    # create the prompt template
    prompt = PromptTemplate.from_template("{user_input}")

    # model = ChatOpenAI()
    model = ChatNVIDIA(model="mixtral_8x7b")  # type: ignore

    # create the riva tts client
    riva_tts = RivaTTS(
        url=str(config.riva_asr.service.url),
        ssl_cert=config.riva_asr.service.ssl_cert,
        output_directory=config.riva_tts.output_directory,
        language_code=config.riva_tts.language_code,
        voice_name=config.riva_tts.voice_name,
    )

    # construct and return the chain
    return {"user_input": riva_asr} | prompt | model | riva_tts  # type: ignore
```

The following code is an example of creating a new audio stream for
Riva:

```python
input_stream = AudioStream(maxsize=1000)
# Send bytes into the stream
for chunk in audio_chunks:
    await input_stream.aput(chunk)
input_stream.close()
```

The following code is an example of how to execute the chain with
RivaASR and RivaTTS

```python
output_stream = asyncio.Queue()
while not input_stream.complete:
    async for chunk in chain.astream(input_stream):
        output_stream.put(chunk)    
```

Everything should be async safe and thread safe. Audio data can be put
into the input stream while the chain is running without interruptions.

---------

Co-authored-by: Hayden Wolff <hwolff@nvidia.com>
Co-authored-by: Hayden Wolff <hwolff@Haydens-Laptop.local>
Co-authored-by: Hayden Wolff <haydenwolff99@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-02-05 19:50:50 -08:00
François Paupier
929f071513 community[patch]: Fix error in LlamaCpp community LLM with Configurable Fields, 'grammar' custom type not available (#16995)
- **Description:** Ensure the `LlamaGrammar` custom type is always
available when instantiating a `LlamaCpp` LLM
  - **Issue:** #16994 
  - **Dependencies:** None
  - **Twitter handle:** @fpaupier

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-05 17:56:58 -08:00
Scott Nath
10bd901139 infra: add integration_tests and coverage to MAKEFILE (#17053)
- **Description: update community MAKE file** 
    - adds `integration_tests`
    - adds `coverage`

- **Issue:** the issue # it fixes if applicable,
    - moving out of https://github.com/langchain-ai/langchain/pull/17014
- **Dependencies:** n/a
- **Twitter handle:** @scottnath
- **Mastodon handle:** scottnath@mastodon.social

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-02-05 16:39:55 -08:00