langchain

mirror of https://github.com/hwchase17/langchain.git synced 2025-06-05 06:33:20 +00:00

Cristóbal Carnero Liñán e494b0a09f feat (documents): add a source code loader based on AST manipulation (#6486)	2023-06-27 15:58:47 -07:00

#### Summary

A new approach to loading source code is implemented:

Each top-level function and class in the code is loaded into separate documents. Then, an additional document is created with the top-level code, but without the already loaded functions and classes.

This could improve the accuracy of QA chains over source code.

For instance, having this script:

```
class MyClass:
    def __init__(self, name):
        self.name = name

    def greet(self):
        print(f"Hello, {self.name}!")


def main():
    name = input("Enter your name: ")
    obj = MyClass(name)
    obj.greet()


if __name__ == '__main__':
    main()
```

The loader will create three documents with this content:

First document:

```
class MyClass:
    def __init__(self, name):
        self.name = name

    def greet(self):
        print(f"Hello, {self.name}!")
```

Second document:

```
def main():
    name = input("Enter your name: ")
    obj = MyClass(name)
    obj.greet()
```

Third document:

```
# Code for: class MyClass:

# Code for: def main():

if __name__ == '__main__':
    main()
```

A threshold parameter is added to control whether small scripts are split in this way or not.

At this moment, only Python and JavaScript are supported. The appropriate parser is determined by examining the file extension.

#### Tests

This PR adds:

- Unit tests
- Integration tests

#### Dependencies

Only one dependency was added as optional (needed for the JavaScript parser).

#### Documentation

A notebook is added showing how the loader can be used.

#### Who can review?

@eyurtsev @hwchase17

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
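The splitting strategy described in the commit message can be sketched with Python's standard `ast` module. This is only an illustration under stated assumptions, not the loader's actual code: the function name `split_top_level` and the line-count meaning of `threshold` are hypothetical, decorators are not handled, and only Python input is covered.

```python
import ast
from typing import List, Optional


def split_top_level(source: str, threshold: int = 0) -> List[str]:
    """Return one string per top-level def/class, plus the simplified remainder.

    Hypothetical helper for illustration; `threshold` is assumed to be a
    minimum line count below which the source is returned unsplit.
    """
    lines = source.splitlines()
    if len(lines) < threshold:
        return [source]

    tree = ast.parse(source)
    segments: List[str] = []
    simplified: List[Optional[str]] = list(lines)

    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno is 1-based; end_lineno is inclusive (Python 3.8+).
            start, end = node.lineno - 1, node.end_lineno
            segments.append("\n".join(lines[start:end]))
            # Keep a placeholder comment where the definition used to be.
            simplified[start] = f"# Code for: {lines[start]}"
            for i in range(start + 1, end):
                simplified[i] = None  # mark the rest of the body for removal

    remainder = "\n".join(line for line in simplified if line is not None)
    return segments + [remainder]
```

Applied to the example script from the commit message, this sketch yields three strings: the class, the `main` function, and the remaining top-level code with `# Code for:` placeholders.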
..
agent	Add Multi-CSV/DF support in CSV and DataFrame Toolkits (#5009)	2023-05-25 14:23:11 -07:00
cache	feat: add Momento as a standard cache and chat message history provider (#5221)	2023-05-25 19:13:21 -07:00
callbacks	split up batch llm calls into separate runs (#5804)	2023-06-24 21:03:31 -07:00
chains	Add support for passing headers and search params to openai openapi chain (#6782)	2023-06-27 09:09:03 -07:00
chat_models	fix anthropic chat model mutating input list (#6457)	2023-06-19 21:30:52 -07:00
document_loaders	feat (documents): add a source code loader based on AST manipulation (#6486)	2023-06-27 15:58:47 -07:00
embeddings	feat: interfaces for async embeddings, implement async openai (#6563)	2023-06-21 23:16:33 -07:00
examples	feat (documents): add a source code loader based on AST manipulation (#6486)	2023-06-27 15:58:47 -07:00
llms	split up batch llm calls into separate runs (#5804)	2023-06-24 21:03:31 -07:00
memory	feat: add Momento as a standard cache and chat message history provider (#5221)	2023-05-25 19:13:21 -07:00
prompts	Cleanup integration test dir (#3308)	2023-04-21 09:44:09 -07:00
retrievers	DocArray as a Retriever (#6031)	2023-06-17 09:09:33 -07:00
utilities	Confluence added (#6432)	2023-06-26 02:28:04 -07:00
vectorstores	Clarifai integration (#5954)	2023-06-22 08:00:15 -07:00
__init__.py	initial commit	2022-10-24 14:51:15 -07:00
.env.example	adding MongoDBAtlasVectorSearch (#5338)	2023-05-30 07:59:01 -07:00
conftest.py	feat: improve pinecone tests (#2806)	2023-04-13 21:49:31 -07:00
test_document_transformers.py	Contextual compression retriever (#2915)	2023-04-20 17:01:14 -07:00
test_kuzu.py	Add KuzuQAChain (#6454)	2023-06-20 22:07:00 -07:00
test_nebulagraph.py	Harrison/nebula graph (#5865)	2023-06-07 21:56:43 -07:00
test_nlp_text_splitters.py	OptimizedPrompt -- k-shot example choice backed by semantic search (#91)	2022-11-09 21:15:42 -08:00
test_pdf_pagesplitter.py	cleanup: unify 3 different pdf loaders, rename PagedPDFSplitter (#1615)	2023-03-13 23:06:50 -07:00
test_schema.py	Add 'get_token_ids' method (#4784)	2023-05-22 13:17:26 +00:00
test_text_splitter.py	chore: speed up integration test by using smaller model (#6044)	2023-06-12 13:27:10 -07:00