Key Concepts
Text Splitter
This class is responsible for splitting long pieces of text into smaller chunks. It supports different strategies for splitting text (on characters, using spaCy, etc.) as well as different ways of measuring length (token-based, character-based, etc.).
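For example, character-based splitting might look like the following sketch (the parameters and the input file name are illustrative, and import paths can vary between versions):

```python
# A minimal sketch of character-based splitting; the input file name
# is a placeholder and the parameters are illustrative.
from langchain.text_splitter import CharacterTextSplitter

splitter = CharacterTextSplitter(
    separator="\n\n",   # split on blank lines
    chunk_size=1000,    # maximum characters per chunk
    chunk_overlap=200,  # characters shared between adjacent chunks
)

with open("my_document.txt") as f:  # placeholder file
    chunks = splitter.split_text(f.read())
```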
Embeddings
These classes are very similar to the LLM classes in that they wrap models, but rather than returning a string they return an embedding (a list of floats). They are particularly useful when implementing semantic search functionality, and they expose separate methods for embedding queries versus embedding documents.
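For example, using the OpenAI embeddings wrapper (a sketch; assumes an OPENAI_API_KEY environment variable, and other embedding classes expose the same two methods):

```python
# A minimal sketch; assumes OPENAI_API_KEY is set in the environment.
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

# Separate methods for queries and documents, as described above.
query_vector = embeddings.embed_query("What did the president say?")
doc_vectors = embeddings.embed_documents(["First document.", "Second document."])
```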
Vectorstores
These are datastores that hold document embeddings. They expose a method that takes in a query string and returns similar documents.
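A minimal sketch using the FAISS vectorstore (assumes the faiss package is installed and an embeddings object as above; other vectorstores expose the same interface):

```python
# A minimal sketch; assumes the faiss package and an OpenAI API key.
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

texts = ["Harrison went to Harvard.", "Ankush went to Princeton."]
docsearch = FAISS.from_texts(texts, OpenAIEmbeddings())

# Pass in a string and get back the most similar documents.
docs = docsearch.similarity_search("Where did Harrison study?")
```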
Python REPL
Sometimes, for complex calculations, rather than having an LLM generate the answer directly, it can be better to have the LLM generate code that calculates the answer, and then run that code to get the result. To make that easy, we provide a simple Python REPL for executing commands. This interface only returns things that are printed, so if you want to use it to calculate an answer, make sure the code prints the answer out.
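For example (a sketch; the import path of the REPL utility may differ between versions):

```python
# A minimal sketch of the Python REPL utility.
from langchain.python import PythonREPL

repl = PythonREPL()
# Only printed output is returned, so the command must print its answer.
result = repl.run("print(2 ** 16)")
print(result)  # 65536
```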
Bash
It can often be useful to have an LLM generate bash commands and then run them. A common use case for this is letting the LLM interact with your local file system. We provide an easy component to execute bash commands.
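For example (a sketch, assuming the BashProcess utility):

```python
# A minimal sketch of running a bash command.
from langchain.utilities import BashProcess

bash = BashProcess()
print(bash.run("ls"))  # list files in the current directory
```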
Requests Wrapper
The web contains a lot of information that LLMs do not have access to. To let LLMs easily interact with that information, we provide a wrapper around the Python Requests module that takes in a URL and fetches data from it.
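For example (a sketch; the wrapper's name and import path have varied across versions, so treat these as illustrative):

```python
# A minimal sketch of fetching a page through the requests wrapper;
# the URL is a placeholder.
from langchain.utilities import RequestsWrapper

requests_wrapper = RequestsWrapper()
page_contents = requests_wrapper.get("https://example.com")
```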
Google Search
This uses the official Google Search API to look up information on the web.
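For example (a sketch; assumes GOOGLE_API_KEY and GOOGLE_CSE_ID are set in the environment):

```python
# A minimal sketch; requires Google Custom Search credentials.
from langchain.utilities import GoogleSearchAPIWrapper

search = GoogleSearchAPIWrapper()
print(search.run("latest Python release"))
```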
SerpAPI
This uses SerpAPI, a third-party search API, to interact with Google Search.
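For example (a sketch; assumes SERPAPI_API_KEY is set in the environment):

```python
# A minimal sketch; requires a SerpAPI key.
from langchain.utilities import SerpAPIWrapper

search = SerpAPIWrapper()
print(search.run("What is the hometown of the reigning F1 champion?"))
```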
Searx Search
This uses the API of the Searx metasearch engine (and its SearxNG fork) to look up information on the web. It supports 139 search engines and is easy to self-host, which makes it a good choice for privacy-conscious users.
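For example (a sketch; the searx_host URL below is a placeholder for your own self-hosted instance):

```python
# A minimal sketch; the host URL is a placeholder for your own instance.
from langchain.utilities import SearxSearchWrapper

search = SearxSearchWrapper(searx_host="http://localhost:8888")
# Returns an instant answer when one is available, otherwise top results.
print(search.run("What is the capital of France?"))
```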