* Add default mode option to settings
* Revise default_mode to a Literal (enum) and add it to settings.yaml (see the sketch below)
* Revise to pass make check/test
* Default mode: RAG
---------
Co-authored-by: Jason <jason@sowinsight.solutions>
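A minimal sketch of what a Literal-typed default mode setting can look like, assuming a pydantic-based settings model (the class, field, and mode names here are illustrative, not the project's exact ones):

```python
from typing import Literal

from pydantic import BaseModel, Field


class UISettings(BaseModel):
    # Literal gives enum-like validation when settings.yaml is parsed:
    # any value outside the allowed set is rejected at startup.
    default_mode: Literal["RAG", "Search", "Basic", "Summarize"] = Field(
        "RAG",  # default mode: RAG
        description="The mode the UI starts in.",
    )
```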
* feat: add retry connection to ollama
When Ollama is running in docker-compose, Traefik is sometimes not yet ready to route the request, and the connection fails.
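A minimal sketch of the retry idea, using the `requests` library; the URL, attempt count, and delay are illustrative values, not the actual defaults:

```python
import time

import requests

OLLAMA_BASE_URL = "http://localhost:11434"  # illustrative; routed via traefik in docker-compose


def wait_for_ollama(max_retries: int = 5, delay_seconds: float = 2.0) -> None:
    """Retry until Ollama (or the traefik route in front of it) answers."""
    for attempt in range(1, max_retries + 1):
        try:
            response = requests.get(OLLAMA_BASE_URL, timeout=2)
            response.raise_for_status()
            return  # Ollama answered; the route is up
        except requests.RequestException:
            if attempt == max_retries:
                raise
            time.sleep(delay_seconds)  # give traefik time to register the route
```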
* fix: mypy
* feat: change ollama default model to llama3.1
* chore: bump versions
* feat: Change default model in local mode to llama3.1
* chore: make sure the latest poetry version is used
* fix: mypy
* fix: do not add BOS (with the latest llamacpp-python version)
* docs: add troubleshooting
* fix: pass the HF token to the setup script and avoid downloading the tokenizer when the token is empty (sketched below)
* fix: improve logging and disable the model-specific tokenizer by default
* chore: change the HF_TOKEN environment variable to be aligned with the default config
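A sketch of the token handling, assuming the Hugging Face `transformers` tokenizer API; the environment variable matches the bullets above, while the model name is illustrative:

```python
import os

from transformers import AutoTokenizer

# HF_TOKEN is the environment variable name aligned with the default config.
hf_token = os.environ.get("HF_TOKEN")

tokenizer_name = "mistralai/Mistral-7B-Instruct-v0.2"  # illustrative model

if hf_token:
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, token=hf_token)
else:
    # Skip the download entirely instead of failing on a gated repo.
    print("HF_TOKEN is not set; skipping tokenizer download.")
```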
* fix: mypy
* Support for Google Gemini LLMs and Embeddings
Initial support for Gemini; enables usage of Google LLMs and embedding models (see settings-gemini.yaml and the sketch below)
Install via
poetry install --extras "llms-gemini embeddings-gemini"
Notes:
* had to bump llama-index-core to a later version that supports Gemini
* poetry --no-update did not work: Gemini/llama_index seem to require more (transitive) dependency updates to make it work...
* fix: crash when gemini is not selected
* docs: add gemini llm
---------
Co-authored-by: Javier Martinez <javiermartinezalvarez98@gmail.com>
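A minimal usage sketch of the llama_index Gemini integrations enabled by the extras above; the API key and embedding model name are placeholders:

```python
from llama_index.embeddings.gemini import GeminiEmbedding
from llama_index.llms.gemini import Gemini

# Requires: poetry install --extras "llms-gemini embeddings-gemini"
llm = Gemini(api_key="...")  # uses the default gemini-pro model
embed_model = GeminiEmbedding(model_name="models/embedding-001", api_key="...")

print(llm.complete("Hello from Gemini"))
```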
* Moved prompt_style to the main LLM settings, since all LLMs from llama_index can utilize it. Also added temperature, context window size, max_tokens, and max_new_tokens to the openailike settings to keep them consistent with the other implementations (see the settings sketch below).
* Removed prompt_style from llamacpp entirely
* Fixed settings-local.yaml to include prompt_style in the LLM settings instead of llamacpp.
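A sketch of the resulting settings shape, assuming a pydantic model like the one settings.py uses; the prompt style names and default values are illustrative:

```python
from typing import Literal

from pydantic import BaseModel


class LLMSettings(BaseModel):
    # prompt_style now lives at the LLM level rather than under llamacpp,
    # since any llama_index LLM can make use of it.
    prompt_style: Literal["default", "llama2", "tag", "mistral", "chatml"] = "llama2"
    temperature: float = 0.1
    context_window: int = 3900
    max_new_tokens: int = 256
```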
* Added RAG settings to settings.py, vector_store, and chat_service to support similarity_top_k and similarity_score (see the retrieval sketch below)
* Updated settings in vector and chat service per Ivan's request
* Updated code for mypy
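A sketch of how those two settings can be wired into retrieval with llama_index; the function is illustrative, not the service's actual code:

```python
from llama_index.core import VectorStoreIndex
from llama_index.core.postprocessor import SimilarityPostprocessor


def build_retrieval(index: VectorStoreIndex, top_k: int, cutoff: float):
    """Apply similarity_top_k and similarity_score to retrieval (sketch)."""
    # similarity_top_k caps how many nodes the vector store returns.
    retriever = index.as_retriever(similarity_top_k=top_k)
    # similarity_score drops weak matches after retrieval.
    postprocessors = [SimilarityPostprocessor(similarity_cutoff=cutoff)]
    return retriever, postprocessors
```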
* Adding Postgres for the doc and index store (see the sketch at the end of this group)
* Adding documentation. Rename the postgres database local -> simple. Add Postgres storage dependencies
* Update documentation for postgres storage
* Renaming feature to nodestore
* Update docstore -> nodestore in the docs
* Fix remaining docstore references missed in the docs
* Updated poetry.lock
* Formatting updates to pass ruff/black checks
* Correction to unreachable code!
* Format adjustment to pass black test
* Adjust extra inclusion name for vector pg
* extra dep change for pg vector
* storage-postgres -> storage-nodestore-postgres
* Hash change on poetry lock
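A sketch of pointing the nodestore at Postgres, assuming the llama_index Postgres docstore/index-store packages; the connection string is a placeholder:

```python
from llama_index.storage.docstore.postgres import PostgresDocumentStore
from llama_index.storage.index_store.postgres import PostgresIndexStore

# Placeholder connection string; use your own credentials/database.
conn = "postgresql://postgres:admin@localhost:5432/postgres"

doc_store = PostgresDocumentStore.from_uri(uri=conn)
index_store = PostgresIndexStore.from_uri(uri=conn)
```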
* Update ui.py
Changed how 'curated_sources' is built so that score order is maintained when returning the curated sources.
* Maintain score order after curating sources
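A sketch of order-preserving deduplication for the curated sources; the node metadata keys are assumptions, not the exact fields used:

```python
def curate_sources(nodes):
    """Deduplicate sources while keeping retrieval-score order (sketch)."""
    curated_sources = []  # a list preserves the score-sorted order; a set would not
    for node in nodes:
        # file_name/page_label metadata keys are illustrative.
        source = (node.metadata.get("file_name"), node.metadata.get("page_label"))
        if source not in curated_sources:
            curated_sources.append(source)
    return curated_sources
```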
* Extract optional dependencies
* Separate local mode into llms-llama-cpp and embeddings-huggingface for clarity
* Support Ollama embeddings (sketched below)
* Upgrade to llamaindex 0.10.14. Remove legacy use of ServiceContext in ContextChatEngine
* Fix vector retriever filters
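A minimal usage sketch of Ollama embeddings via llama_index; the model name is illustrative and the base URL is Ollama's default:

```python
from llama_index.embeddings.ollama import OllamaEmbedding

embed_model = OllamaEmbedding(
    model_name="nomic-embed-text",      # illustrative embedding model
    base_url="http://localhost:11434",  # Ollama's default endpoint
)

vector = embed_model.get_text_embedding("hello world")
print(len(vector))
```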
Updated ui.py to include a small sleep while building the stream deltas. This function otherwise fires so quickly that it eats up too much of the CPU; the small sleep frees the CPU from being bottlenecked. The value can go lower, but 0.02 or 0.025 seconds seems to work well. (#1589)
Co-authored-by: root <root@wesgitlabdemo.icl.gtri.org>
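A sketch of the idea, with an illustrative generator rather than the exact ui.py code:

```python
import time


def yield_deltas(completion):
    """Accumulate streamed deltas for the UI (sketch)."""
    full_response = ""
    for delta in completion:
        full_response += str(delta)
        # Without a pause this loop spins fast enough to peg a CPU core;
        # ~0.02-0.025 s frees the CPU without visibly slowing the stream.
        time.sleep(0.02)
        yield full_response
```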