Mirror of https://github.com/nomic-ai/gpt4all.git (synced 2025-08-02 00:00:35 +00:00)
feat(typescript)/dynamic template (#1287)

* remove packaged yarn
* prompt templates update wip
* prompt template update
* system prompt template, update types, remove embed promises, cleanup
* support both snakecased and camelcased prompt context
* fix #1277 libbert, libfalcon and libreplit libs not being moved into the right folder after build
* added support for modelConfigFile param, allowing the user to specify a local file instead of downloading the remote models.json. added a warning message if code fails to load a model config. included prompt context docs by amogus.
* snakecase warning, put logic for loading local models.json into listModels, added constant for the default remote model list url, test improvements, simpler hasOwnProperty call
* add DEFAULT_PROMPT_CONTEXT, export new constants
* add md5sum testcase and fix constants export
* update types
* throw if attempting to list models without a source
* rebuild docs
* fix download logging undefined url, toFixed typo, pass config filesize in for future progress report
* added overload with union types
* bump to 2.2.0, remove alpha
* code speling

Co-authored-by: Andreas Obersteiner <8959303+iimez@users.noreply.github.com>
This commit is contained in:
parent 4d855afe97
commit 4e55940edf
@@ -1,7 +1,7 @@

# GPT4All Node.js API

```sh
yarn add gpt4all@alpha

npm install gpt4all@alpha

pnpm install gpt4all@alpha
```

The original [GPT4All typescript bindings](https://github.com/nomic-ai/gpt4all-ts) are now out of date.

* New bindings created by [jacoobes](https://github.com/jacoobes), [limez](https://github.com/iimez) and the [nomic ai community](https://home.nomic.ai), for all to use.
* The nodejs api has made strides to mirror the python api. It is not 100% mirrored, but many pieces of the api resemble its python counterpart.
* Everything should work out the box.
* See [API Reference](#api-reference)

### Chat Completion (alpha)

```js
import { createCompletion, loadModel } from '../src/gpt4all.js'

const model = await loadModel('ggml-vicuna-7b-1.1-q4_2', { verbose: true });

const response = await createCompletion(model, [
    { role: 'system', content: 'You are meant to be annoying and unhelpful.' },
    { role: 'user', content: 'What is 1 + 1?' }
]);
```

### Embedding (alpha)

```js
import { createEmbedding, loadModel } from '../src/gpt4all.js'

const model = await loadModel('ggml-all-MiniLM-L6-v2-f16', { verbose: true });

const fltArray = createEmbedding(model, "Pain is inevitable, suffering optional");
```

### Build Instructions

* binding.gyp is compile config
* Tested on Ubuntu. Everything seems to work fine
* Tested on Windows. Everything works fine.
* Sparse testing on mac os.
* MingW works as well to build the gpt4all-backend. **HOWEVER**, this package works only with MSVC built dlls.

### Requirements
@@ -48,11 +55,11 @@ const response = await createCompletion(ll, [

* [node-gyp](https://github.com/nodejs/node-gyp)
  * all of its requirements.
* (unix) gcc version 12
* (win) msvc version 143
  * Can be obtained with visual studio 2022 build tools
* python 3

### Build (from source)

```sh
git clone https://github.com/nomic-ai/gpt4all.git
```

@@ -117,22 +124,27 @@ yarn test

* Handling prompting and inference of models in a threadsafe, asynchronous way.

### Known Issues

* why your model may be spewing bull 💩
  * The downloaded model is broken (just reinstall or download from official site)
* That's it so far

### Roadmap

This package is in active development, and breaking changes may happen until the api stabilizes. Here's what's the todo list:

* \[x] prompt models via a threadsafe function in order to have proper non blocking behavior in nodejs
* \[ ] ~~createTokenStream, an async iterator that streams each token emitted from the model. Planning on following this [example](https://github.com/nodejs/node-addon-examples/tree/main/threadsafe-async-iterator)~~ May not implement unless someone else can complete
* \[x] proper unit testing (integrate with circle ci)
* \[x] publish to npm under alpha tag `gpt4all@alpha`
* \[x] have more people test on other platforms (mac tester needed)
* \[x] switch to new pluggable backend
* \[ ] NPM bundle size reduction via optionalDependencies strategy (need help)
  * Should include prebuilds to avoid painful node-gyp errors
* \[ ] createChatSession ( the python equivalent to create\_chat\_session )

### API Reference

<!-- Generated by documentation.js. Update this documentation by updating the source code. -->
@@ -166,13 +178,14 @@ This package is in active development, and breaking changes may happen until the api stabilizes. Here's what's the todo list:

  * [Parameters](#parameters-5)
* [createCompletion](#createcompletion)
  * [Parameters](#parameters-6)
* [createEmbedding](#createembedding)
  * [Parameters](#parameters-7)
* [CompletionOptions](#completionoptions)
  * [verbose](#verbose)
  * [systemPromptTemplate](#systemprompttemplate)
  * [promptTemplate](#prompttemplate)
  * [promptHeader](#promptheader)
  * [promptFooter](#promptfooter)
* [PromptMessage](#promptmessage)
  * [role](#role)
  * [content](#content)

@@ -186,28 +199,31 @@ This package is in active development, and breaking changes may happen until the api stabilizes. Here's what's the todo list:

* [CompletionChoice](#completionchoice)
  * [message](#message)
* [LLModelPromptContext](#llmodelpromptcontext)
  * [logitsSize](#logitssize)
  * [tokensSize](#tokenssize)
  * [nPast](#npast)
  * [nCtx](#nctx)
  * [nPredict](#npredict)
  * [topK](#topk)
  * [topP](#topp)
  * [temp](#temp)
  * [nBatch](#nbatch)
  * [repeatPenalty](#repeatpenalty)
  * [repeatLastN](#repeatlastn)
  * [contextErase](#contexterase)
* [createTokenStream](#createtokenstream)
  * [Parameters](#parameters-8)
* [DEFAULT\_DIRECTORY](#default_directory)
* [DEFAULT\_LIBRARIES\_DIRECTORY](#default_libraries_directory)
* [DEFAULT\_MODEL\_CONFIG](#default_model_config)
* [DEFAULT\_PROMT\_CONTEXT](#default_promt_context)
* [DEFAULT\_MODEL\_LIST\_URL](#default_model_list_url)
* [downloadModel](#downloadmodel)
  * [Parameters](#parameters-9)
  * [Examples](#examples)
* [DownloadModelOptions](#downloadmodeloptions)
  * [modelPath](#modelpath)
  * [verbose](#verbose-1)
  * [url](#url)
  * [md5sum](#md5sum)
* [DownloadController](#downloadcontroller)

@@ -223,6 +239,7 @@ Type: (`"gptj"` | `"llama"` | `"mpt"` | `"replit"`)

#### ModelFile

Full list of models available
@deprecated These model names are outdated and this type will not be maintained, please use a string literal instead

##### gptj

@@ -367,7 +384,7 @@ By default this will download a model from the official GPT4ALL website, if a model is not present at given path.

* `modelName` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** The name of the model to load.
* `options` **(LoadModelOptions | [undefined](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/undefined))?** (Optional) Additional options for loading the model.

Returns **[Promise](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Promise)<(InferenceModel | EmbeddingModel)>** A promise that resolves to an instance of the loaded LLModel.

#### createCompletion
@@ -375,25 +392,10 @@ The nodejs equivalent to python binding's chat\_completion

##### Parameters

* `model` **InferenceModel** The language model object.
* `messages` **[Array](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Array)<[PromptMessage](#promptmessage)>** The array of messages for the conversation.
* `options` **[CompletionOptions](#completionoptions)** The options for creating the completion.

Returns **[CompletionReturn](#completionreturn)** The completion result.
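The example that used to live here was removed from the generated docs, so here is a minimal hand-written sketch of a call. The model name and prompts are illustrative; the option names come from [CompletionOptions](#completionoptions) below.

```javascript
import { createCompletion, loadModel } from '../src/gpt4all.js'

// Load an inference-capable model first (downloaded automatically if it is not present locally).
const model = await loadModel('ggml-vicuna-7b-1.1-q4_2', { verbose: true });

// Messages follow the PromptMessage shape; options extend Partial<LLModelPromptContext>.
const completion = await createCompletion(model, [
    { role: 'system', content: 'You are a concise assistant.' },
    { role: 'user', content: 'Name three primary colors.' }
], { temp: 0.7 });

// The return value follows CompletionReturn (model, usage, choices).
console.log(completion.choices[0].message.content);
console.log(completion.usage.total_tokens);
```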

#### createEmbedding

@@ -403,7 +405,7 @@ meow

##### Parameters

* `model` **EmbeddingModel** The language model object.
* `text` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** text to embed

Returns **[Float32Array](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Float32Array)** The embedding result.
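A small sketch of one way to use the returned Float32Array, for example comparing two texts with cosine similarity. The similarity helper below is plain JavaScript written for this example and is not part of the package.

```javascript
import { createEmbedding, loadModel } from '../src/gpt4all.js'

const model = await loadModel('ggml-all-MiniLM-L6-v2-f16', { verbose: true });

// Each call returns a Float32Array embedding of the input text.
const a = createEmbedding(model, 'The weather is nice today');
const b = createEmbedding(model, 'It is sunny outside');

// Cosine similarity between two equal-length vectors (helper defined here, not exported by gpt4all).
function cosineSimilarity(x, y) {
    let dot = 0, normX = 0, normY = 0;
    for (let i = 0; i < x.length; i++) {
        dot += x[i] * y[i];
        normX += x[i] * x[i];
        normY += y[i] * y[i];
    }
    return dot / (Math.sqrt(normX) * Math.sqrt(normY));
}

console.log(cosineSimilarity(a, b)); // values closer to 1 mean more similar
```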
@@ -420,17 +422,30 @@ Indicates if verbose logging is enabled.

Type: [boolean](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Boolean)

##### systemPromptTemplate

Template for the system message. Will be put before the conversation with %1 being replaced by all system messages.
Note that if this is not defined, system messages will not be included in the prompt.

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

##### promptTemplate

Template for user messages, with %1 being replaced by the message.

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

##### promptHeader

The initial instruction for the model, on top of the prompt

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

##### promptFooter

The last instruction for the model, appended to the end of the prompt.

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)
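The template fields above are new in this release. Below is a hedged sketch of how they could be combined in a createCompletion call; the Alpaca-style template strings are illustrative assumptions, not shipped defaults.

```javascript
import { createCompletion, loadModel } from '../src/gpt4all.js'

const model = await loadModel('ggml-vicuna-7b-1.1-q4_2', { verbose: true });

const response = await createCompletion(model, [
    { role: 'system', content: 'You answer in one short sentence.' },
    { role: 'user', content: 'What is the capital of France?' }
], {
    // %1 is replaced by the system/user message text, per the field docs above.
    systemPromptTemplate: '### System:\n%1\n',
    promptTemplate: '### Human:\n%1\n### Assistant:\n',
    promptHeader: 'Below is a conversation between a user and an assistant.',
    promptFooter: 'Keep answers short.'
});

console.log(response.choices[0].message.content);
```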

#### PromptMessage

@@ -472,9 +487,9 @@ The result of the completion, similar to OpenAI's format.

##### model

The model used for the completion.

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

##### usage
@@ -502,73 +517,100 @@ Type: [PromptMessage](#promptmessage)

Model inference arguments for generating completions.

##### logitsSize

The size of the raw logits vector.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### tokensSize

The size of the raw tokens vector.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### nPast

The number of tokens in the past conversation.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### nCtx

The number of tokens possible in the context window.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### nPredict

The number of tokens to predict.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### topK

The top-k logits to sample from.
Top-K sampling selects the next token only from the top K most likely tokens predicted by the model.
It helps reduce the risk of generating low-probability or nonsensical tokens, but it may also limit
the diversity of the output. A higher value for top-K (e.g., 100) will consider more tokens and lead
to more diverse text, while a lower value (e.g., 10) will focus on the most probable tokens and generate
more conservative text. 30 - 60 is a good range for most tasks.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### topP

The nucleus sampling probability threshold.
Top-P limits the selection of the next token to a subset of tokens with a cumulative probability
above a threshold P. This method, also known as nucleus sampling, finds a balance between diversity
and quality by considering both token probabilities and the number of tokens available for sampling.
When using a higher value for top-P (e.g., 0.95), the generated text becomes more diverse.
On the other hand, a lower value (e.g., 0.1) produces more focused and conservative text.
The default value is 0.4, which is aimed to be the middle ground between focus and diversity, but
for more creative tasks a higher top-p value will be beneficial, about 0.5-0.9 is a good range for that.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### temp

The temperature to adjust the model's output distribution.
Temperature is like a knob that adjusts how creative or focused the output becomes. Higher temperatures
(e.g., 1.2) increase randomness, resulting in more imaginative and diverse text. Lower temperatures (e.g., 0.5)
make the output more focused, predictable, and conservative. When the temperature is set to 0, the output
becomes completely deterministic, always selecting the most probable next token and producing identical results
each time. A safe range would be around 0.6 - 0.85, but you are free to search what value fits best for you.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### nBatch

The number of predictions to generate in parallel.
By splitting the prompt every N tokens, prompt-batch-size reduces RAM usage during processing. However,
this can increase the processing time as a trade-off. If the N value is set too low (e.g., 10), long prompts
with 500+ tokens will be most affected, requiring numerous processing runs to complete the prompt processing.
To ensure optimal performance, setting the prompt-batch-size to 2048 allows processing of all tokens in a single run.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### repeatPenalty

The penalty factor for repeated tokens.
Repeat-penalty can help penalize tokens based on how frequently they occur in the text, including the input prompt.
A token that has already appeared five times is penalized more heavily than a token that has appeared only one time.
A value of 1 means that there is no penalty and values larger than 1 discourage repeated tokens.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### repeatLastN

The number of last tokens to penalize.
The repeat-penalty-tokens N option controls the number of tokens in the history to consider for penalizing repetition.
A larger value will look further back in the generated text to prevent repetitions, while a smaller value will only
consider recent tokens.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### contextErase

The percentage of context to erase if the context window is exceeded.
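Because [CompletionOptions](#completionoptions) extends Partial\<LLModelPromptContext>, these fields can be passed directly to createCompletion. A short sketch with illustrative (not default) values:

```javascript
import { createCompletion, loadModel } from '../src/gpt4all.js'

const model = await loadModel('ggml-vicuna-7b-1.1-q4_2', { verbose: true });

const response = await createCompletion(model, [
    { role: 'user', content: 'Write a haiku about autumn.' }
], {
    nPredict: 128,       // cap the number of generated tokens
    temp: 0.7,           // moderate creativity
    topK: 40,            // sample from the 40 most likely tokens
    topP: 0.4,           // nucleus sampling threshold
    repeatPenalty: 1.18, // discourage repetition
    repeatLastN: 64,     // penalize repeats within the last 64 tokens
    nBatch: 8            // prompt batch size
});

console.log(response.choices[0].message.content);
```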
@@ -602,21 +644,39 @@ This searches DEFAULT\_DIRECTORY/libraries, cwd/libraries, and finally cwd.

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

#### DEFAULT\_MODEL\_CONFIG

Default model configuration.

Type: ModelConfig

#### DEFAULT\_PROMT\_CONTEXT

Default prompt context.

Type: [LLModelPromptContext](#llmodelpromptcontext)

#### DEFAULT\_MODEL\_LIST\_URL

Default model list url.

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

#### downloadModel

Initiates the download of a model file.
By default this downloads without waiting. Use the controller returned to alter this behavior.

##### Parameters

* `modelName` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** The model to be downloaded.
* `options` **DownloadOptions** to pass into the downloader. Default is { location: (cwd), verbose: false }.

##### Examples

```javascript
const download = downloadModel('ggml-gpt4all-j-v1.3-groovy.bin')
download.promise.then(() => console.log('Downloaded!'))
```

* Throws **[Error](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Error)** If the model already exists in the specified location.
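A sketch of a download with explicit [DownloadModelOptions](#downloadmodeloptions); the md5 value is a placeholder, not the real checksum of this file.

```javascript
import { downloadModel } from '../src/gpt4all.js'

const download = downloadModel('ggml-gpt4all-j-v1.3-groovy.bin', {
    modelPath: './models',            // store the file here instead of the cwd
    verbose: true,                    // log how long the download took
    md5sum: '<expected md5 checksum>' // placeholder; on mismatch the file is deleted and an error thrown
});

try {
    const config = await download.promise; // resolves to the downloaded model's config
    console.log('Downloaded', config);
} catch (err) {
    console.error('Download failed or checksum did not match:', err);
}
```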
@@ -635,7 +695,7 @@ Default is process.cwd(), or the current working directory

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

##### verbose

Debug mode -- check how long it took to download in seconds

@@ -643,15 +703,16 @@ Type: [boolean](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Boolean)

##### url

Remote download url. Defaults to `https://gpt4all.io/models/<modelName>`

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

##### md5sum

MD5 sum of the model file. If this is provided, the downloaded file will be checked against this sum.
If the sums do not match, an error will be thrown and the file will be deleted.

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

#### DownloadController
@@ -659,12 +720,12 @@ Model download controller.

##### cancel

Cancel the request to download if this is called.

Type: function (): void

##### promise

A promise resolving to the downloaded models config once the download is done

Type: [Promise](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Promise)\<ModelConfig>
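A sketch of using the controller to give up on a slow download. The 10 second timeout is illustrative, and the assumption here is that a cancelled or failed download rejects the promise.

```javascript
import { downloadModel } from '../src/gpt4all.js'

const controller = downloadModel('ggml-gpt4all-j-v1.3-groovy.bin');

// Illustrative: cancel if the download is still running after 10 seconds.
const timeout = setTimeout(() => controller.cancel(), 10_000);

try {
    const config = await controller.promise; // ModelConfig of the downloaded model
    console.log('Finished downloading:', config);
} catch (err) {
    console.error('Download did not complete:', err);
} finally {
    clearTimeout(timeout);
}
```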
File diff suppressed because one or more lines are too long
@@ -1 +0,0 @@
yarnPath: .yarn/releases/yarn-3.6.1.cjs (removed)

@@ -11,36 +11,34 @@ pnpm install gpt4all@alpha

The original [GPT4All typescript bindings](https://github.com/nomic-ai/gpt4all-ts) are now out of date.

* New bindings created by [jacoobes](https://github.com/jacoobes), [limez](https://github.com/iimez) and the [nomic ai community](https://home.nomic.ai), for all to use.
* The nodejs api has made strides to mirror the python api. It is not 100% mirrored, but many pieces of the api resemble its python counterpart.
* Everything should work out the box.
* See [API Reference](#api-reference)

### Chat Completion

```js
import { createCompletion, loadModel } from '../src/gpt4all.js'

const model = await loadModel('ggml-vicuna-7b-1.1-q4_2', { verbose: true });

const response = await createCompletion(model, [
    { role: 'system', content: 'You are meant to be annoying and unhelpful.' },
    { role: 'user', content: 'What is 1 + 1?' }
]);
```

### Embedding

```js
import { createEmbedding, loadModel } from '../src/gpt4all.js'

const model = await loadModel('ggml-all-MiniLM-L6-v2-f16', { verbose: true });

const fltArray = createEmbedding(model, "Pain is inevitable, suffering optional");
```

### Build Instructions

* binding.gyp is compile config

@@ -60,6 +58,7 @@ const fltArray = createEmbedding(ll, "Pain is inevitable, suffering optional");

* (win) msvc version 143
  * Can be obtained with visual studio 2022 build tools
* python 3

### Build (from source)

@@ -125,15 +124,12 @@ yarn test

* Handling prompting and inference of models in a threadsafe, asynchronous way.

### Known Issues

* why your model may be spewing bull 💩
  * The downloaded model is broken (just reinstall or download from official site)
* That's it so far

### Roadmap

This package is in active development, and breaking changes may happen until the api stabilizes. Here's what's the todo list:

@@ -144,7 +140,592 @@ This package is in active development, and breaking changes may happen until the api stabilizes. Here's what's the todo list:

* \[x] publish to npm under alpha tag `gpt4all@alpha`
* \[x] have more people test on other platforms (mac tester needed)
* \[x] switch to new pluggable backend
* \[ ] NPM bundle size reduction via optionalDependencies strategy (need help)
  * Should include prebuilds to avoid painful node-gyp errors
* \[ ] createChatSession ( the python equivalent to create\_chat\_session )

### API Reference

<!-- Generated by documentation.js. Update this documentation by updating the source code. -->

##### Table of Contents

* [ModelType](#modeltype)
* [ModelFile](#modelfile)
  * [gptj](#gptj)
  * [llama](#llama)
  * [mpt](#mpt)
  * [replit](#replit)
* [type](#type)
* [LLModel](#llmodel)
  * [constructor](#constructor)
    * [Parameters](#parameters)
  * [type](#type-1)
  * [name](#name)
  * [stateSize](#statesize)
  * [threadCount](#threadcount)
  * [setThreadCount](#setthreadcount)
    * [Parameters](#parameters-1)
  * [raw\_prompt](#raw_prompt)
    * [Parameters](#parameters-2)
  * [embed](#embed)
    * [Parameters](#parameters-3)
  * [isModelLoaded](#ismodelloaded)
  * [setLibraryPath](#setlibrarypath)
    * [Parameters](#parameters-4)
  * [getLibraryPath](#getlibrarypath)
* [loadModel](#loadmodel)
  * [Parameters](#parameters-5)
* [createCompletion](#createcompletion)
  * [Parameters](#parameters-6)
* [createEmbedding](#createembedding)
  * [Parameters](#parameters-7)
* [CompletionOptions](#completionoptions)
  * [verbose](#verbose)
  * [systemPromptTemplate](#systemprompttemplate)
  * [promptTemplate](#prompttemplate)
  * [promptHeader](#promptheader)
  * [promptFooter](#promptfooter)
* [PromptMessage](#promptmessage)
  * [role](#role)
  * [content](#content)
* [prompt\_tokens](#prompt_tokens)
* [completion\_tokens](#completion_tokens)
* [total\_tokens](#total_tokens)
* [CompletionReturn](#completionreturn)
  * [model](#model)
  * [usage](#usage)
  * [choices](#choices)
* [CompletionChoice](#completionchoice)
  * [message](#message)
* [LLModelPromptContext](#llmodelpromptcontext)
  * [logitsSize](#logitssize)
  * [tokensSize](#tokenssize)
  * [nPast](#npast)
  * [nCtx](#nctx)
  * [nPredict](#npredict)
  * [topK](#topk)
  * [topP](#topp)
  * [temp](#temp)
  * [nBatch](#nbatch)
  * [repeatPenalty](#repeatpenalty)
  * [repeatLastN](#repeatlastn)
  * [contextErase](#contexterase)
* [createTokenStream](#createtokenstream)
  * [Parameters](#parameters-8)
* [DEFAULT\_DIRECTORY](#default_directory)
* [DEFAULT\_LIBRARIES\_DIRECTORY](#default_libraries_directory)
* [DEFAULT\_MODEL\_CONFIG](#default_model_config)
* [DEFAULT\_PROMT\_CONTEXT](#default_promt_context)
* [DEFAULT\_MODEL\_LIST\_URL](#default_model_list_url)
* [downloadModel](#downloadmodel)
  * [Parameters](#parameters-9)
  * [Examples](#examples)
* [DownloadModelOptions](#downloadmodeloptions)
  * [modelPath](#modelpath)
  * [verbose](#verbose-1)
  * [url](#url)
  * [md5sum](#md5sum)
* [DownloadController](#downloadcontroller)
  * [cancel](#cancel)
  * [promise](#promise)

#### ModelType

Type of the model

Type: (`"gptj"` | `"llama"` | `"mpt"` | `"replit"`)

#### ModelFile

Full list of models available
@deprecated These model names are outdated and this type will not be maintained, please use a string literal instead

##### gptj

List of GPT-J Models

Type: (`"ggml-gpt4all-j-v1.3-groovy.bin"` | `"ggml-gpt4all-j-v1.2-jazzy.bin"` | `"ggml-gpt4all-j-v1.1-breezy.bin"` | `"ggml-gpt4all-j.bin"`)

##### llama

List Llama Models

Type: (`"ggml-gpt4all-l13b-snoozy.bin"` | `"ggml-vicuna-7b-1.1-q4_2.bin"` | `"ggml-vicuna-13b-1.1-q4_2.bin"` | `"ggml-wizardLM-7B.q4_2.bin"` | `"ggml-stable-vicuna-13B.q4_2.bin"` | `"ggml-nous-gpt4-vicuna-13b.bin"` | `"ggml-v3-13b-hermes-q5_1.bin"`)

##### mpt

List of MPT Models

Type: (`"ggml-mpt-7b-base.bin"` | `"ggml-mpt-7b-chat.bin"` | `"ggml-mpt-7b-instruct.bin"`)

##### replit

List of Replit Models

Type: `"ggml-replit-code-v1-3b.bin"`

#### type

Model architecture. This argument currently does not have any functionality and is just used as a descriptive identifier for the user.

Type: [ModelType](#modeltype)

#### LLModel

LLModel class representing a language model.
This is a base class that provides common functionality for different types of language models.

##### constructor

Initialize a new LLModel.

###### Parameters

* `path` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** Absolute path to the model file.

<!---->

* Throws **[Error](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Error)** If the model file does not exist.

##### type

either 'gptj', 'mpt' or 'llama', or undefined

Returns **([ModelType](#modeltype) | [undefined](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/undefined))**

##### name

The name of the model.

Returns **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)**

##### stateSize

Get the size of the internal state of the model.
NOTE: This state data is specific to the type of model you have created.

Returns **[number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)** the size in bytes of the internal state of the model

##### threadCount

Get the number of threads used for model inference.
The default is the number of physical cores your computer has.

Returns **[number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)** The number of threads used for model inference.

##### setThreadCount

Set the number of threads used for model inference.

###### Parameters

* `newNumber` **[number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)** The new number of threads.

Returns **void**

##### raw\_prompt

Prompt the model with a given input and optional parameters.
This is the raw output from model.
Use the prompt function exported for a value

###### Parameters

* `q` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** The prompt input.
* `params` **Partial<[LLModelPromptContext](#llmodelpromptcontext)>** Optional parameters for the prompt context.
* `callback` **function (res: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)): void**

Returns **void** The result of the model prompt.

##### embed

Embed text with the model. Keep in mind that
not all models can embed text (only bert can embed as of 07/16/2023 (mm/dd/yyyy)).
Use the prompt function exported for a value

###### Parameters

* `text` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)**
* `q` The prompt input.
* `params` Optional parameters for the prompt context.

Returns **[Float32Array](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Float32Array)** The result of the model prompt.

##### isModelLoaded

Whether the model is loaded or not.

Returns **[boolean](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Boolean)**

##### setLibraryPath

Where to search for the pluggable backend libraries

###### Parameters

* `s` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)**

Returns **void**

##### getLibraryPath

Where to get the pluggable backend libraries

Returns **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)**

#### loadModel

Loads a machine learning model with the specified name. The de facto way to create a model.
By default this will download a model from the official GPT4ALL website, if a model is not present at given path.

##### Parameters

* `modelName` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** The name of the model to load.
* `options` **(LoadModelOptions | [undefined](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/undefined))?** (Optional) Additional options for loading the model.

Returns **[Promise](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Promise)<(InferenceModel | EmbeddingModel)>** A promise that resolves to an instance of the loaded LLModel.

#### createCompletion

The nodejs equivalent to python binding's chat\_completion

##### Parameters

* `model` **InferenceModel** The language model object.
* `messages` **[Array](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Array)<[PromptMessage](#promptmessage)>** The array of messages for the conversation.
* `options` **[CompletionOptions](#completionoptions)** The options for creating the completion.

Returns **[CompletionReturn](#completionreturn)** The completion result.

#### createEmbedding

The nodejs moral equivalent to python binding's Embed4All().embed()
meow

##### Parameters

* `model` **EmbeddingModel** The language model object.
* `text` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** text to embed

Returns **[Float32Array](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Float32Array)** The embedding result.

#### CompletionOptions

**Extends Partial\<LLModelPromptContext>**

The options for creating the completion.

##### verbose

Indicates if verbose logging is enabled.

Type: [boolean](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Boolean)

##### systemPromptTemplate

Template for the system message. Will be put before the conversation with %1 being replaced by all system messages.
Note that if this is not defined, system messages will not be included in the prompt.

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

##### promptTemplate

Template for user messages, with %1 being replaced by the message.

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

##### promptHeader

The initial instruction for the model, on top of the prompt

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

##### promptFooter

The last instruction for the model, appended to the end of the prompt.

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

#### PromptMessage

A message in the conversation, identical to OpenAI's chat message.

##### role

The role of the message.

Type: (`"system"` | `"assistant"` | `"user"`)

##### content

The message content.

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

#### prompt\_tokens

The number of tokens used in the prompt.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

#### completion\_tokens

The number of tokens used in the completion.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

#### total\_tokens

The total number of tokens used.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

#### CompletionReturn

The result of the completion, similar to OpenAI's format.

##### model

The model used for the completion.

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

##### usage

Token usage report.

Type: {prompt\_tokens: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number), completion\_tokens: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number), total\_tokens: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)}

##### choices

The generated completions.

Type: [Array](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Array)<[CompletionChoice](#completionchoice)>

#### CompletionChoice

A completion choice, similar to OpenAI's format.

##### message

Response message

Type: [PromptMessage](#promptmessage)

#### LLModelPromptContext

Model inference arguments for generating completions.

##### logitsSize

The size of the raw logits vector.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### tokensSize

The size of the raw tokens vector.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### nPast

The number of tokens in the past conversation.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### nCtx

The number of tokens possible in the context window.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### nPredict

The number of tokens to predict.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### topK

The top-k logits to sample from.
Top-K sampling selects the next token only from the top K most likely tokens predicted by the model.
It helps reduce the risk of generating low-probability or nonsensical tokens, but it may also limit
the diversity of the output. A higher value for top-K (e.g., 100) will consider more tokens and lead
to more diverse text, while a lower value (e.g., 10) will focus on the most probable tokens and generate
more conservative text. 30 - 60 is a good range for most tasks.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### topP

The nucleus sampling probability threshold.
Top-P limits the selection of the next token to a subset of tokens with a cumulative probability
above a threshold P. This method, also known as nucleus sampling, finds a balance between diversity
and quality by considering both token probabilities and the number of tokens available for sampling.
When using a higher value for top-P (e.g., 0.95), the generated text becomes more diverse.
On the other hand, a lower value (e.g., 0.1) produces more focused and conservative text.
The default value is 0.4, which is aimed to be the middle ground between focus and diversity, but
for more creative tasks a higher top-p value will be beneficial, about 0.5-0.9 is a good range for that.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### temp

The temperature to adjust the model's output distribution.
Temperature is like a knob that adjusts how creative or focused the output becomes. Higher temperatures
(e.g., 1.2) increase randomness, resulting in more imaginative and diverse text. Lower temperatures (e.g., 0.5)
make the output more focused, predictable, and conservative. When the temperature is set to 0, the output
becomes completely deterministic, always selecting the most probable next token and producing identical results
each time. A safe range would be around 0.6 - 0.85, but you are free to search what value fits best for you.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### nBatch

The number of predictions to generate in parallel.
By splitting the prompt every N tokens, prompt-batch-size reduces RAM usage during processing. However,
this can increase the processing time as a trade-off. If the N value is set too low (e.g., 10), long prompts
with 500+ tokens will be most affected, requiring numerous processing runs to complete the prompt processing.
To ensure optimal performance, setting the prompt-batch-size to 2048 allows processing of all tokens in a single run.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### repeatPenalty

The penalty factor for repeated tokens.
Repeat-penalty can help penalize tokens based on how frequently they occur in the text, including the input prompt.
A token that has already appeared five times is penalized more heavily than a token that has appeared only one time.
A value of 1 means that there is no penalty and values larger than 1 discourage repeated tokens.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### repeatLastN

The number of last tokens to penalize.
The repeat-penalty-tokens N option controls the number of tokens in the history to consider for penalizing repetition.
A larger value will look further back in the generated text to prevent repetitions, while a smaller value will only
consider recent tokens.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

##### contextErase

The percentage of context to erase if the context window is exceeded.

Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)

#### createTokenStream

TODO: Help wanted to implement this

##### Parameters

* `llmodel` **[LLModel](#llmodel)**
* `messages` **[Array](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Array)<[PromptMessage](#promptmessage)>**
* `options` **[CompletionOptions](#completionoptions)**

Returns **function (ll: [LLModel](#llmodel)): AsyncGenerator<[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)>**

#### DEFAULT\_DIRECTORY

From python api:
models will be stored in (homedir)/.cache/gpt4all/

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

#### DEFAULT\_LIBRARIES\_DIRECTORY

From python api:
The default path for dynamic libraries to be stored.
You may separate paths by a semicolon to search in multiple areas.
This searches DEFAULT\_DIRECTORY/libraries, cwd/libraries, and finally cwd.

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

#### DEFAULT\_MODEL\_CONFIG

Default model configuration.

Type: ModelConfig

#### DEFAULT\_PROMT\_CONTEXT

Default prompt context.

Type: [LLModelPromptContext](#llmodelpromptcontext)

#### DEFAULT\_MODEL\_LIST\_URL

Default model list url.

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)

#### downloadModel

Initiates the download of a model file.
By default this downloads without waiting. Use the controller returned to alter this behavior.

##### Parameters

* `modelName` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** The model to be downloaded.
* `options` **DownloadOptions** to pass into the downloader. Default is { location: (cwd), verbose: false }.

##### Examples

```javascript
const download = downloadModel('ggml-gpt4all-j-v1.3-groovy.bin')
download.promise.then(() => console.log('Downloaded!'))
```

* Throws **[Error](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Error)** If the model already exists in the specified location.
* Throws **[Error](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Error)** If the model cannot be found at the specified url.

Returns **[DownloadController](#downloadcontroller)** object that allows controlling the download process.

#### DownloadModelOptions

Options for the model download process.

##### modelPath

location to download the model.
Default is process.cwd(), or the current working directory

Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)
|
||||||
|
|
||||||
|
##### verbose
|
||||||
|
|
||||||
|
Debug mode -- check how long it took to download in seconds
|
||||||
|
|
||||||
|
Type: [boolean](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Boolean)
|
||||||
|
|
||||||
|
##### url
|
||||||
|
|
||||||
|
Remote download url. Defaults to `https://gpt4all.io/models/<modelName>`
|
||||||
|
|
||||||
|
Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)
|
||||||
|
|
||||||
|
##### md5sum
|
||||||
|
|
||||||
|
MD5 sum of the model file. If this is provided, the downloaded file will be checked against this sum.
|
||||||
|
If the sums do not match, an error will be thrown and the file will be deleted.
|
||||||
|
|
||||||
|
Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)
|
||||||
|
|
||||||
|
#### DownloadController
|
||||||
|
|
||||||
|
Model download controller.
|
||||||
|
|
||||||
|
##### cancel
|
||||||
|
|
||||||
|
Cancel the request to download if this is called.
|
||||||
|
|
||||||
|
Type: function (): void
|
||||||
|
|
||||||
|
##### promise
|
||||||
|
|
||||||
|
A promise resolving to the downloaded models config once the download is done
|
||||||
|
|
||||||
|
Type: [Promise](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Promise)\<ModelConfig>
|
||||||
|
@ -1,6 +1,6 @@
|
|||||||
{
|
{
|
||||||
"name": "gpt4all",
|
"name": "gpt4all",
|
||||||
"version": "2.1.1-alpha",
|
"version": "2.2.0",
|
||||||
"packageManager": "yarn@3.6.1",
|
"packageManager": "yarn@3.6.1",
|
||||||
"main": "src/gpt4all.js",
|
"main": "src/gpt4all.js",
|
||||||
"repository": "nomic-ai/gpt4all",
|
"repository": "nomic-ai/gpt4all",
|
||||||
@ -10,8 +10,8 @@
|
|||||||
"build:backend": "node scripts/build.js",
|
"build:backend": "node scripts/build.js",
|
||||||
"build": "node-gyp-build",
|
"build": "node-gyp-build",
|
||||||
"predocs:build": "node scripts/docs.js",
|
"predocs:build": "node scripts/docs.js",
|
||||||
"docs:build": "documentation readme ./src/gpt4all.d.ts --parse-extension js d.ts --format md --section documentation --readme-file ../python/docs/gpt4all_typescript.md",
|
"docs:build": "documentation readme ./src/gpt4all.d.ts --parse-extension js d.ts --format md --section \"API Reference\" --readme-file ../python/docs/gpt4all_typescript.md",
|
||||||
"postdocs:build": "node scripts/docs.js"
|
"postdocs:build": "documentation readme ./src/gpt4all.d.ts --parse-extension js d.ts --format md --section \"API Reference\" --readme-file README.md"
|
||||||
},
|
},
|
||||||
"files": [
|
"files": [
|
||||||
"src/**/*",
|
"src/**/*",
|
||||||
|
@ -24,7 +24,9 @@ mkdir -p "$NATIVE_DIR" "$BUILD_DIR"
|
|||||||
|
|
||||||
cmake -S ../../gpt4all-backend -B "$BUILD_DIR" &&
|
cmake -S ../../gpt4all-backend -B "$BUILD_DIR" &&
|
||||||
cmake --build "$BUILD_DIR" -j --config Release && {
|
cmake --build "$BUILD_DIR" -j --config Release && {
|
||||||
cp "$BUILD_DIR"/libllmodel.$LIB_EXT "$NATIVE_DIR"/
|
cp "$BUILD_DIR"/libbert*.$LIB_EXT "$NATIVE_DIR"/
|
||||||
|
cp "$BUILD_DIR"/libfalcon*.$LIB_EXT "$NATIVE_DIR"/
|
||||||
|
cp "$BUILD_DIR"/libreplit*.$LIB_EXT "$NATIVE_DIR"/
|
||||||
cp "$BUILD_DIR"/libgptj*.$LIB_EXT "$NATIVE_DIR"/
|
cp "$BUILD_DIR"/libgptj*.$LIB_EXT "$NATIVE_DIR"/
|
||||||
cp "$BUILD_DIR"/libllama*.$LIB_EXT "$NATIVE_DIR"/
|
cp "$BUILD_DIR"/libllama*.$LIB_EXT "$NATIVE_DIR"/
|
||||||
cp "$BUILD_DIR"/libmpt*.$LIB_EXT "$NATIVE_DIR"/
|
cp "$BUILD_DIR"/libmpt*.$LIB_EXT "$NATIVE_DIR"/
|
||||||
|
@ -1,9 +1,10 @@
|
|||||||
import { LLModel, createCompletion, DEFAULT_DIRECTORY, DEFAULT_LIBRARIES_DIRECTORY, loadModel } from '../src/gpt4all.js'
|
import { LLModel, createCompletion, DEFAULT_DIRECTORY, DEFAULT_LIBRARIES_DIRECTORY, loadModel } from '../src/gpt4all.js'
|
||||||
|
|
||||||
const ll = await loadModel(
|
const model = await loadModel(
|
||||||
'orca-mini-3b.ggmlv3.q4_0.bin',
|
'orca-mini-3b.ggmlv3.q4_0.bin',
|
||||||
{ verbose: true }
|
{ verbose: true }
|
||||||
);
|
);
|
||||||
|
const ll = model.llm;
|
||||||
|
|
||||||
try {
|
try {
|
||||||
class Extended extends LLModel {
|
class Extended extends LLModel {
|
||||||
@ -26,13 +27,13 @@ console.log("type: " + ll.type());
|
|||||||
console.log("Default directory for models", DEFAULT_DIRECTORY);
|
console.log("Default directory for models", DEFAULT_DIRECTORY);
|
||||||
console.log("Default directory for libraries", DEFAULT_LIBRARIES_DIRECTORY);
|
console.log("Default directory for libraries", DEFAULT_LIBRARIES_DIRECTORY);
|
||||||
|
|
||||||
const completion1 = await createCompletion(ll, [
|
const completion1 = await createCompletion(model, [
|
||||||
{ role : 'system', content: 'You are an advanced mathematician.' },
|
{ role : 'system', content: 'You are an advanced mathematician.' },
|
||||||
{ role : 'user', content: 'What is 1 + 1?' },
|
{ role : 'user', content: 'What is 1 + 1?' },
|
||||||
], { verbose: true })
|
], { verbose: true })
|
||||||
console.log(completion1.choices[0].message)
|
console.log(completion1.choices[0].message)
|
||||||
|
|
||||||
const completion2 = await createCompletion(ll, [
|
const completion2 = await createCompletion(model, [
|
||||||
{ role : 'system', content: 'You are an advanced mathematician.' },
|
{ role : 'system', content: 'You are an advanced mathematician.' },
|
||||||
{ role : 'user', content: 'What is two plus two?' },
|
{ role : 'user', content: 'What is two plus two?' },
|
||||||
], { verbose: true })
|
], { verbose: true })
|
||||||
|
@ -16,7 +16,26 @@ const librarySearchPaths = [
|
|||||||
|
|
||||||
const DEFAULT_LIBRARIES_DIRECTORY = librarySearchPaths.join(";");
|
const DEFAULT_LIBRARIES_DIRECTORY = librarySearchPaths.join(";");
|
||||||
|
|
||||||
|
const DEFAULT_MODEL_CONFIG = {
|
||||||
|
systemPrompt: "",
|
||||||
|
promptTemplate: "### Human: \n%1\n### Assistant:\n",
|
||||||
|
}
|
||||||
|
|
||||||
|
const DEFAULT_MODEL_LIST_URL = "https://gpt4all.io/models/models.json";
|
||||||
|
|
||||||
|
const DEFAULT_PROMPT_CONTEXT = {
|
||||||
|
temp: 0.7,
|
||||||
|
topK: 40,
|
||||||
|
topP: 0.4,
|
||||||
|
repeatPenalty: 1.18,
|
||||||
|
repeatLastN: 64,
|
||||||
|
nBatch: 8,
|
||||||
|
}
|
||||||
|
|
||||||
module.exports = {
|
module.exports = {
|
||||||
DEFAULT_DIRECTORY,
|
DEFAULT_DIRECTORY,
|
||||||
DEFAULT_LIBRARIES_DIRECTORY,
|
DEFAULT_LIBRARIES_DIRECTORY,
|
||||||
|
DEFAULT_MODEL_CONFIG,
|
||||||
|
DEFAULT_MODEL_LIST_URL,
|
||||||
|
DEFAULT_PROMPT_CONTEXT,
|
||||||
};
|
};
|
||||||
|
241
gpt4all-bindings/typescript/src/gpt4all.d.ts
vendored
241
gpt4all-bindings/typescript/src/gpt4all.d.ts
vendored
@ -1,12 +1,13 @@
|
|||||||
/// <reference types="node" />
|
/// <reference types="node" />
|
||||||
declare module "gpt4all";
|
declare module "gpt4all";
|
||||||
|
|
||||||
|
|
||||||
/** Type of the model */
|
/** Type of the model */
|
||||||
type ModelType = "gptj" | "llama" | "mpt" | "replit";
|
type ModelType = "gptj" | "llama" | "mpt" | "replit";
|
||||||
|
|
||||||
|
// NOTE: "deprecated" tag in below comment breaks the doc generator https://github.com/documentationjs/documentation/issues/1596
|
||||||
/**
|
/**
|
||||||
* Full list of models available
|
* Full list of models available
|
||||||
|
* @deprecated These model names are outdated and this type will not be maintained, please use a string literal instead
|
||||||
*/
|
*/
|
||||||
interface ModelFile {
|
interface ModelFile {
|
||||||
/** List of GPT-J Models */
|
/** List of GPT-J Models */
|
||||||
@ -39,10 +40,37 @@ interface LLModelOptions {
|
|||||||
* Model architecture. This argument currently does not have any functionality and is just used as descriptive identifier for user.
|
* Model architecture. This argument currently does not have any functionality and is just used as descriptive identifier for user.
|
||||||
*/
|
*/
|
||||||
type?: ModelType;
|
type?: ModelType;
|
||||||
model_name: ModelFile[ModelType];
|
model_name: string;
|
||||||
model_path: string;
|
model_path: string;
|
||||||
library_path?: string;
|
library_path?: string;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
interface ModelConfig {
|
||||||
|
systemPrompt: string;
|
||||||
|
promptTemplate: string;
|
||||||
|
path: string;
|
||||||
|
url?: string;
|
||||||
|
}
|
||||||
|
|
||||||
|
declare class InferenceModel {
|
||||||
|
constructor(llm: LLModel, config: ModelConfig);
|
||||||
|
llm: LLModel;
|
||||||
|
config: ModelConfig;
|
||||||
|
|
||||||
|
generate(
|
||||||
|
prompt: string,
|
||||||
|
options?: Partial<LLModelPromptContext>
|
||||||
|
): Promise<string>;
|
||||||
|
}
|
||||||
|
|
||||||
|
declare class EmbeddingModel {
|
||||||
|
constructor(llm: LLModel, config: ModelConfig);
|
||||||
|
llm: LLModel;
|
||||||
|
config: ModelConfig;
|
||||||
|
|
||||||
|
embed(text: string): Float32Array;
|
||||||
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* LLModel class representing a language model.
|
* LLModel class representing a language model.
|
||||||
* This is a base class that provides common functionality for different types of language models.
|
* This is a base class that provides common functionality for different types of language models.
|
||||||
@ -90,17 +118,21 @@ declare class LLModel {
|
|||||||
* @param params Optional parameters for the prompt context.
|
* @param params Optional parameters for the prompt context.
|
||||||
* @returns The result of the model prompt.
|
* @returns The result of the model prompt.
|
||||||
*/
|
*/
|
||||||
raw_prompt(q: string, params: Partial<LLModelPromptContext>, callback: (res: string) => void): void; // TODO work on return type
|
raw_prompt(
|
||||||
|
q: string,
|
||||||
|
params: Partial<LLModelPromptContext>,
|
||||||
|
callback: (res: string) => void
|
||||||
|
): void; // TODO work on return type
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Embed text with the model. Keep in mind that
|
* Embed text with the model. Keep in mind that
|
||||||
* not all models can embed text, (only bert can embed as of 07/16/2023 (mm/dd/yyyy))
|
* not all models can embed text, (only bert can embed as of 07/16/2023 (mm/dd/yyyy))
|
||||||
* Use the prompt function exported for a value
|
* Use the prompt function exported for a value
|
||||||
* @param q The prompt input.
|
* @param q The prompt input.
|
||||||
* @param params Optional parameters for the prompt context.
|
* @param params Optional parameters for the prompt context.
|
||||||
* @returns The result of the model prompt.
|
* @returns The result of the model prompt.
|
||||||
*/
|
*/
|
||||||
embed(text: string) : Float32Array
|
embed(text: string): Float32Array;
|
||||||
/**
|
/**
|
||||||
* Whether the model is loaded or not.
|
* Whether the model is loaded or not.
|
||||||
*/
|
*/
|
||||||
@ -119,60 +151,66 @@ declare class LLModel {
|
|||||||
interface LoadModelOptions {
|
interface LoadModelOptions {
|
||||||
modelPath?: string;
|
modelPath?: string;
|
||||||
librariesPath?: string;
|
librariesPath?: string;
|
||||||
|
modelConfigFile?: string;
|
||||||
allowDownload?: boolean;
|
allowDownload?: boolean;
|
||||||
verbose?: boolean;
|
verbose?: boolean;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
interface InferenceModelOptions extends LoadModelOptions {
|
||||||
|
type?: "inference";
|
||||||
|
}
|
||||||
|
|
||||||
|
interface EmbeddingModelOptions extends LoadModelOptions {
|
||||||
|
type: "embedding";
|
||||||
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Loads a machine learning model with the specified name. The defacto way to create a model.
|
* Loads a machine learning model with the specified name. The defacto way to create a model.
|
||||||
* By default this will download a model from the official GPT4ALL website, if a model is not present at given path.
|
* By default this will download a model from the official GPT4ALL website, if a model is not present at given path.
|
||||||
*
|
*
|
||||||
* @param {string} modelName - The name of the model to load.
|
* @param {string} modelName - The name of the model to load.
|
||||||
* @param {LoadModelOptions|undefined} [options] - (Optional) Additional options for loading the model.
|
* @param {LoadModelOptions|undefined} [options] - (Optional) Additional options for loading the model.
|
||||||
* @returns {Promise<LLModel>} A promise that resolves to an instance of the loaded LLModel.
|
* @returns {Promise<InferenceModel | EmbeddingModel>} A promise that resolves to an instance of the loaded LLModel.
|
||||||
*/
|
*/
|
||||||
declare function loadModel(
|
declare function loadModel(
|
||||||
modelName: string,
|
modelName: string,
|
||||||
options?: LoadModelOptions
|
options?: InferenceModelOptions
|
||||||
): Promise<LLModel>;
|
): Promise<InferenceModel>;
|
||||||
|
|
||||||
|
declare function loadModel(
|
||||||
|
modelName: string,
|
||||||
|
options?: EmbeddingModelOptions
|
||||||
|
): Promise<EmbeddingModel>;
|
||||||
|
|
||||||
|
declare function loadModel(
|
||||||
|
modelName: string,
|
||||||
|
options?: EmbeddingOptions | InferenceOptions
|
||||||
|
): Promise<InferenceModel | EmbeddingModel>;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* The nodejs equivalent to python binding's chat_completion
|
* The nodejs equivalent to python binding's chat_completion
|
||||||
* @param {LLModel} llmodel - The language model object.
|
* @param {InferenceModel} model - The language model object.
|
||||||
* @param {PromptMessage[]} messages - The array of messages for the conversation.
|
* @param {PromptMessage[]} messages - The array of messages for the conversation.
|
||||||
* @param {CompletionOptions} options - The options for creating the completion.
|
* @param {CompletionOptions} options - The options for creating the completion.
|
||||||
* @returns {CompletionReturn} The completion result.
|
* @returns {CompletionReturn} The completion result.
|
||||||
* @example
|
|
||||||
* const llmodel = new LLModel(model)
|
|
||||||
* const messages = [
|
|
||||||
* { role: 'system', message: 'You are a weather forecaster.' },
|
|
||||||
* { role: 'user', message: 'should i go out today?' } ]
|
|
||||||
* const completion = await createCompletion(llmodel, messages, {
|
|
||||||
* verbose: true,
|
|
||||||
* temp: 0.9,
|
|
||||||
* })
|
|
||||||
* console.log(completion.choices[0].message.content)
|
|
||||||
* // No, it's going to be cold and rainy.
|
|
||||||
*/
|
*/
|
||||||
declare function createCompletion(
|
declare function createCompletion(
|
||||||
llmodel: LLModel,
|
model: InferenceModel,
|
||||||
messages: PromptMessage[],
|
messages: PromptMessage[],
|
||||||
options?: CompletionOptions
|
options?: CompletionOptions
|
||||||
): Promise<CompletionReturn>;
|
): Promise<CompletionReturn>;
|
||||||
|
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* The nodejs moral equivalent to python binding's Embed4All().embed()
|
* The nodejs moral equivalent to python binding's Embed4All().embed()
|
||||||
* meow
|
* meow
|
||||||
* @param {LLModel} llmodel - The language model object.
|
* @param {EmbeddingModel} model - The language model object.
|
||||||
* @param {string} text - text to embed
|
* @param {string} text - text to embed
|
||||||
* @returns {Float32Array} The completion result.
|
* @returns {Float32Array} The completion result.
|
||||||
*/
|
*/
|
||||||
declare function createEmbedding(
|
declare function createEmbedding(
|
||||||
llmodel: LLModel,
|
model: EmbeddingModel,
|
||||||
text: string,
|
text: string
|
||||||
): Float32Array
|
): Float32Array;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* The options for creating the completion.
|
* The options for creating the completion.
|
||||||
@ -185,16 +223,25 @@ interface CompletionOptions extends Partial<LLModelPromptContext> {
|
|||||||
verbose?: boolean;
|
verbose?: boolean;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Indicates if the default header is included in the prompt.
|
* Template for the system message. Will be put before the conversation with %1 being replaced by all system messages.
|
||||||
* @default true
|
* Note that if this is not defined, system messages will not be included in the prompt.
|
||||||
*/
|
*/
|
||||||
hasDefaultHeader?: boolean;
|
systemPromptTemplate?: string;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Indicates if the default footer is included in the prompt.
|
* Template for user messages, with %1 being replaced by the message.
|
||||||
* @default true
|
|
||||||
*/
|
*/
|
||||||
hasDefaultFooter?: boolean;
|
promptTemplate?: boolean;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* The initial instruction for the model, on top of the prompt
|
||||||
|
*/
|
||||||
|
promptHeader?: string;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* The last instruction for the model, appended to the end of the prompt.
|
||||||
|
*/
|
||||||
|
promptFooter?: string;
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
@ -212,10 +259,8 @@ interface PromptMessage {
|
|||||||
* The result of the completion, similar to OpenAI's format.
|
* The result of the completion, similar to OpenAI's format.
|
||||||
*/
|
*/
|
||||||
interface CompletionReturn {
|
interface CompletionReturn {
|
||||||
/** The model name.
|
/** The model used for the completion. */
|
||||||
* @type {ModelFile}
|
model: string;
|
||||||
*/
|
|
||||||
model: ModelFile[ModelType];
|
|
||||||
|
|
||||||
/** Token usage report. */
|
/** Token usage report. */
|
||||||
usage: {
|
usage: {
|
||||||
@ -246,58 +291,85 @@ interface CompletionChoice {
|
|||||||
*/
|
*/
|
||||||
interface LLModelPromptContext {
|
interface LLModelPromptContext {
|
||||||
/** The size of the raw logits vector. */
|
/** The size of the raw logits vector. */
|
||||||
logits_size: number;
|
logitsSize: number;
|
||||||
|
|
||||||
/** The size of the raw tokens vector. */
|
/** The size of the raw tokens vector. */
|
||||||
tokens_size: number;
|
tokensSize: number;
|
||||||
|
|
||||||
/** The number of tokens in the past conversation. */
|
/** The number of tokens in the past conversation. */
|
||||||
n_past: number;
|
nPast: number;
|
||||||
|
|
||||||
/** The number of tokens possible in the context window.
|
/** The number of tokens possible in the context window.
|
||||||
* @default 1024
|
* @default 1024
|
||||||
*/
|
*/
|
||||||
n_ctx: number;
|
nCtx: number;
|
||||||
|
|
||||||
/** The number of tokens to predict.
|
/** The number of tokens to predict.
|
||||||
* @default 128
|
* @default 128
|
||||||
* */
|
* */
|
||||||
n_predict: number;
|
nPredict: number;
|
||||||
|
|
||||||
/** The top-k logits to sample from.
|
/** The top-k logits to sample from.
|
||||||
|
* Top-K sampling selects the next token only from the top K most likely tokens predicted by the model.
|
||||||
|
* It helps reduce the risk of generating low-probability or nonsensical tokens, but it may also limit
|
||||||
|
* the diversity of the output. A higher value for top-K (eg., 100) will consider more tokens and lead
|
||||||
|
* to more diverse text, while a lower value (eg., 10) will focus on the most probable tokens and generate
|
||||||
|
* more conservative text. 30 - 60 is a good range for most tasks.
|
||||||
* @default 40
|
* @default 40
|
||||||
* */
|
* */
|
||||||
top_k: number;
|
topK: number;
|
||||||
|
|
||||||
/** The nucleus sampling probability threshold.
|
/** The nucleus sampling probability threshold.
|
||||||
* @default 0.9
|
* Top-P limits the selection of the next token to a subset of tokens with a cumulative probability
|
||||||
|
* above a threshold P. This method, also known as nucleus sampling, finds a balance between diversity
|
||||||
|
* and quality by considering both token probabilities and the number of tokens available for sampling.
|
||||||
|
* When using a higher value for top-P (eg., 0.95), the generated text becomes more diverse.
|
||||||
|
* On the other hand, a lower value (eg., 0.1) produces more focused and conservative text.
|
||||||
|
* The default value is 0.4, which is aimed to be the middle ground between focus and diversity, but
|
||||||
|
* for more creative tasks a higher top-p value will be beneficial, about 0.5-0.9 is a good range for that.
|
||||||
|
* @default 0.4
|
||||||
* */
|
* */
|
||||||
top_p: number;
|
topP: number;
|
||||||
|
|
||||||
/** The temperature to adjust the model's output distribution.
|
/** The temperature to adjust the model's output distribution.
|
||||||
* @default 0.72
|
* Temperature is like a knob that adjusts how creative or focused the output becomes. Higher temperatures
|
||||||
|
* (eg., 1.2) increase randomness, resulting in more imaginative and diverse text. Lower temperatures (eg., 0.5)
|
||||||
|
* make the output more focused, predictable, and conservative. When the temperature is set to 0, the output
|
||||||
|
* becomes completely deterministic, always selecting the most probable next token and producing identical results
|
||||||
|
* each time. A safe range would be around 0.6 - 0.85, but you are free to search what value fits best for you.
|
||||||
|
* @default 0.7
|
||||||
* */
|
* */
|
||||||
temp: number;
|
temp: number;
|
||||||
|
|
||||||
/** The number of predictions to generate in parallel.
|
/** The number of predictions to generate in parallel.
|
||||||
|
* By splitting the prompt every N tokens, prompt-batch-size reduces RAM usage during processing. However,
|
||||||
|
* this can increase the processing time as a trade-off. If the N value is set too low (e.g., 10), long prompts
|
||||||
|
* with 500+ tokens will be most affected, requiring numerous processing runs to complete the prompt processing.
|
||||||
|
* To ensure optimal performance, setting the prompt-batch-size to 2048 allows processing of all tokens in a single run.
|
||||||
* @default 8
|
* @default 8
|
||||||
* */
|
* */
|
||||||
n_batch: number;
|
nBatch: number;
|
||||||
|
|
||||||
/** The penalty factor for repeated tokens.
|
/** The penalty factor for repeated tokens.
|
||||||
* @default 1
|
* Repeat-penalty can help penalize tokens based on how frequently they occur in the text, including the input prompt.
|
||||||
|
* A token that has already appeared five times is penalized more heavily than a token that has appeared only one time.
|
||||||
|
* A value of 1 means that there is no penalty and values larger than 1 discourage repeated tokens.
|
||||||
|
* @default 1.18
|
||||||
* */
|
* */
|
||||||
repeat_penalty: number;
|
repeatPenalty: number;
|
||||||
|
|
||||||
/** The number of last tokens to penalize.
|
/** The number of last tokens to penalize.
|
||||||
* @default 10
|
* The repeat-penalty-tokens N option controls the number of tokens in the history to consider for penalizing repetition.
|
||||||
|
* A larger value will look further back in the generated text to prevent repetitions, while a smaller value will only
|
||||||
|
* consider recent tokens.
|
||||||
|
* @default 64
|
||||||
* */
|
* */
|
||||||
repeat_last_n: number;
|
repeatLastN: number;
|
||||||
|
|
||||||
/** The percentage of context to erase if the context window is exceeded.
|
/** The percentage of context to erase if the context window is exceeded.
|
||||||
* @default 0.5
|
* @default 0.5
|
||||||
* */
|
* */
|
||||||
context_erase: number;
|
contextErase: number;
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
@ -320,24 +392,35 @@ declare const DEFAULT_DIRECTORY: string;
|
|||||||
* This searches DEFAULT_DIRECTORY/libraries, cwd/libraries, and finally cwd.
|
* This searches DEFAULT_DIRECTORY/libraries, cwd/libraries, and finally cwd.
|
||||||
*/
|
*/
|
||||||
declare const DEFAULT_LIBRARIES_DIRECTORY: string;
|
declare const DEFAULT_LIBRARIES_DIRECTORY: string;
|
||||||
interface PromptMessage {
|
|
||||||
role: "system" | "assistant" | "user";
|
|
||||||
content: string;
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Initiates the download of a model file of a specific model type.
|
* Default model configuration.
|
||||||
|
*/
|
||||||
|
declare const DEFAULT_MODEL_CONFIG: ModelConfig;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Default prompt context.
|
||||||
|
*/
|
||||||
|
declare const DEFAULT_PROMT_CONTEXT: LLModelPromptContext;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Default model list url.
|
||||||
|
*/
|
||||||
|
declare const DEFAULT_MODEL_LIST_URL: string;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Initiates the download of a model file.
|
||||||
* By default this downloads without waiting. use the controller returned to alter this behavior.
|
* By default this downloads without waiting. use the controller returned to alter this behavior.
|
||||||
* @param {ModelFile} modelName - The model file to be downloaded.
|
* @param {string} modelName - The model to be downloaded.
|
||||||
* @param {DownloadOptions} options - to pass into the downloader. Default is { location: (cwd), debug: false }.
|
* @param {DownloadOptions} options - to pass into the downloader. Default is { location: (cwd), verbose: false }.
|
||||||
* @returns {DownloadController} object that allows controlling the download process.
|
* @returns {DownloadController} object that allows controlling the download process.
|
||||||
*
|
*
|
||||||
* @throws {Error} If the model already exists in the specified location.
|
* @throws {Error} If the model already exists in the specified location.
|
||||||
* @throws {Error} If the model cannot be found at the specified url.
|
* @throws {Error} If the model cannot be found at the specified url.
|
||||||
*
|
*
|
||||||
* @example
|
* @example
|
||||||
* const controller = download('ggml-gpt4all-j-v1.3-groovy.bin')
|
* const download = downloadModel('ggml-gpt4all-j-v1.3-groovy.bin')
|
||||||
* controller.promise().then(() => console.log('Downloaded!'))
|
* download.promise.then(() => console.log('Downloaded!'))
|
||||||
*/
|
*/
|
||||||
declare function downloadModel(
|
declare function downloadModel(
|
||||||
modelName: string,
|
modelName: string,
|
||||||
@ -358,46 +441,55 @@ interface DownloadModelOptions {
|
|||||||
* Debug mode -- check how long it took to download in seconds
|
* Debug mode -- check how long it took to download in seconds
|
||||||
* @default false
|
* @default false
|
||||||
*/
|
*/
|
||||||
debug?: boolean;
|
verbose?: boolean;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Remote download url. Defaults to `https://gpt4all.io/models`
|
* Remote download url. Defaults to `https://gpt4all.io/models/<modelName>`
|
||||||
* @default https://gpt4all.io/models
|
* @default https://gpt4all.io/models/<modelName>
|
||||||
*/
|
*/
|
||||||
url?: string;
|
url?: string;
|
||||||
/**
|
/**
|
||||||
* Whether to verify the hash of the download to ensure a proper download occurred.
|
* MD5 sum of the model file. If this is provided, the downloaded file will be checked against this sum.
|
||||||
* @default true
|
* If the sums do not match, an error will be thrown and the file will be deleted.
|
||||||
*/
|
*/
|
||||||
md5sum?: boolean;
|
md5sum?: string;
|
||||||
}
|
}
|
||||||
|
|
||||||
declare function listModels(): Promise<Record<string, string>[]>;
|
interface ListModelsOptions {
|
||||||
|
url?: string;
|
||||||
|
file?: string;
|
||||||
|
}
|
||||||
|
|
||||||
|
declare function listModels(options?: ListModelsOptions): Promise<ModelConfig[]>;
|
||||||
|
|
||||||
interface RetrieveModelOptions {
|
interface RetrieveModelOptions {
|
||||||
allowDownload?: boolean;
|
allowDownload?: boolean;
|
||||||
verbose?: boolean;
|
verbose?: boolean;
|
||||||
modelPath?: string;
|
modelPath?: string;
|
||||||
|
modelConfigFile?: string;
|
||||||
}
|
}
|
||||||
|
|
||||||
declare function retrieveModel(
|
declare function retrieveModel(
|
||||||
model: string,
|
modelName: string,
|
||||||
options?: RetrieveModelOptions
|
options?: RetrieveModelOptions
|
||||||
): Promise<string>;
|
): Promise<ModelConfig>;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Model download controller.
|
* Model download controller.
|
||||||
*/
|
*/
|
||||||
interface DownloadController {
|
interface DownloadController {
|
||||||
/** Cancel the request to download from gpt4all website if this is called. */
|
/** Cancel the request to download if this is called. */
|
||||||
cancel: () => void;
|
cancel: () => void;
|
||||||
/** Convert the downloader into a promise, allowing people to await and manage its lifetime */
|
/** A promise resolving to the downloaded models config once the download is done */
|
||||||
promise: () => Promise<void>;
|
promise: Promise<ModelConfig>;
|
||||||
}
|
}
|
||||||
|
|
||||||
export {
|
export {
|
||||||
ModelType,
|
ModelType,
|
||||||
ModelFile,
|
ModelFile,
|
||||||
|
ModelConfig,
|
||||||
|
InferenceModel,
|
||||||
|
EmbeddingModel,
|
||||||
LLModel,
|
LLModel,
|
||||||
LLModelPromptContext,
|
LLModelPromptContext,
|
||||||
PromptMessage,
|
PromptMessage,
|
||||||
@ -409,10 +501,13 @@ export {
|
|||||||
createTokenStream,
|
createTokenStream,
|
||||||
DEFAULT_DIRECTORY,
|
DEFAULT_DIRECTORY,
|
||||||
DEFAULT_LIBRARIES_DIRECTORY,
|
DEFAULT_LIBRARIES_DIRECTORY,
|
||||||
|
DEFAULT_MODEL_CONFIG,
|
||||||
|
DEFAULT_PROMT_CONTEXT,
|
||||||
|
DEFAULT_MODEL_LIST_URL,
|
||||||
downloadModel,
|
downloadModel,
|
||||||
retrieveModel,
|
retrieveModel,
|
||||||
listModels,
|
listModels,
|
||||||
DownloadController,
|
DownloadController,
|
||||||
RetrieveModelOptions,
|
RetrieveModelOptions,
|
||||||
DownloadModelOptions
|
DownloadModelOptions,
|
||||||
};
|
};
|
||||||
|
@ -10,19 +10,36 @@ const {
|
|||||||
downloadModel,
|
downloadModel,
|
||||||
appendBinSuffixIfMissing,
|
appendBinSuffixIfMissing,
|
||||||
} = require("./util.js");
|
} = require("./util.js");
|
||||||
const { DEFAULT_DIRECTORY, DEFAULT_LIBRARIES_DIRECTORY } = require("./config.js");
|
const {
|
||||||
|
DEFAULT_DIRECTORY,
|
||||||
|
DEFAULT_LIBRARIES_DIRECTORY,
|
||||||
|
DEFAULT_PROMPT_CONTEXT,
|
||||||
|
DEFAULT_MODEL_CONFIG,
|
||||||
|
DEFAULT_MODEL_LIST_URL,
|
||||||
|
} = require("./config.js");
|
||||||
|
const { InferenceModel, EmbeddingModel } = require("./models.js");
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Loads a machine learning model with the specified name. The defacto way to create a model.
|
||||||
|
* By default this will download a model from the official GPT4ALL website, if a model is not present at given path.
|
||||||
|
*
|
||||||
|
* @param {string} modelName - The name of the model to load.
|
||||||
|
* @param {LoadModelOptions|undefined} [options] - (Optional) Additional options for loading the model.
|
||||||
|
* @returns {Promise<InferenceModel | EmbeddingModel>} A promise that resolves to an instance of the loaded LLModel.
|
||||||
|
*/
|
||||||
async function loadModel(modelName, options = {}) {
|
async function loadModel(modelName, options = {}) {
|
||||||
const loadOptions = {
|
const loadOptions = {
|
||||||
modelPath: DEFAULT_DIRECTORY,
|
modelPath: DEFAULT_DIRECTORY,
|
||||||
librariesPath: DEFAULT_LIBRARIES_DIRECTORY,
|
librariesPath: DEFAULT_LIBRARIES_DIRECTORY,
|
||||||
|
type: "inference",
|
||||||
allowDownload: true,
|
allowDownload: true,
|
||||||
verbose: true,
|
verbose: true,
|
||||||
...options,
|
...options,
|
||||||
};
|
};
|
||||||
|
|
||||||
await retrieveModel(modelName, {
|
const modelConfig = await retrieveModel(modelName, {
|
||||||
modelPath: loadOptions.modelPath,
|
modelPath: loadOptions.modelPath,
|
||||||
|
modelConfigFile: loadOptions.modelConfigFile,
|
||||||
allowDownload: loadOptions.allowDownload,
|
allowDownload: loadOptions.allowDownload,
|
||||||
verbose: loadOptions.verbose,
|
verbose: loadOptions.verbose,
|
||||||
});
|
});
|
||||||
@ -37,7 +54,7 @@ async function loadModel(modelName, options = {}) {
|
|||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
if(!libPath) {
|
if (!libPath) {
|
||||||
throw Error("Could not find a valid path from " + libSearchPaths);
|
throw Error("Could not find a valid path from " + libSearchPaths);
|
||||||
}
|
}
|
||||||
const llmOptions = {
|
const llmOptions = {
|
||||||
@ -47,99 +64,183 @@ async function loadModel(modelName, options = {}) {
|
|||||||
};
|
};
|
||||||
|
|
||||||
if (loadOptions.verbose) {
|
if (loadOptions.verbose) {
|
||||||
console.log("Creating LLModel with options:", llmOptions);
|
console.debug("Creating LLModel with options:", llmOptions);
|
||||||
}
|
}
|
||||||
const llmodel = new LLModel(llmOptions);
|
const llmodel = new LLModel(llmOptions);
|
||||||
|
|
||||||
return llmodel;
|
if (loadOptions.type === "embedding") {
|
||||||
|
return new EmbeddingModel(llmodel, modelConfig);
|
||||||
|
} else if (loadOptions.type === "inference") {
|
||||||
|
return new InferenceModel(llmodel, modelConfig);
|
||||||
|
} else {
|
||||||
|
throw Error("Invalid model type: " + loadOptions.type);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
function createPrompt(messages, hasDefaultHeader, hasDefaultFooter) {
|
/**
|
||||||
let fullPrompt = [];
|
* Formats a list of messages into a single prompt string.
|
||||||
|
*/
|
||||||
for (const message of messages) {
|
function formatChatPrompt(
|
||||||
if (message.role === "system") {
|
|
||||||
const systemMessage = message.content;
|
|
||||||
fullPrompt.push(systemMessage);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
if (hasDefaultHeader) {
|
|
||||||
fullPrompt.push(`### Instruction: The prompt below is a question to answer, a task to complete, or a conversation to respond to; decide which and write an appropriate response.`);
|
|
||||||
}
|
|
||||||
let prompt = "### Prompt:";
|
|
||||||
for (const message of messages) {
|
|
||||||
if (message.role === "user") {
|
|
||||||
const user_message = message["content"];
|
|
||||||
prompt += user_message;
|
|
||||||
}
|
|
||||||
if (message["role"] == "assistant") {
|
|
||||||
const assistant_message = "Response:" + message["content"];
|
|
||||||
prompt += assistant_message;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
fullPrompt.push(prompt);
|
|
||||||
if (hasDefaultFooter) {
|
|
||||||
fullPrompt.push("### Response:");
|
|
||||||
}
|
|
||||||
|
|
||||||
return fullPrompt.join('\n');
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
function createEmbedding(llmodel, text) {
|
|
||||||
return llmodel.embed(text)
|
|
||||||
}
|
|
||||||
async function createCompletion(
|
|
||||||
llmodel,
|
|
||||||
messages,
|
messages,
|
||||||
options = {
|
{
|
||||||
hasDefaultHeader: true,
|
systemPromptTemplate,
|
||||||
hasDefaultFooter: false,
|
defaultSystemPrompt,
|
||||||
verbose: true,
|
promptTemplate,
|
||||||
|
promptFooter,
|
||||||
|
promptHeader,
|
||||||
}
|
}
|
||||||
) {
|
) {
|
||||||
//creating the keys to insert into promptMaker.
|
const systemMessages = messages
|
||||||
const fullPrompt = createPrompt(
|
.filter((message) => message.role === "system")
|
||||||
messages,
|
.map((message) => message.content);
|
||||||
options.hasDefaultHeader ?? true,
|
|
||||||
options.hasDefaultFooter ?? true
|
let fullPrompt = "";
|
||||||
);
|
|
||||||
if (options.verbose) {
|
if (promptHeader) {
|
||||||
console.log("Sent: " + fullPrompt);
|
fullPrompt += promptHeader + "\n\n";
|
||||||
}
|
}
|
||||||
const promisifiedRawPrompt = llmodel.raw_prompt(fullPrompt, options, (s) => {});
|
|
||||||
return promisifiedRawPrompt.then((response) => {
|
if (systemPromptTemplate) {
|
||||||
return {
|
// if user specified a template for the system prompt, put all system messages in the template
|
||||||
llmodel: llmodel.name(),
|
let systemPrompt = "";
|
||||||
usage: {
|
|
||||||
prompt_tokens: fullPrompt.length,
|
if (systemMessages.length > 0) {
|
||||||
completion_tokens: response.length, //TODO
|
systemPrompt += systemMessages.join("\n");
|
||||||
total_tokens: fullPrompt.length + response.length, //TODO
|
}
|
||||||
},
|
|
||||||
choices: [
|
if (systemPrompt) {
|
||||||
{
|
fullPrompt +=
|
||||||
message: {
|
systemPromptTemplate.replace("%1", systemPrompt) + "\n";
|
||||||
role: "assistant",
|
}
|
||||||
content: response,
|
} else if (defaultSystemPrompt) {
|
||||||
},
|
// otherwise, use the system prompt from the model config and ignore system messages
|
||||||
},
|
fullPrompt += defaultSystemPrompt + "\n\n";
|
||||||
],
|
}
|
||||||
};
|
|
||||||
|
if (systemMessages.length > 0 && !systemPromptTemplate) {
|
||||||
|
console.warn(
|
||||||
|
"System messages were provided, but no systemPromptTemplate was specified. System messages will be ignored."
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
for (const message of messages) {
|
||||||
|
if (message.role === "user") {
|
||||||
|
const userMessage = promptTemplate.replace(
|
||||||
|
"%1",
|
||||||
|
message["content"]
|
||||||
|
);
|
||||||
|
fullPrompt += userMessage;
|
||||||
|
}
|
||||||
|
if (message["role"] == "assistant") {
|
||||||
|
const assistantMessage = message["content"] + "\n";
|
||||||
|
fullPrompt += assistantMessage;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if (promptFooter) {
|
||||||
|
fullPrompt += "\n\n" + promptFooter;
|
||||||
|
}
|
||||||
|
|
||||||
|
return fullPrompt;
|
||||||
|
}
|
||||||
|
|
||||||
|
function createEmbedding(model, text) {
|
||||||
|
return model.embed(text);
|
||||||
|
}
|
||||||
|
|
||||||
|
const defaultCompletionOptions = {
|
||||||
|
verbose: false,
|
||||||
|
...DEFAULT_PROMPT_CONTEXT,
|
||||||
|
};
|
||||||
|
|
||||||
|
async function createCompletion(
|
||||||
|
model,
|
||||||
|
messages,
|
||||||
|
options = defaultCompletionOptions
|
||||||
|
) {
|
||||||
|
if (options.hasDefaultHeader !== undefined) {
|
||||||
|
console.warn(
|
||||||
|
"hasDefaultHeader (bool) is deprecated and has no effect, use promptHeader (string) instead"
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (options.hasDefaultFooter !== undefined) {
|
||||||
|
console.warn(
|
||||||
|
"hasDefaultFooter (bool) is deprecated and has no effect, use promptFooter (string) instead"
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
const optionsWithDefaults = {
|
||||||
|
...defaultCompletionOptions,
|
||||||
|
...options,
|
||||||
|
};
|
||||||
|
|
||||||
|
const {
|
||||||
|
verbose,
|
||||||
|
systemPromptTemplate,
|
||||||
|
promptTemplate,
|
||||||
|
promptHeader,
|
||||||
|
promptFooter,
|
||||||
|
...promptContext
|
||||||
|
} = optionsWithDefaults;
|
||||||
|
|
||||||
|
const prompt = formatChatPrompt(messages, {
|
||||||
|
systemPromptTemplate,
|
||||||
|
defaultSystemPrompt: model.config.systemPrompt,
|
||||||
|
promptTemplate: promptTemplate || model.config.promptTemplate || "%1",
|
||||||
|
promptHeader: promptHeader || "",
|
||||||
|
promptFooter: promptFooter || "",
|
||||||
|
// These were the default header/footer prompts used for non-chat single turn completions.
|
||||||
|
// both seem to be working well still with some models, so keeping them here for reference.
|
||||||
|
// promptHeader: '### Instruction: The prompt below is a question to answer, a task to complete, or a conversation to respond to; decide which and write an appropriate response.',
|
||||||
|
// promptFooter: '### Response:',
|
||||||
});
|
});
|
||||||
|
|
||||||
|
if (verbose) {
|
||||||
|
console.debug("Sending Prompt:\n" + prompt);
|
||||||
|
}
|
||||||
|
|
||||||
|
const response = await model.generate(prompt, promptContext);
|
||||||
|
|
||||||
|
if (verbose) {
|
||||||
|
console.debug("Received Response:\n" + response);
|
||||||
|
}
|
||||||
|
|
||||||
|
return {
|
||||||
|
llmodel: model.llm.name(),
|
||||||
|
usage: {
|
||||||
|
prompt_tokens: prompt.length,
|
||||||
|
completion_tokens: response.length, //TODO
|
||||||
|
total_tokens: prompt.length + response.length, //TODO
|
||||||
|
},
|
||||||
|
choices: [
|
||||||
|
{
|
||||||
|
message: {
|
||||||
|
role: "assistant",
|
||||||
|
content: response,
|
||||||
|
},
|
||||||
|
},
|
||||||
|
],
|
||||||
|
};
|
||||||
}
|
}
|
||||||
|
|
||||||
function createTokenStream() {
|
function createTokenStream() {
|
||||||
throw Error("This API has not been completed yet!")
|
throw Error("This API has not been completed yet!");
|
||||||
}
|
}
|
||||||
|
|
||||||
module.exports = {
|
module.exports = {
|
||||||
DEFAULT_LIBRARIES_DIRECTORY,
|
DEFAULT_LIBRARIES_DIRECTORY,
|
||||||
DEFAULT_DIRECTORY,
|
DEFAULT_DIRECTORY,
|
||||||
|
DEFAULT_PROMPT_CONTEXT,
|
||||||
|
DEFAULT_MODEL_CONFIG,
|
||||||
|
DEFAULT_MODEL_LIST_URL,
|
||||||
LLModel,
|
LLModel,
|
||||||
|
InferenceModel,
|
||||||
|
EmbeddingModel,
|
||||||
createCompletion,
|
createCompletion,
|
||||||
createEmbedding,
|
createEmbedding,
|
||||||
downloadModel,
|
downloadModel,
|
||||||
retrieveModel,
|
retrieveModel,
|
||||||
loadModel,
|
loadModel,
|
||||||
createTokenStream
|
createTokenStream,
|
||||||
};
|
};
|
||||||
|
38
gpt4all-bindings/typescript/src/models.js
Normal file
38
gpt4all-bindings/typescript/src/models.js
Normal file
@ -0,0 +1,38 @@
|
|||||||
|
const { normalizePromptContext, warnOnSnakeCaseKeys } = require('./util');
|
||||||
|
|
||||||
|
class InferenceModel {
|
||||||
|
llm;
|
||||||
|
config;
|
||||||
|
|
||||||
|
constructor(llmodel, config) {
|
||||||
|
this.llm = llmodel;
|
||||||
|
this.config = config;
|
||||||
|
}
|
||||||
|
|
||||||
|
async generate(prompt, promptContext) {
|
||||||
|
warnOnSnakeCaseKeys(promptContext);
|
||||||
|
const normalizedPromptContext = normalizePromptContext(promptContext);
|
||||||
|
const result = this.llm.raw_prompt(prompt, normalizedPromptContext, () => {});
|
||||||
|
return result;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
class EmbeddingModel {
|
||||||
|
llm;
|
||||||
|
config;
|
||||||
|
|
||||||
|
constructor(llmodel, config) {
|
||||||
|
this.llm = llmodel;
|
||||||
|
this.config = config;
|
||||||
|
}
|
||||||
|
|
||||||
|
embed(text) {
|
||||||
|
return this.llm.embed(text)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
module.exports = {
|
||||||
|
InferenceModel,
|
||||||
|
EmbeddingModel,
|
||||||
|
};
|
@ -1,14 +1,45 @@
|
|||||||
const { createWriteStream, existsSync, statSync } = require("node:fs");
|
const { createWriteStream, existsSync, statSync } = require("node:fs");
|
||||||
const fsp = require('node:fs/promises')
|
const fsp = require("node:fs/promises");
|
||||||
const { performance } = require("node:perf_hooks");
|
const { performance } = require("node:perf_hooks");
|
||||||
const path = require("node:path");
|
const path = require("node:path");
|
||||||
const {mkdirp} = require("mkdirp");
|
const { mkdirp } = require("mkdirp");
|
||||||
const { DEFAULT_DIRECTORY, DEFAULT_LIBRARIES_DIRECTORY } = require("./config.js");
|
const md5File = require("md5-file");
|
||||||
const md5File = require('md5-file');
|
const {
|
||||||
async function listModels() {
|
DEFAULT_DIRECTORY,
|
||||||
const res = await fetch("https://gpt4all.io/models/models.json");
|
DEFAULT_MODEL_CONFIG,
|
||||||
const modelList = await res.json();
|
DEFAULT_MODEL_LIST_URL,
|
||||||
return modelList;
|
} = require("./config.js");
|
||||||
|
|
||||||
|
async function listModels(
|
||||||
|
options = {
|
||||||
|
url: DEFAULT_MODEL_LIST_URL,
|
||||||
|
}
|
||||||
|
) {
|
||||||
|
if (!options || (!options.url && !options.file)) {
|
||||||
|
throw new Error(
|
||||||
|
`No model list source specified. Please specify either a url or a file.`
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (options.file) {
|
||||||
|
if (!existsSync(options.file)) {
|
||||||
|
throw new Error(`Model list file ${options.file} does not exist.`);
|
||||||
|
}
|
||||||
|
|
||||||
|
const fileContents = await fsp.readFile(options.file, "utf-8");
|
||||||
|
const modelList = JSON.parse(fileContents);
|
||||||
|
return modelList;
|
||||||
|
} else if (options.url) {
|
||||||
|
const res = await fetch(options.url);
|
||||||
|
|
||||||
|
if (!res.ok) {
|
||||||
|
throw Error(
|
||||||
|
`Failed to retrieve model list from ${url} - ${res.status} ${res.statusText}`
|
||||||
|
);
|
||||||
|
}
|
||||||
|
const modelList = await res.json();
|
||||||
|
return modelList;
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
function appendBinSuffixIfMissing(name) {
|
function appendBinSuffixIfMissing(name) {
|
||||||
@ -32,11 +63,46 @@ function readChunks(reader) {
|
|||||||
};
|
};
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Prints a warning if any keys in the prompt context are snake_case.
|
||||||
|
*/
|
||||||
|
function warnOnSnakeCaseKeys(promptContext) {
|
||||||
|
const snakeCaseKeys = Object.keys(promptContext).filter((key) =>
|
||||||
|
key.includes("_")
|
||||||
|
);
|
||||||
|
|
||||||
|
if (snakeCaseKeys.length > 0) {
|
||||||
|
console.warn(
|
||||||
|
"Prompt context keys should be camelCase. Support for snake_case might be removed in the future. Found keys: " +
|
||||||
|
snakeCaseKeys.join(", ")
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Converts all keys in the prompt context to snake_case
|
||||||
|
* For duplicate definitions, the value of the last occurrence will be used.
|
||||||
|
*/
|
||||||
|
function normalizePromptContext(promptContext) {
|
||||||
|
const normalizedPromptContext = {};
|
||||||
|
|
||||||
|
for (const key in promptContext) {
|
||||||
|
if (promptContext.hasOwnProperty(key)) {
|
||||||
|
const snakeKey = key.replace(
|
||||||
|
/[A-Z]/g,
|
||||||
|
(match) => `_${match.toLowerCase()}`
|
||||||
|
);
|
||||||
|
normalizedPromptContext[snakeKey] = promptContext[key];
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return normalizedPromptContext;
|
||||||
|
}
|
||||||
|
|
||||||
function downloadModel(modelName, options = {}) {
|
function downloadModel(modelName, options = {}) {
|
||||||
const downloadOptions = {
|
const downloadOptions = {
|
||||||
modelPath: DEFAULT_DIRECTORY,
|
modelPath: DEFAULT_DIRECTORY,
|
||||||
debug: false,
|
verbose: false,
|
||||||
md5sum: true,
|
|
||||||
...options,
|
...options,
|
||||||
};
|
};
|
||||||
|
|
||||||
@ -46,11 +112,16 @@ function downloadModel(modelName, options = {}) {
|
|||||||
modelName + ".part"
|
modelName + ".part"
|
||||||
);
|
);
|
||||||
const finalModelPath = path.join(downloadOptions.modelPath, modelFileName);
|
const finalModelPath = path.join(downloadOptions.modelPath, modelFileName);
|
||||||
const modelUrl = downloadOptions.url ?? `https://gpt4all.io/models/${modelFileName}`;
|
const modelUrl =
|
||||||
|
downloadOptions.url ?? `https://gpt4all.io/models/${modelFileName}`;
|
||||||
|
|
||||||
if (existsSync(finalModelPath)) {
|
if (existsSync(finalModelPath)) {
|
||||||
throw Error(`Model already exists at ${finalModelPath}`);
|
throw Error(`Model already exists at ${finalModelPath}`);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if (downloadOptions.verbose) {
|
||||||
|
console.log(`Downloading ${modelName} from ${modelUrl}`);
|
||||||
|
}
|
||||||
|
|
||||||
const headers = {
|
const headers = {
|
||||||
"Accept-Ranges": "arraybuffer",
|
"Accept-Ranges": "arraybuffer",
|
||||||
@ -69,85 +140,81 @@ function downloadModel(modelName, options = {}) {
|
|||||||
const abortController = new AbortController();
|
const abortController = new AbortController();
|
||||||
const signal = abortController.signal;
|
const signal = abortController.signal;
|
||||||
|
|
||||||
// wrapper function to get the readable stream from request
|
const finalizeDownload = async () => {
|
||||||
const fetchModel = (fetchOpts = {}) =>
|
if (options.md5sum) {
|
||||||
fetch(modelUrl, {
|
const fileHash = await md5File(partialModelPath);
|
||||||
signal,
|
if (fileHash !== options.md5sum) {
|
||||||
...fetchOpts,
|
await fsp.unlink(partialModelPath);
|
||||||
}).then((res) => {
|
const message = `Model "${modelName}" failed verification: Hashes mismatch. Expected ${options.md5sum}, got ${fileHash}`;
|
||||||
if (!res.ok) {
|
throw Error(message);
|
||||||
throw Error(
|
|
||||||
`Failed to download model from ${modelUrl} - ${res.statusText}`
|
|
||||||
);
|
|
||||||
}
|
}
|
||||||
return res.body.getReader();
|
if (options.verbose) {
|
||||||
|
console.log(`MD5 hash verified: ${fileHash}`);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
await fsp.rename(partialModelPath, finalModelPath);
|
||||||
|
};
|
||||||
|
|
||||||
|
// a promise that executes and writes to a stream. Resolves to the path the model was downloaded to when done writing.
|
||||||
|
const downloadPromise = new Promise((resolve, reject) => {
|
||||||
|
let timestampStart;
|
||||||
|
|
||||||
|
if (options.verbose) {
|
||||||
|
console.log(`Downloading @ ${partialModelPath} ...`);
|
||||||
|
timestampStart = performance.now();
|
||||||
|
}
|
||||||
|
|
||||||
|
const writeStream = createWriteStream(
|
||||||
|
partialModelPath,
|
||||||
|
writeStreamOpts
|
||||||
|
);
|
||||||
|
|
||||||
|
writeStream.on("error", (e) => {
|
||||||
|
writeStream.close();
|
||||||
|
reject(e);
|
||||||
});
|
});
|
||||||
|
|
||||||
// a promise that executes and writes to a stream. Resolves when done writing.
|
writeStream.on("finish", () => {
|
||||||
const res = new Promise((resolve, reject) => {
|
if (options.verbose) {
|
||||||
fetchModel({ headers })
|
const elapsed = performance.now() - timestampStart;
|
||||||
// Resolves an array of a reader and writestream.
|
console.log(`Finished. Download took ${elapsed.toFixed(2)} ms`);
|
||||||
.then((reader) => [
|
}
|
||||||
reader,
|
|
||||||
createWriteStream(partialModelPath, writeStreamOpts),
|
|
||||||
])
|
|
||||||
.then(async ([readable, wstream]) => {
|
|
||||||
console.log("Downloading @ ", partialModelPath);
|
|
||||||
let perf;
|
|
||||||
|
|
||||||
if (options.debug) {
|
finalizeDownload()
|
||||||
perf = performance.now();
|
.then(() => {
|
||||||
|
resolve(finalModelPath);
|
||||||
|
})
|
||||||
|
.catch(reject);
|
||||||
|
});
|
||||||
|
|
||||||
|
fetch(modelUrl, {
|
||||||
|
signal,
|
||||||
|
headers,
|
||||||
|
})
|
||||||
|
.then((res) => {
|
||||||
|
if (!res.ok) {
|
||||||
|
const message = `Failed to download model from ${modelUrl} - ${res.status} ${res.statusText}`;
|
||||||
|
reject(Error(message));
|
||||||
}
|
}
|
||||||
|
return res.body.getReader();
|
||||||
wstream.on("finish", () => {
|
})
|
||||||
if (options.debug) {
|
.then(async (reader) => {
|
||||||
console.log(
|
for await (const chunk of readChunks(reader)) {
|
||||||
"Time taken: ",
|
writeStream.write(chunk);
|
||||||
(performance.now() - perf).toFixed(2),
|
|
||||||
" ms"
|
|
||||||
);
|
|
||||||
}
|
|
||||||
wstream.close();
|
|
||||||
});
|
|
||||||
|
|
||||||
wstream.on("error", (e) => {
|
|
||||||
wstream.close();
|
|
||||||
reject(e);
|
|
||||||
});
|
|
||||||
|
|
||||||
for await (const chunk of readChunks(readable)) {
|
|
||||||
wstream.write(chunk);
|
|
||||||
}
|
}
|
||||||
|
writeStream.end();
|
||||||
if (options.md5sum) {
|
|
||||||
const fileHash = await md5File(partialModelPath);
|
|
||||||
if (fileHash !== options.md5sum) {
|
|
||||||
await fsp.unlink(partialModelPath);
|
|
||||||
return reject(
|
|
||||||
Error(`Model "${modelName}" failed verification: Hashes mismatch`)
|
|
||||||
);
|
|
||||||
}
|
|
||||||
if (options.debug) {
|
|
||||||
console.log("MD5 hash verified: ", fileHash);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
await fsp.rename(partialModelPath, finalModelPath);
|
|
||||||
resolve(finalModelPath);
|
|
||||||
})
|
})
|
||||||
.catch(reject);
|
.catch(reject);
|
||||||
});
|
});
|
||||||
|
|
||||||
return {
|
return {
|
||||||
cancel: () => abortController.abort(),
|
cancel: () => abortController.abort(),
|
||||||
promise: () => res,
|
promise: downloadPromise,
|
||||||
};
|
};
|
||||||
}
|
}
|
||||||
|
|
||||||
```diff
-async function retrieveModel (
-    modelName,
-    options = {}
-) {
+async function retrieveModel(modelName, options = {}) {
     const retrieveOptions = {
         modelPath: DEFAULT_DIRECTORY,
         allowDownload: true,
```

@ -161,46 +228,68 @@ async function retrieveModel (

```diff
     const fullModelPath = path.join(retrieveOptions.modelPath, modelFileName);
     const modelExists = existsSync(fullModelPath);

-    if (modelExists) {
-        return fullModelPath;
-    }
-
-    if (!retrieveOptions.allowDownload) {
-        throw Error(`Model does not exist at ${fullModelPath}`);
-    }
-
-    const availableModels = await listModels();
-    const foundModel = availableModels.find((model) => model.filename === modelFileName);
-
-    if (!foundModel) {
-        throw Error(`Model "${modelName}" is not available.`);
-    }
-    //todo
-    if (retrieveOptions.verbose) {
-        console.log(`Downloading ${modelName}...`);
-    }
-
-    const downloadController = downloadModel(modelName, {
-        modelPath: retrieveOptions.modelPath,
-        debug: retrieveOptions.verbose,
-        url: foundModel.url
-    });
-
-    const downloadPath = await downloadController.promise();
-
-    if (retrieveOptions.verbose) {
-        console.log(`Model downloaded to ${downloadPath}`);
-    }
-
-    return downloadPath
+    let config = { ...DEFAULT_MODEL_CONFIG };
+
+    const availableModels = await listModels({
+        file: retrieveOptions.modelConfigFile,
+        url:
+            retrieveOptions.allowDownload &&
+            "https://gpt4all.io/models/models.json",
+    });
+
+    const loadedModelConfig = availableModels.find(
+        (model) => model.filename === modelFileName
+    );
+
+    if (loadedModelConfig) {
+        config = {
+            ...config,
+            ...loadedModelConfig,
+        };
+    } else {
+        // if there's no local modelConfigFile specified, and allowDownload is false,
+        // the default model config will be used.
+        // warning the user here because the model may not work as expected.
+        console.warn(
+            `Failed to load model config for ${modelName}. Using defaults.`
+        );
+    }
+
+    config.systemPrompt = config.systemPrompt.trim();
+
+    if (modelExists) {
+        config.path = fullModelPath;
+
+        if (retrieveOptions.verbose) {
+            console.log(`Found ${modelName} at ${fullModelPath}`);
+        }
+    } else if (retrieveOptions.allowDownload) {
+        const downloadController = downloadModel(modelName, {
+            modelPath: retrieveOptions.modelPath,
+            verbose: retrieveOptions.verbose,
+            filesize: config.filesize,
+            url: config.url,
+            md5sum: config.md5sum,
+        });
+
+        const downloadPath = await downloadController.promise;
+        config.path = downloadPath;
+
+        if (retrieveOptions.verbose) {
+            console.log(`Model downloaded to ${downloadPath}`);
+        }
+    } else {
+        throw Error("Failed to retrieve model.");
+    }
+
+    return config;
 }
```
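A short sketch of calling the reworked `retrieveModel`, based on the options and return value visible in this hunk: the result is the merged model config with a `path` field, and `modelConfigFile` points at a local models.json. Paths and the model name below are placeholders.

```js
// Sketch only: paths and model name are illustrative.
const { retrieveModel } = require("../src/util.js");

(async () => {
    const config = await retrieveModel("ggml-vicuna-7b-1.1-q4_2", {
        modelConfigFile: "./models.json", // use a local list instead of the remote models.json
        allowDownload: true,
        verbose: true,
    });

    console.log(config.path);         // where the .bin file ended up
    console.log(config.systemPrompt); // trimmed system prompt from the model list
})();
```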
```diff
 module.exports = {
     appendBinSuffixIfMissing,
     downloadModel,
     retrieveModel,
-    listModels
+    listModels,
+    normalizePromptContext,
+    warnOnSnakeCaseKeys,
 };
```
@ -1,79 +1,228 @@

```diff
-const path = require('node:path');
-const os = require('node:os');
-const { LLModel } = require('node-gyp-build')(path.resolve(__dirname, '..'));
+const path = require("node:path");
+const os = require("node:os");
+const fsp = require("node:fs/promises");
+const { LLModel } = require("node-gyp-build")(path.resolve(__dirname, ".."));
 const {
     listModels,
     downloadModel,
     appendBinSuffixIfMissing,
-} = require('../src/util.js');
+    normalizePromptContext,
+} = require("../src/util.js");
 const {
     DEFAULT_DIRECTORY,
     DEFAULT_LIBRARIES_DIRECTORY,
-} = require('../src/config.js');
+    DEFAULT_MODEL_LIST_URL,
+} = require("../src/config.js");
 const {
     loadModel,
     createPrompt,
     createCompletion,
-} = require('../src/gpt4all.js');
-
-global.fetch = jest.fn(() =>
-    Promise.resolve({
-        json: () => Promise.resolve([{}, {}, {}]),
-    })
-);
-
-jest.mock('../src/util.js', () => {
-    const actualModule = jest.requireActual('../src/util.js');
-    return {
-        ...actualModule,
-        downloadModel: jest.fn(() =>
-            ({ cancel: jest.fn(), promise: jest.fn() })
-        )
-    }
-})
-
-beforeEach(() => {
-    downloadModel.mockClear()
-});
-
-afterEach( () => {
-    fetch.mockClear();
-    jest.clearAllMocks()
-})
-
-describe('utils', () => {
-    test("appendBinSuffixIfMissing", () => {
-        expect(appendBinSuffixIfMissing("filename")).toBe("filename.bin")
-        expect(appendBinSuffixIfMissing("filename.bin")).toBe("filename.bin")
-    })
-    test("default paths", () => {
-        expect(DEFAULT_DIRECTORY).toBe(path.resolve(os.homedir(), ".cache/gpt4all"))
+} = require("../src/gpt4all.js");
+const { mock } = require("node:test");
+
+describe("config", () => {
+    test("default paths constants are available and correct", () => {
+        expect(DEFAULT_DIRECTORY).toBe(
+            path.resolve(os.homedir(), ".cache/gpt4all")
+        );
         const paths = [
             path.join(DEFAULT_DIRECTORY, "libraries"),
             path.resolve("./libraries"),
             path.resolve(
                 __dirname,
                 "..",
                 `runtimes/${process.platform}-${process.arch}/native`
             ),
             process.cwd(),
         ];
-        expect(typeof DEFAULT_LIBRARIES_DIRECTORY).toBe('string')
-        expect(DEFAULT_LIBRARIES_DIRECTORY).toBe(paths.join(';'))
-    })
-
-    test("listModels", async () => {
-        try {
-            await listModels();
-        } catch(e) {}
-
-        expect(fetch).toHaveBeenCalledTimes(1)
-        expect(fetch).toHaveBeenCalledWith(
-            "https://gpt4all.io/models/models.json"
-        );
-    })
-})
+        expect(typeof DEFAULT_LIBRARIES_DIRECTORY).toBe("string");
+        expect(DEFAULT_LIBRARIES_DIRECTORY).toBe(paths.join(";"));
+    });
+});
+
+describe("listModels", () => {
+    const fakeModels = require("./models.json");
+    const fakeModel = fakeModels[0];
+    const mockResponse = JSON.stringify([fakeModel]);
+
+    let mockFetch, originalFetch;
+
+    beforeAll(() => {
+        // Mock the fetch function for all tests
+        mockFetch = jest.fn().mockResolvedValue({
+            ok: true,
+            json: () => JSON.parse(mockResponse),
+        });
+        originalFetch = global.fetch;
+        global.fetch = mockFetch;
+    });
+
+    afterEach(() => {
+        // Reset the fetch counter after each test
+        mockFetch.mockClear();
+    });
+
+    afterAll(() => {
+        // Restore fetch
+        global.fetch = originalFetch;
+    });
+
+    it("should load the model list from remote when called without args", async () => {
+        const models = await listModels();
+        expect(fetch).toHaveBeenCalledTimes(1);
+        expect(fetch).toHaveBeenCalledWith(DEFAULT_MODEL_LIST_URL);
+        expect(models[0]).toEqual(fakeModel);
+    });
+
+    it("should load the model list from a local file, if specified", async () => {
+        const models = await listModels({
+            file: path.resolve(__dirname, "models.json"),
+        });
+        expect(fetch).toHaveBeenCalledTimes(0);
+        expect(models[0]).toEqual(fakeModel);
+    });
+
+    it("should throw an error if neither url nor file is specified", async () => {
+        await expect(listModels(null)).rejects.toThrow(
+            "No model list source specified. Please specify either a url or a file."
+        );
+    });
+});
+
+describe("appendBinSuffixIfMissing", () => {
+    it("should make sure the suffix is there", () => {
+        expect(appendBinSuffixIfMissing("filename")).toBe("filename.bin");
+        expect(appendBinSuffixIfMissing("filename.bin")).toBe("filename.bin");
+    });
+});
+
+describe("downloadModel", () => {
+    let mockAbortController, mockFetch;
+    const fakeModelName = "fake-model";
+
+    const createMockFetch = () => {
+        const mockData = new Uint8Array([1, 2, 3, 4]);
+        const mockResponse = new ReadableStream({
+            start(controller) {
+                controller.enqueue(mockData);
+                controller.close();
+            },
+        });
+        const mockFetchImplementation = jest.fn(() =>
+            Promise.resolve({
+                ok: true,
+                body: mockResponse,
+            })
+        );
+        return mockFetchImplementation;
+    };
+
+    beforeEach(() => {
+        // Mocking the AbortController constructor
+        mockAbortController = jest.fn();
+        global.AbortController = mockAbortController;
+        mockAbortController.mockReturnValue({
+            signal: "signal",
+            abort: jest.fn(),
+        });
+        mockFetch = createMockFetch();
+        jest.spyOn(global, "fetch").mockImplementation(mockFetch);
+    });
+
+    afterEach(() => {
+        // Clean up mocks
+        mockAbortController.mockReset();
+        mockFetch.mockClear();
+        global.fetch.mockRestore();
+    });
+
+    test("should successfully download a model file", async () => {
+        const downloadController = downloadModel(fakeModelName);
+        const modelFilePath = await downloadController.promise;
+        expect(modelFilePath).toBe(`${DEFAULT_DIRECTORY}/${fakeModelName}.bin`);
+
+        expect(global.fetch).toHaveBeenCalledTimes(1);
+        expect(global.fetch).toHaveBeenCalledWith(
+            "https://gpt4all.io/models/fake-model.bin",
+            {
+                signal: "signal",
+                headers: {
+                    "Accept-Ranges": "arraybuffer",
+                    "Response-Type": "arraybuffer",
+                },
+            }
+        );
+
+        // final model file should be present
+        expect(fsp.access(modelFilePath)).resolves.not.toThrow();
+        // remove the testing model file
+        await fsp.unlink(modelFilePath);
+    });
+
+    test("should error and cleanup if md5sum is not matching", async () => {
+        const downloadController = downloadModel(fakeModelName, {
+            md5sum: "wrong-md5sum",
+        });
+        // the promise should reject with a mismatch
+        await expect(downloadController.promise).rejects.toThrow(
+            `Model "${fakeModelName}" failed verification: Hashes mismatch.`
+        );
+        // fetch should have been called
+        expect(global.fetch).toHaveBeenCalledTimes(1);
+        // the file should be missing
+        expect(
+            fsp.access(`${DEFAULT_DIRECTORY}/${fakeModelName}.bin`)
+        ).rejects.toThrow();
+        // partial file should also be missing
+        expect(
+            fsp.access(`${DEFAULT_DIRECTORY}/${fakeModelName}.part`)
+        ).rejects.toThrow();
+    });
+
+    // TODO
+    // test("should be able to cancel and resume a download", async () => {
+    // });
+});
+
+describe("normalizePromptContext", () => {
+    it("should convert a dict with camelCased keys to snake_case", () => {
+        const camelCased = {
+            topK: 20,
+            repeatLastN: 10,
+        };
+
+        const expectedSnakeCased = {
+            top_k: 20,
+            repeat_last_n: 10,
+        };
+
+        const result = normalizePromptContext(camelCased);
+        expect(result).toEqual(expectedSnakeCased);
+    });
+
+    it("should convert a mixed case dict to snake_case, last value taking precedence", () => {
+        const mixedCased = {
+            topK: 20,
+            top_k: 10,
+            repeatLastN: 10,
+        };
+
+        const expectedSnakeCased = {
+            top_k: 10,
+            repeat_last_n: 10,
+        };
+
+        const result = normalizePromptContext(mixedCased);
+        expect(result).toEqual(expectedSnakeCased);
+    });
+
+    it("should not modify already snake cased dict", () => {
+        const snakeCased = {
+            top_k: 10,
+            repeast_last_n: 10,
+        };
+
+        const result = normalizePromptContext(snakeCased);
+        expect(result).toEqual(snakeCased);
+    });
+});
```
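Going by the expectations in the `normalizePromptContext` tests above, camelCased prompt-context keys are rewritten to the snake_cased names the native layer expects, with a later key winning on conflict. A rough sketch of that behaviour (not the implementation from `src/util.js`):

```js
// Illustrative sketch of the conversion the tests describe.
function normalizePromptContextSketch(promptContext) {
    const normalized = {};
    for (const [key, value] of Object.entries(promptContext)) {
        // topK -> top_k, repeatLastN -> repeat_last_n; snake_case keys pass through
        const snakeKey = key.replace(/[A-Z]/g, (c) => "_" + c.toLowerCase());
        normalized[snakeKey] = value;
    }
    return normalized;
}

normalizePromptContextSketch({ topK: 20, top_k: 10, repeatLastN: 10 });
// => { top_k: 10, repeat_last_n: 10 }
```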
gpt4all-bindings/typescript/test/models.json (new file, 10 lines)
@ -0,0 +1,10 @@

```json
[
    {
        "order": "a",
        "md5sum": "08d6c05a21512a79a1dfeb9d2a8f262f",
        "name": "Not a real model",
        "filename": "fake-model.bin",
        "filesize": "4",
        "systemPrompt": " "
    }
]
```
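The fixture above is what the `listModels` tests read from disk; the same `file` option can be used outside the test suite to avoid the network entirely (the path below is illustrative):

```js
// Sketch only: the file path is illustrative.
const { listModels } = require("../src/util.js");

listModels({ file: "./test/models.json" }).then((models) => {
    console.log(models[0].filename); // "fake-model.bin"
});
```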
File diff suppressed because it is too large