feat(typescript)/dynamic template (#1287) (#1326)

* feat(typescript)/dynamic template (#1287)

* remove packaged yarn

* prompt templates update wip

* prompt template update

* system prompt template, update types, remove embed promises, cleanup

* support both snakecased and camelcased prompt context

* fix #1277 libbert, libfalcon and libreplit libs not being moved into the right folder after build

* added support for modelConfigFile param, allowing the user to specify a local file instead of downloading the remote models.json. added a warning message if code fails to load a model config. included prompt context docs by amogus.

* snakecase warning, put logic for loading local models.json into listModels, added constant for the default remote model list url, test improvements, simpler hasOwnProperty call

* add DEFAULT_PROMPT_CONTEXT, export new constants

* add md5sum testcase and fix constants export

* update types

* throw if attempting to list models without a source

* rebuild docs

* fix download logging undefined url, toFixed typo, pass config filesize in for future progress report

* added overload with union types

* bump to 2.2.0, remove alpha

* code speling

---------

Co-authored-by: Andreas Obersteiner <8959303+iimez@users.noreply.github.com>
Author: Jacob Nguyen
Date: 2023-08-14 11:45:45 -05:00
Committed by: GitHub
Parent: 4d855afe97
Commit: 4e55940edf
15 changed files with 5876 additions and 6938 deletions


@@ -1,7 +1,7 @@
# GPT4All Node.js API
```sh
yarn install gpt4all@alpha
yarn add gpt4all@alpha
npm install gpt4all@alpha
@@ -10,34 +10,41 @@ pnpm install gpt4all@alpha
The original [GPT4All typescript bindings](https://github.com/nomic-ai/gpt4all-ts) are now out of date.
* New bindings created by [jacoobes](https://github.com/jacoobes) and the [nomic ai community](https://home.nomic.ai) :D, for all to use.
* [Documentation](#Documentation)
* New bindings created by [jacoobes](https://github.com/jacoobes), [limez](https://github.com/iimez) and the [nomic ai community](https://home.nomic.ai), for all to use.
* The nodejs api has made strides to mirror the python api. It is not 100% mirrored, but many pieces of the api resemble its python counterpart.
* Everything should work out of the box.
* See [API Reference](#api-reference)
### Code (alpha)
### Chat Completion (alpha)
```js
import { createCompletion, loadModel } from '../src/gpt4all.js'
const ll = await loadModel('ggml-vicuna-7b-1.1-q4_2.bin', { verbose: true });
const model = await loadModel('ggml-vicuna-7b-1.1-q4_2', { verbose: true });
const response = await createCompletion(ll, [
const response = await createCompletion(model, [
{ role : 'system', content: 'You are meant to be annoying and unhelpful.' },
{ role : 'user', content: 'What is 1 + 1?' }
]);
```
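The returned object mirrors OpenAI's completion format (see CompletionReturn in the API reference below), so the reply text can be read like this, a minimal sketch:

```js
// assuming the CompletionReturn shape documented in the API reference
console.log(response.choices[0].message.content);
```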
### API
### Embedding (alpha)
* The nodejs api has made strides to mirror the python api. It is not 100% mirrored, but many pieces of the api resemble its python counterpart.
* Everything should work out the box.
* [docs](./docs/api.md)
```js
import { createEmbedding, loadModel } from '../src/gpt4all.js'
const model = await loadModel('ggml-all-MiniLM-L6-v2-f16', { verbose: true });
const fltArray = createEmbedding(model, "Pain is inevitable, suffering optional");
```
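createEmbedding returns a Float32Array (see the API reference below); as a quick sanity check:

```js
// the array length is the embedding dimensionality (e.g. 384 for MiniLM-L6 models)
console.log(fltArray.length);
```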
### Build Instructions
* As of 05/21/2023, Tested on windows (MSVC). (somehow got it to work on MSVC 🤯)
* binding.gyp is compile config
* Tested on Ubuntu. Everything seems to work fine.
* Tested on Windows. Everything works fine.
* Sparse testing on mac os.
* MinGW works as well to build the gpt4all-backend. **HOWEVER**, this package works only with MSVC-built DLLs.
### Requirements
@@ -48,11 +55,11 @@ const response = await createCompletion(ll, [
* [node-gyp](https://github.com/nodejs/node-gyp)
* all of its requirements.
* (unix) gcc version 12
* These bindings use the C++ 20 standard.
* (win) msvc version 143
* Can be obtained with visual studio 2022 build tools
* python 3
### Build
### Build (from source)
```sh
git clone https://github.com/nomic-ai/gpt4all.git
@@ -117,22 +124,27 @@ yarn test
* Handling prompting and inference of models in a threadsafe, asynchronous way.
#### docs/
### Known Issues
* Autogenerated documentation using the script `yarn docs:build`
* why your model may be spewing bull 💩
* The downloaded model is broken (just reinstall or download from official site)
* That's it so far
### Roadmap
This package is in active development, and breaking changes may happen until the api stabilizes. Here's the current todo list:
* \[x] prompt models via a threadsafe function in order to have proper non-blocking behavior in nodejs
* \[ ] createTokenStream, an async iterator that streams each token emitted from the model. Planning on following this [example](https://github.com/nodejs/node-addon-examples/tree/main/threadsafe-async-iterator)
* \[ ] proper unit testing (integrate with circle ci)
* \[ ] publish to npm under alpha tag `gpt4all@alpha`
* \[ ] have more people test on other platforms (mac tester needed)
* \[ ] ~~createTokenStream, an async iterator that streams each token emitted from the model. Planning on following this [example](https://github.com/nodejs/node-addon-examples/tree/main/threadsafe-async-iterator)~~ May not implement unless someone else can complete it
* \[x] proper unit testing (integrate with circle ci)
* \[x] publish to npm under alpha tag `gpt4all@alpha`
* \[x] have more people test on other platforms (mac tester needed)
* \[x] switch to new pluggable backend
* \[ ] NPM bundle size reduction via optionalDependencies strategy (need help)
* Should include prebuilds to avoid painful node-gyp errors
* \[ ] createChatSession ( the python equivalent to create\_chat\_session )
### Documentation
### API Reference
<!-- Generated by documentation.js. Update this documentation by updating the source code. -->
@@ -166,13 +178,14 @@ This package is in active development, and breaking changes may happen until the
* [Parameters](#parameters-5)
* [createCompletion](#createcompletion)
* [Parameters](#parameters-6)
* [Examples](#examples)
* [createEmbedding](#createembedding)
* [Parameters](#parameters-7)
* [CompletionOptions](#completionoptions)
* [verbose](#verbose)
* [hasDefaultHeader](#hasdefaultheader)
* [hasDefaultFooter](#hasdefaultfooter)
* [systemPromptTemplate](#systemprompttemplate)
* [promptTemplate](#prompttemplate)
* [promptHeader](#promptheader)
* [promptFooter](#promptfooter)
* [PromptMessage](#promptmessage)
* [role](#role)
* [content](#content)
@@ -186,28 +199,31 @@ This package is in active development, and breaking changes may happen until the
* [CompletionChoice](#completionchoice)
* [message](#message)
* [LLModelPromptContext](#llmodelpromptcontext)
* [logits\_size](#logits_size)
* [tokens\_size](#tokens_size)
* [n\_past](#n_past)
* [n\_ctx](#n_ctx)
* [n\_predict](#n_predict)
* [top\_k](#top_k)
* [top\_p](#top_p)
* [logitsSize](#logitssize)
* [tokensSize](#tokenssize)
* [nPast](#npast)
* [nCtx](#nctx)
* [nPredict](#npredict)
* [topK](#topk)
* [topP](#topp)
* [temp](#temp)
* [n\_batch](#n_batch)
* [repeat\_penalty](#repeat_penalty)
* [repeat\_last\_n](#repeat_last_n)
* [context\_erase](#context_erase)
* [nBatch](#nbatch)
* [repeatPenalty](#repeatpenalty)
* [repeatLastN](#repeatlastn)
* [contextErase](#contexterase)
* [createTokenStream](#createtokenstream)
* [Parameters](#parameters-8)
* [DEFAULT\_DIRECTORY](#default_directory)
* [DEFAULT\_LIBRARIES\_DIRECTORY](#default_libraries_directory)
* [DEFAULT\_MODEL\_CONFIG](#default_model_config)
* [DEFAULT\_PROMPT\_CONTEXT](#default_prompt_context)
* [DEFAULT\_MODEL\_LIST\_URL](#default_model_list_url)
* [downloadModel](#downloadmodel)
* [Parameters](#parameters-9)
* [Examples](#examples-1)
* [Examples](#examples)
* [DownloadModelOptions](#downloadmodeloptions)
* [modelPath](#modelpath)
* [debug](#debug)
* [verbose](#verbose-1)
* [url](#url)
* [md5sum](#md5sum)
* [DownloadController](#downloadcontroller)
@@ -223,6 +239,7 @@ Type: (`"gptj"` | `"llama"` | `"mpt"` | `"replit"`)
#### ModelFile
Full list of models available
@deprecated These model names are outdated and this type will not be maintained; please use a string literal instead
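For instance, a minimal sketch of the preferred string-literal style (the file name is illustrative):

```js
import { loadModel } from '../src/gpt4all.js'

// pass the model file name as a plain string instead of a ModelFile constant
const model = await loadModel('ggml-gpt4all-j-v1.3-groovy.bin');
```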
##### gptj
@@ -367,7 +384,7 @@ By default this will download a model from the official GPT4ALL website, if a mo
* `modelName` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** The name of the model to load.
* `options` **(LoadModelOptions | [undefined](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/undefined))?** (Optional) Additional options for loading the model.
Returns **[Promise](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Promise)<[LLModel](#llmodel)>** A promise that resolves to an instance of the loaded LLModel.
Returns **[Promise](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Promise)<(InferenceModel | EmbeddingModel)>** A promise that resolves to an instance of the loaded model.
#### createCompletion
@@ -375,25 +392,10 @@ The nodejs equivalent to python binding's chat\_completion
##### Parameters
* `llmodel` **[LLModel](#llmodel)** The language model object.
* `model` **InferenceModel** The language model object.
* `messages` **[Array](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Array)<[PromptMessage](#promptmessage)>** The array of messages for the conversation.
* `options` **[CompletionOptions](#completionoptions)** The options for creating the completion.
##### Examples
```javascript
const llmodel = new LLModel(model)
const messages = [
{ role: 'system', message: 'You are a weather forecaster.' },
{ role: 'user', message: 'should i go out today?' } ]
const completion = await createCompletion(llmodel, messages, {
verbose: true,
temp: 0.9,
})
console.log(completion.choices[0].message.content)
// No, it's going to be cold and rainy.
```
Returns **[CompletionReturn](#completionreturn)** The completion result.
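A minimal usage sketch consistent with the updated signature (model name and option values are illustrative):

```js
import { createCompletion, loadModel } from '../src/gpt4all.js'

const model = await loadModel('ggml-vicuna-7b-1.1-q4_2', { verbose: true });
const completion = await createCompletion(model, [
    { role: 'system', content: 'You are a weather forecaster.' },
    { role: 'user', content: 'Should I go out today?' },
], { temp: 0.9 });
console.log(completion.choices[0].message.content);
```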
#### createEmbedding
@@ -403,7 +405,7 @@ meow
##### Parameters
* `llmodel` **[LLModel](#llmodel)** The language model object.
* `model` **EmbeddingModel** The language model object.
* `text` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** The text to embed.
Returns **[Float32Array](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Float32Array)** The embedding result.
@@ -420,17 +422,30 @@ Indicates if verbose logging is enabled.
Type: [boolean](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Boolean)
##### hasDefaultHeader
##### systemPromptTemplate
Indicates if the default header is included in the prompt.
Template for the system message. Will be put before the conversation with %1 being replaced by all system messages.
Note that if this is not defined, system messages will not be included in the prompt.
Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)
##### promptTemplate
Template for user messages, with %1 being replaced by the message.
Type: [boolean](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Boolean)
##### hasDefaultFooter
##### promptHeader
Indicates if the default footer is included in the prompt.
The initial instruction for the model, placed at the top of the prompt.
Type: [boolean](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Boolean)
Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)
##### promptFooter
The last instruction for the model, appended to the end of the prompt.
Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)
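As an illustration, a hedged sketch of supplying these templates via CompletionOptions, reusing `model` and `messages` from the earlier examples (the template strings are assumptions, not model-specific defaults):

```js
const response = await createCompletion(model, messages, {
    // %1 is replaced by all system messages; omit to exclude them from the prompt
    systemPromptTemplate: '### System:\n%1\n',
    // %1 is replaced by each user message
    promptTemplate: '### Human:\n%1\n### Assistant:\n',
});
```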
#### PromptMessage
@@ -472,9 +487,9 @@ The result of the completion, similar to OpenAI's format.
##### model
The model name.
The model used for the completion.
Type: [ModelFile](#modelfile)
Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)
##### usage
@@ -502,73 +517,100 @@ Type: [PromptMessage](#promptmessage)
Model inference arguments for generating completions.
##### logits\_size
##### logitsSize
The size of the raw logits vector.
Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)
##### tokens\_size
##### tokensSize
The size of the raw tokens vector.
Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)
##### n\_past
##### nPast
The number of tokens in the past conversation.
Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)
##### n\_ctx
##### nCtx
The number of tokens possible in the context window.
Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)
##### n\_predict
##### nPredict
The number of tokens to predict.
Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)
##### top\_k
##### topK
The top-k logits to sample from.
Top-K sampling selects the next token only from the top K most likely tokens predicted by the model.
It helps reduce the risk of generating low-probability or nonsensical tokens, but it may also limit
the diversity of the output. A higher value for top-K (e.g., 100) will consider more tokens and lead
to more diverse text, while a lower value (e.g., 10) will focus on the most probable tokens and generate
more conservative text. 30 - 60 is a good range for most tasks.
Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)
##### top\_p
##### topP
The nucleus sampling probability threshold.
Top-P limits the selection of the next token to a subset of tokens with a cumulative probability
above a threshold P. This method, also known as nucleus sampling, finds a balance between diversity
and quality by considering both token probabilities and the number of tokens available for sampling.
When using a higher value for top-P (e.g., 0.95), the generated text becomes more diverse.
On the other hand, a lower value (e.g., 0.1) produces more focused and conservative text.
The default value is 0.4, which aims for a middle ground between focus and diversity; for more
creative tasks a higher top-P value is beneficial, and 0.5 - 0.9 is a good range for that.
Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)
##### temp
The temperature to adjust the model's output distribution.
Temperature is like a knob that adjusts how creative or focused the output becomes. Higher temperatures
(e.g., 1.2) increase randomness, resulting in more imaginative and diverse text. Lower temperatures (e.g., 0.5)
make the output more focused, predictable, and conservative. When the temperature is set to 0, the output
becomes completely deterministic, always selecting the most probable next token and producing identical results
each time. A safe range is around 0.6 - 0.85, but you are free to experiment with what value fits best for you.
Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)
##### n\_batch
##### nBatch
The number of prompt tokens processed in parallel.
By splitting the prompt every N tokens, prompt-batch-size reduces RAM usage during processing. However,
this can increase the processing time as a trade-off. If the N value is set too low (e.g., 10), long prompts
with 500+ tokens will be most affected, requiring numerous processing runs to complete the prompt processing.
To ensure optimal performance, setting the prompt-batch-size to 2048 allows processing of all tokens in a single run.
Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)
##### repeat\_penalty
##### repeatPenalty
The penalty factor for repeated tokens.
Repeat-penalty can help penalize tokens based on how frequently they occur in the text, including the input prompt.
A token that has already appeared five times is penalized more heavily than a token that has appeared only once.
A value of 1 means that there is no penalty and values larger than 1 discourage repeated tokens.
Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)
##### repeat\_last\_n
##### repeatLastN
The number of last tokens to penalize.
The repeat-penalty-tokens N option controls the number of tokens in the history to consider for penalizing repetition.
A larger value will look further back in the generated text to prevent repetitions, while a smaller value will only
consider recent tokens.
Type: [number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)
##### context\_erase
##### contextErase
The percentage of context to erase if the context window is exceeded.
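Taken together, a sketch of passing these camelCased context options through CompletionOptions, reusing `model` and `messages` from the earlier examples (all values are illustrative, within the ranges discussed above):

```js
const completion = await createCompletion(model, messages, {
    nPredict: 128,       // cap on generated tokens
    topK: 40,            // within the suggested 30 - 60 range
    topP: 0.4,           // the documented default
    temp: 0.7,           // within the safe 0.6 - 0.85 range
    repeatPenalty: 1.18, // > 1 discourages repetition (illustrative value)
    repeatLastN: 64,     // how far back to scan for repeats (illustrative value)
    contextErase: 0.5,   // erase half the context if the window overflows
});
```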
@@ -602,21 +644,39 @@ This searches DEFAULT\_DIRECTORY/libraries, cwd/libraries, and finally cwd.
Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)
#### DEFAULT\_MODEL\_CONFIG
Default model configuration.
Type: ModelConfig
#### DEFAULT\_PROMPT\_CONTEXT
Default prompt context.
Type: [LLModelPromptContext](#llmodelpromptcontext)
#### DEFAULT\_MODEL\_LIST\_URL
Default model list url.
Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)
#### downloadModel
Initiates the download of a model file of a specific model type.
Initiates the download of a model file.
By default this downloads without waiting. Use the returned controller to alter this behavior.
##### Parameters
* `modelName` **[ModelFile](#modelfile)** The model file to be downloaded.
* `options` **DownloadOptions** to pass into the downloader. Default is { location: (cwd), debug: false }.
* `modelName` **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)** The model to be downloaded.
* `options` **DownloadOptions** to pass into the downloader. Default is { location: (cwd), verbose: false }.
##### Examples
```javascript
const controller = download('ggml-gpt4all-j-v1.3-groovy.bin')
controller.promise().then(() => console.log('Downloaded!'))
const download = downloadModel('ggml-gpt4all-j-v1.3-groovy.bin')
download.promise.then(() => console.log('Downloaded!'))
```
* Throws **[Error](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Error)** If the model already exists in the specified location.
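And a sketch with explicit options, assuming the DownloadModelOptions fields documented below (enable md5sum with a checksum from models.json to verify the file):

```js
import { downloadModel } from '../src/gpt4all.js'

const download = downloadModel('ggml-gpt4all-j-v1.3-groovy.bin', {
    modelPath: './models', // download destination; defaults to process.cwd()
    verbose: true,         // log download timing
    // md5sum: '<md5 from models.json>', // uncomment to verify the download
});
const modelConfig = await download.promise;
```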
@@ -635,7 +695,7 @@ Default is process.cwd(), or the current working directory
Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)
##### debug
##### verbose
Debug mode -- check how long it took to download in seconds
@@ -643,15 +703,16 @@ Type: [boolean](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Glob
##### url
Remote download url. Defaults to `https://gpt4all.io/models`
Remote download url. Defaults to `https://gpt4all.io/models/<modelName>`
Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)
##### md5sum
Whether to verify the hash of the download to ensure a proper download occurred.
MD5 sum of the model file. If this is provided, the downloaded file will be checked against this sum.
If the sums do not match, an error will be thrown and the file will be deleted.
Type: [boolean](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Boolean)
Type: [string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)
#### DownloadController
@@ -659,12 +720,12 @@ Model download controller.
##### cancel
Cancel the request to download from gpt4all website if this is called.
Cancel the request to download if this is called.
Type: function (): void
##### promise
Convert the downloader into a promise, allowing people to await and manage its lifetime
A promise resolving to the downloaded model's config once the download is done
Type: function (): [Promise](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Promise)\<void>
Type: [Promise](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Promise)\<ModelConfig>
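For completeness, a hedged sketch of cancelling an in-flight download (assuming cancellation settles the pending promise with an error):

```js
import { downloadModel } from '../src/gpt4all.js'

const download = downloadModel('ggml-gpt4all-j-v1.3-groovy.bin');

// abort shortly after starting; the exact rejection behavior is an assumption
setTimeout(() => download.cancel(), 1000);

download.promise
    .then((config) => console.log('Downloaded!', config))
    .catch((err) => console.error('Download did not complete:', err));
```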