# qlearning

The qlearning package provides a series of interfaces and utilities to implement the [Q-Learning](https://en.wikipedia.org/wiki/Q-learning) algorithm in Go.
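
At its core, the algorithm maintains a table of Q-values and folds observed rewards back into it. The package's actual interfaces are documented in the godocs linked under Usage; purely for orientation, here is a minimal, self-contained sketch of the tabular Q-learning update, using hypothetical names (`QTable`, `Update`) rather than this package's API:

```go
package main

import "fmt"

// QTable is a hypothetical, minimal tabular Q-learner used only to
// illustrate the update rule; it is not part of this package's API.
type QTable struct {
	values map[string]float64 // Q(s,a), keyed by "state|action"
	alpha  float64            // learning rate
	gamma  float64            // discount factor
}

func key(state, action string) string { return state + "|" + action }

// Value returns the current estimate of Q(state, action);
// unseen pairs default to 0.
func (q *QTable) Value(state, action string) float64 {
	return q.values[key(state, action)]
}

// Update applies the Q-learning rule:
//
//	Q(s,a) ← Q(s,a) + α * (r + γ * max_a' Q(s',a') − Q(s,a))
func (q *QTable) Update(state, action string, reward float64, next string, nextActions []string) {
	best := 0.0 // max over the next state's actions, 0 if all unseen
	for _, a := range nextActions {
		if v := q.Value(next, a); v > best {
			best = v
		}
	}
	old := q.Value(state, action)
	q.values[key(state, action)] = old + q.alpha*(reward+q.gamma*best-old)
}

func main() {
	q := &QTable{values: map[string]float64{}, alpha: 0.5, gamma: 0.9}
	// One hypothetical hangman step: guessing "e" in state "s0" paid off.
	q.Update("s0", "guess-e", 1.0, "s1", []string{"guess-a", "guess-t"})
	fmt.Printf("Q(s0, guess-e) = %.2f\n", q.Value("s0", "guess-e")) // 0.50
}
```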

This project was largely inspired by [flappybird-qlearning-bot](https://github.com/chncyhn/flappybird-qlearning-bot).

*Until a release is tagged, qlearning should be considered highly experimental and mostly a fun toy.*

## Installation

```shell
$ go get github.com/ecooper/qlearning
```

## Quickstart

qlearning provides example implementations in the [examples](examples/) directory of the project.

[hangman.go](examples/hangman.go) provides a naive implementation of [Hangman](https://en.wikipedia.org/wiki/Hangman_(game)) for use with qlearning.

```shell
$ cd $GOPATH/src/github.com/ecooper/qlearning/examples
$ go run hangman.go -h
Usage of hangman:
  -debug
        Set debug
  -games int
        Play N games (default 5000000)
  -progress int
        Print progress messages every N games (default 1000)
  -wordlist string
        Path to a wordlist (default "./wordlist.txt")
  -words int
        Use N words from wordlist (default 10000)
```

By default, running [hangman.go](examples/hangman.go) will play millions of games against a 10,000-word corpus. That's a bit overkill for just trying out qlearning. You can run it against a smaller number of words for fewer games using the `-games` and `-words` flags.

```shell
$ go run hangman.go -words 100 -progress 1000 -games 5000
100 words loaded
1000 games played: 92 WINS 908 LOSSES 9% WIN RATE
2000 games played: 447 WINS 1553 LOSSES 36% WIN RATE
3000 games played: 1064 WINS 1936 LOSSES 62% WIN RATE
4000 games played: 1913 WINS 2087 LOSSES 85% WIN RATE
5000 games played: 2845 WINS 2155 LOSSES 93% WIN RATE

Agent performance: 5000 games played, 2845 WINS 2155 LOSSES 57% WIN RATE
```

The "WIN RATE" in each progress report is isolated to that cycle, a group of 1000 games in this example. It is meant to show the agent's velocity of learning: if the agent is "learning", the per-cycle win rate should increase until it reaches convergence.
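
As a quick check against the run above: the fifth cycle covers games 4001–5000, so its wins are 2845 − 1913 = 932 out of 1000 games, the 93% shown, while the closing "Agent performance" line reports the cumulative rate, 2845 / 5000 ≈ 57%.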

As you can see, after 5000 games the agent is able to "learn" and play hangman against a 100-word vocabulary.

## Usage

See [godocs](https://godoc.org/github.com/ecooper/qlearning) for the package documentation.