From 4dd5df1b6fd0104b46fa5e85e5300507007de44a Mon Sep 17 00:00:00 2001
From: Zach Nussbaum
Date: Thu, 13 Apr 2023 20:30:45 +0000
Subject: [PATCH] fix: format

---
 TRAINING_LOG.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/TRAINING_LOG.md b/TRAINING_LOG.md
index c59dccfd..f86838c2 100644
--- a/TRAINING_LOG.md
+++ b/TRAINING_LOG.md
@@ -241,7 +241,10 @@ We tried training a full model using the parameters above, but found that during
 
 ### Model Training Divergence
 
-We trained multiple [GPT-J models](https://huggingface.co/EleutherAI/gpt-j-6b) with varying success. We found that training the full model lead to diverged post epoch 1. ![](figs/overfit-gpt-j.png). We release the checkpoint after epoch 1.
+We trained multiple [GPT-J models](https://huggingface.co/EleutherAI/gpt-j-6b) with varying success. We found that training the full model lead to diverged post epoch 1. ![](figs/overfit-gpt-j.png)
+
+
+We release the checkpoint after epoch 1.
 
 Using Atlas, we extracted the embeddings of each point in the dataset and calculated the loss per sequence. We then uploaded [this to Atlas](https://atlas.nomic.ai/map/gpt4all-j-post-epoch-1-embeddings) and noticed that the higher loss items seem to cluster. On further inspection, the highest density clusters seemded to be of prompt/response pairs that asked for creative-like generations such as `Generate a story about ...` ![](figs/clustering_overfit.png)
 
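
The patched text above describes extracting an embedding for each data point and computing a loss per sequence before uploading the result to Atlas. As a minimal sketch of that kind of analysis, and not the exact pipeline behind the linked map, the snippet below scores each prompt/response pair with a causal LM and mean-pools the last hidden state as its embedding; the model name, the pooling choice, and the `atlas.map_embeddings` call from the nomic client are assumptions for illustration.

```python
# Hypothetical sketch of a per-sequence loss + embedding extraction pass.
# Assumptions: GPT-J as the scoring model, mean-pooled last hidden state as the
# embedding, and the nomic client's atlas.map_embeddings for the upload.
import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "EleutherAI/gpt-j-6b"   # assumed; the log trains GPT-J variants
device = "cuda"                 # a GPU is assumed for a 6B-parameter model

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = (AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float16)
         .to(device).eval())

texts = ["Generate a story about ..."]  # placeholder prompt/response strings

losses, embeddings = [], []
with torch.no_grad():
    for text in texts:
        ids = tokenizer(text, return_tensors="pt", truncation=True).input_ids.to(device)
        out = model(ids, labels=ids, output_hidden_states=True)
        # Mean token loss for this sequence (labels = inputs gives next-token CE).
        losses.append(out.loss.item())
        # Mean-pool the final hidden layer as the sequence embedding.
        embeddings.append(out.hidden_states[-1].mean(dim=1).squeeze(0).float().cpu().numpy())

# Upload embeddings plus per-sequence loss as metadata; map_embeddings is
# assumed from the nomic Python client.
from nomic import atlas
atlas.map_embeddings(
    embeddings=np.stack(embeddings),
    data=[{"text": t, "loss": l} for t, l in zip(texts, losses)],
)
```

In practice the forward passes would be batched with padding masked out of the loss; the single-example loop here just keeps the per-sequence bookkeeping obvious.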