[evaluation] improvement on evaluation (#3862)

* fix a bug when the config file contains one category but the answer file doesn't contain that category
* fix Chinese prompt file
* support gpt-3.5-turbo and gpt-4 evaluation
* polish and update README
* resolve pr comments

---------

Co-authored-by: Yuanchen Xu <yuanchen.xu00@gmail.com>
2025-12-23 12:36:03 +00:00 · 2023-05-30 11:48:41 +08:00
parent b0474878bf
commit 2506e275b8
7 changed files with 335 additions and 142 deletions
--- a/applications/Chat/evaluate/utils.py
+++ b/applications/Chat/evaluate/utils.py
@@ -57,6 +57,7 @@ def get_data_per_category(data, categories):
    data_per_category = {category: [] for category in categories}
    for item in data:
        category = item["category"]
-        data_per_category[category].append(item)
+        if category in categories:
+            data_per_category[category].append(item)

    return data_per_category
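
To illustrate the effect of the guard added in this hunk, here is a minimal, self-contained sketch. The function body mirrors the patched `get_data_per_category`; the sample answers, the `id` field, and the category names are hypothetical and not taken from the repository. Without the `if category in categories` check, an item whose category is missing from the configured list would raise a `KeyError`, because `data_per_category` only has keys for the configured categories.

```python
def get_data_per_category(data, categories):
    # One empty bucket per category listed in the config.
    data_per_category = {category: [] for category in categories}
    for item in data:
        category = item["category"]
        # Skip items whose category is not configured instead of
        # indexing a missing key (which would raise KeyError).
        if category in categories:
            data_per_category[category].append(item)

    return data_per_category


# Hypothetical sample data (assumption: each answer is a dict with a "category" key).
answers = [
    {"id": 1, "category": "brainstorming"},
    {"id": 2, "category": "chat"},
    {"id": 3, "category": "roleplay"},  # not in the configured categories
]
config_categories = ["brainstorming", "chat", "summarization"]

result = get_data_per_category(answers, config_categories)
print(result)
```

In this sketch the `roleplay` item is dropped silently, and `summarization` maps to an empty list because no answer belongs to it. Callers may still need to handle empty buckets themselves.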