bugfix(ChatExcel): fix ChatExcel language confusion bug

1. Fix ChatExcel language confusion bug
2025-09-25 11:39:11 +00:00 · 2023-11-09 14:52:11 +08:00
parent d609cccb83
commit 2b948c34a5
4 changed files with 12 additions and 8 deletions
--- a/pilot/scene/chat_data/chat_excel/excel_analyze/prompt.py
+++ b/pilot/scene/chat_data/chat_excel/excel_analyze/prompt.py
@@ -15,7 +15,7 @@ _DEFAULT_TEMPLATE_EN = """
 Please use the data structure information in the above historical dialogue and combine it with data analysis to answer the user's questions while satisfying the constraints.

 Constraint:
-    1.Please fully understand the user's problem and use duckdb sql for analysis. The analysis content is returned in the required output format. Do not output sql information outside the required location.
+    1.Please fully understand the user's problem and use duckdb sql for analysis. The analysis content is returned in the output format required below. Please output the sql in the corresponding sql parameter.
    2.Please choose the best one from the display methods given below for data rendering, and put the type name into the name parameter value that returns the required format. If you cannot find the most suitable one, use 'Table' as the display method. , the available data display methods are as follows: {disply_type}
    3.The table name that needs to be used in SQL is: {table_name}. Please check the sql you generated and do not use column names that are not in the data structure.
    4.Give priority to answering using data analysis. If the user's question does not involve data analysis, you can answer according to your understanding.
@@ -32,7 +32,7 @@ _PROMPT_SCENE_DEFINE_ZH = """你是一个数据分析专家！"""
 _DEFAULT_TEMPLATE_ZH = """
 请使用上述历史对话中的数据结构信息，在满足下面约束条件下通过数据分析回答用户的问题。
 约束条件:
-	1.请充分理解用户的问题，使用duckdb sql的方式进行分析， 分析内容按要求的输出格式返回，不要在要求的位置外输出sql信息 
+	1.请充分理解用户的问题，使用duckdb sql的方式进行分析， 分析内容按下面要求的输出格式返回，sql请输出在对应的sql参数中
 	2.请从如下给出的展示方式种选择最优的一种用以进行数据渲染，将类型名称放入返回要求格式的name参数值种，如果找不到最合适的则使用'Table'作为展示方式，可用数据展示方式如下: {disply_type}
 	3.SQL中需要使用的表名是: {table_name},请检查你生成的sql，不要使用没在数据结构中的列名，。
 	4.优先使用数据分析的方式回答，如果用户问题不涉及数据分析内容，你可以按你的理解进行回答
--- a/pilot/scene/chat_data/chat_excel/excel_learning/chat.py
+++ b/pilot/scene/chat_data/chat_excel/excel_learning/chat.py
@@ -51,10 +51,9 @@ class ExcelLearning(BaseChat):
            self._executor, self.excel_reader.get_sample_data
        )
        self.prompt_template.output_parser.update(colunms)
-        copy_datas = datas.copy()
        datas.insert(0, colunms)

        input_values = {
-            "data_example": json.dumps(copy_datas, cls=DateTimeEncoder),
+            "data_example": json.dumps(datas, cls=DateTimeEncoder),
        }
        return input_values
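
Note on the chat.py hunk above: the removed code took `copy_datas` before the header row was inserted into `datas`, so the serialized `data_example` contained sample rows but no column names. The sketch below reproduces that ordering with plain lists. The column names and rows are made-up illustration data, and `DateTimeEncoder` is left out because it is not needed to show the effect. That this is what caused the model to rename or translate columns is an inference from the commit title, not something the diff states.

```python
import json

# Hypothetical sample data standing in for excel_reader.get_sample_data.
columns = ["日期", "销售额"]
rows = [["2023-11-01", 100], ["2023-11-02", 120]]

# Old behaviour: the copy is taken BEFORE the header row is inserted,
# so the serialized example has no column names.
datas = [list(r) for r in rows]
copy_datas = datas.copy()
datas.insert(0, columns)
before = json.dumps(copy_datas, ensure_ascii=False)

# New behaviour: serialize the list that already carries the header row.
after = json.dumps(datas, ensure_ascii=False)

print(before)
print(after)
```

With the header row included, the prompt shows the model the exact column names it is later asked not to modify or translate.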
--- a/pilot/scene/chat_data/chat_excel/excel_learning/prompt.py
+++ b/pilot/scene/chat_data/chat_excel/excel_learning/prompt.py
@@ -14,7 +14,8 @@ _PROMPT_SCENE_DEFINE_EN = "You are a data analysis expert. "
 _DEFAULT_TEMPLATE_EN = """
 This is an example data，please learn to understand the structure and content of this data:
    {data_example}
-Explain the meaning and function of each column, and give a simple and clear explanation of the technical terms.  
+Explain the meaning and function of each column, and give a simple and clear explanation of the technical terms， If it is a Date column, please summarize the Date format like: yyyy-MM-dd HH:MM:ss.
+Please do not modify or translate the column names, make sure they are consistent with the given data column names.
 Provide some analysis options,please think step by step.

 Please return your answer in JSON format, the return format is as follows:
@@ -26,7 +27,9 @@ _PROMPT_SCENE_DEFINE_ZH = "你是一个数据分析专家. "
 _DEFAULT_TEMPLATE_ZH = """
 下面是一份示例数据，请学习理解该数据的结构和内容:
    {data_example}
-分析各列数据的含义和作用，并对专业术语进行简单明了的解释。
+分析各列数据的含义和作用，并对专业术语进行简单明了的解释, 如果是时间类型请给出时间格式类似:yyyy-MM-dd HH:MM:ss.
+请不要修改或者翻译列名，确保和给出数据列名一致.
+
 提供一些分析方案思路，请一步一步思考。

 请以JSON格式返回您的答案，返回格式如下：
--- a/pilot/scene/chat_data/chat_excel/excel_reader.py
+++ b/pilot/scene/chat_data/chat_excel/excel_reader.py
@@ -258,15 +258,17 @@ class ExcelReader:
        self.extension = os.path.splitext(file_name)[1]
        # read excel file
        if file_path.endswith(".xlsx") or file_path.endswith(".xls"):
-            df_tmp = pd.read_excel(file_path)
+            df_tmp = pd.read_excel(file_path, index_col= False)
            self.df = pd.read_excel(
                file_path,
+                index_col=False,
                converters={i: csv_colunm_foramt for i in range(df_tmp.shape[1])},
            )
        elif file_path.endswith(".csv"):
-            df_tmp = pd.read_csv(file_path, encoding=encoding)
+            df_tmp = pd.read_csv(file_path, index_col= False, encoding=encoding)
            self.df = pd.read_csv(
                file_path,
+                index_col=False,
                encoding=encoding,
                converters={i: csv_colunm_foramt for i in range(df_tmp.shape[1])},
            )
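
Note on the excel_reader.py hunk above: `index_col=False` tells `pandas.read_csv` not to treat the first column as the index. This matters for files whose data rows carry one more field than the header, for example because of a trailing delimiter. The sketch below follows the example in the pandas I/O documentation; the CSV text is illustration data, not taken from this repository, and only `read_csv` is shown. Recent pandas versions may emit a `ParserWarning` about the dropped trailing field.

```python
import io

import pandas as pd

# Rows end with a trailing comma, so each has one more field than the header.
data = "a,b,c\n4,apple,bat,\n8,orange,cow,\n"

# Default: pandas infers that the first column is the index,
# which shifts every value one column to the left.
default = pd.read_csv(io.StringIO(data))

# index_col=False: the first column stays a data column named "a".
fixed = pd.read_csv(io.StringIO(data), index_col=False)

print(default)
print(fixed)
```

Because the reader builds `converters` from positional indices `range(df_tmp.shape[1])`, keeping the columns aligned with the header in both reads keeps each converter attached to the intended column.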