fix(chat-excel):Explicitly create data tables from the df (#2437) (#2464)

# Description

 fix:#2437 

Optimize the prompts for reconstructing data tables to ensure that the
output field names comply with SQL standards, avoiding field names that
start with numbers.
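
The naming rule described above can be illustrated with a small deterministic sketch. This is not code from the commit, which relies on the LLM prompt to do the renaming; `to_sql_identifier` and its `col_` prefix are hypothetical names chosen for illustration, and the translation of non-English names is deliberately left out.

```python
import re


def to_sql_identifier(name: str, prefix: str = "col_") -> str:
    """Hypothetical helper: make a column name usable as an unquoted SQL identifier."""
    # Replace spaces with underscores, as the prompt rules describe.
    cleaned = name.strip().replace(" ", "_")
    # Drop special characters; only ASCII letters, digits and underscores remain.
    cleaned = re.sub(r"[^0-9A-Za-z_]", "", cleaned)
    # Unquoted identifiers must not start with a digit (and must not be empty),
    # so add a prefix in those cases. The prefix itself is an assumption.
    if not cleaned or cleaned[0].isdigit():
        cleaned = prefix + cleaned
    return cleaned


print(to_sql_identifier("2024 sales"))
print(to_sql_identifier("unit price"))
```

With this sketch a header such as `2024 sales` becomes `col_2024_sales`, while a name that already starts with a letter is only cleaned of spaces and special characters.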

# How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide
instructions so we can reproduce. Please also list any relevant details
for your test configuration

# Snapshots:

Include snapshots for easier review.

# Checklist:

- [x] My code follows the style guidelines of this project
- [x] I have already rebased the commits and made the commit message
conform to the project standard.
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have made corresponding changes to the documentation
- [x] Any dependent changes have been merged and published in downstream
modules
Aries-ckt 2025-03-14 20:48:00 +08:00 committed by GitHub
commit d6eb283e41
2 changed files with 11 additions and 5 deletions


@@ -52,9 +52,11 @@ with underscores
 4. If it's in other languages, translate them to English, and replace spaces with \
 underscores
 5. If it's special characters, delete them directly
-6. All column fields must be analyzed and converted, remember to output in JSON
+6. DuckDB adheres to the SQL standard, which requires that identifiers \
+(column names, table names) cannot start with a number.
+7. All column fields must be analyzed and converted, remember to output in JSON
 Avoid phrases like ' // ... (similar analysis for other columns) ...'
-7. You need to provide the original column names and the transformed new column names \
+8. You need to provide the original column names and the transformed new column names \
 in the JSON, as well as your analysis of the meaning and function of that column. If \
 it's a time type, please provide the time format, such as: \
 yyyy-MM-dd HH:MM:ss
@@ -111,9 +113,10 @@ DuckDB 表结构信息如下:
 3. 如果是中文将中文字段名翻译为英文并且将空格替换为下划线
 4. 如果是其它语言将其翻译为英文并且将空格替换为下划线
 5. 如果是特殊字符直接删除
-6. 所以列的字段都必须分析和转换切记在 JSON 中输出
+6. DuckDB遵循SQL标准要求标识符(列名表名)不能以数字开头
+7. 所以列的字段都必须分析和转换切记在 JSON 中输出
 ' // ... (其他列的类似分析) ...)' 之类的话术
-7. 你需要在json中提供原始列名和转化后的新的列名以及你分析\
+8. 你需要在json中提供原始列名和转化后的新的列名以及你分析\
 的该列的含义和作用如果是时间类型请给出时间格式类似:\
 yyyy-MM-dd HH:MM:ss


@@ -171,7 +171,10 @@ def read_from_df(
 df = df.rename(columns=lambda x: x.strip().replace(" ", "_"))
 # write data in duckdb
-db.register(table_name, df)
+db.register("temp_df_table", df)
+# The table is explicitly created due to the issue at
+# https://github.com/eosphoros-ai/DB-GPT/issues/2437.
+db.execute(f"CREATE TABLE {table_name} AS SELECT * FROM temp_df_table")
 return table_name