[Feature] Add document retrieval QA (#5020)

* add langchain

* add langchain

* Add files via upload

* add langchain

* fix style

* fix style: remove extra space

* add pytest; modified retriever

* add pytest; modified retriever

* add tests to build_on_pr.yml

* fix build_on_pr.yml

* fix build on pr; fix environ vars

* seperate unit tests for colossalqa from build from pr

* fix container setting; fix environ vars

* commented dev code

* add incremental update

* remove stale code

* fix style

* change to sha3 224

* fix retriever; fix style; add unit test for document loader

* fix ci workflow config

* fix ci workflow config

* add set cuda visible device script in ci

* fix doc string

* fix style; update readme; refactored

* add force log info

* change build on pr, ignore colossalqa

* fix docstring, captitalize all initial letters

* fix indexing; fix text-splitter

* remove debug code, update reference

* reset previous commit

* update LICENSE update README add key-value mode, fix bugs

* add files back

* revert force push

* remove junk file

* add test files

* fix retriever bug, add intent classification

* change conversation chain design

* rewrite prompt and conversation chain

* add ui v1

* ui v1

* fix atavar

* add header

* Refactor the RAG Code and support Pangu

* Refactor the ColossalQA chain to Object-Oriented Programming and the UI demo.

* resolved conversation. tested scripts under examples. web demo still buggy

* fix ci tests

* Some modifications to add ChatGPT api

* modify llm.py and remove unnecessary files

* Delete applications/ColossalQA/examples/ui/test_frontend_input.json

* Remove OpenAI api key

* add colossalqa

* move files

* move files

* move files

* move files

* fix style

* Add Readme and fix some bugs.

* Add something to readme and modify some code

* modify a directory name for clarity

* remove redundant directory

* Correct a type in  llm.py

* fix AI prefix

* fix test_memory.py

* fix conversation

* fix some erros and typos

* Fix a missing import in RAG_ChatBot.py

* add colossalcloud LLM wrapper, correct issues in code review

---------

Co-authored-by: YeAnbang <anbangy2@outlook.com>
Co-authored-by: Orion-Zheng <zheng_zian@u.nus.edu>
Co-authored-by: Zian(Andy) Zheng <62330719+Orion-Zheng@users.noreply.github.com>
Co-authored-by: Orion-Zheng <zhengzian@u.nus.edu>
This commit is contained in:
YeAnbang
2023-11-23 10:33:48 +08:00
committed by GitHub
parent 3acbf6d496
commit e53e729d8e
69 changed files with 6758 additions and 0 deletions

View File

@@ -0,0 +1,7 @@
{
"data":[
{"content":"Donec lobortis eleifend condimentum. Cras dictum dolor lacinia lectus vehicula rutrum. Maecenas quis nisi nunc. Nam tristique feugiat est vitae mollis. Maecenas quis nisi nunc."},
{"content":"Aliquam sollicitudin ante ligula, eget malesuada nibh efficitur et. Pellentesque massa sem, scelerisque sit amet odio id, cursus tempor urna. Etiam congue dignissim volutpat. Vestibulum pharetra libero et velit gravida euismod."}
],
"name":"player"
}

View File

@@ -0,0 +1,101 @@
Index,Organization Id,Name,Website,Country,Description,Founded,Industry,Number of employees
1,FAB0d41d5b5d22c,Ferrell LLC,https://price.net/,Papua New Guinea,Horizontal empowering knowledgebase,1990,Plastics,3498
2,6A7EdDEA9FaDC52,"Mckinney, Riley and Day",http://www.hall-buchanan.info/,Finland,User-centric system-worthy leverage,2015,Glass / Ceramics / Concrete,4952
3,0bFED1ADAE4bcC1,Hester Ltd,http://sullivan-reed.com/,China,Switchable scalable moratorium,1971,Public Safety,5287
4,2bFC1Be8a4ce42f,Holder-Sellers,https://becker.com/,Turkmenistan,De-engineered systemic artificial intelligence,2004,Automotive,921
5,9eE8A6a4Eb96C24,Mayer Group,http://www.brewer.com/,Mauritius,Synchronized needs-based challenge,1991,Transportation,7870
6,cC757116fe1C085,Henry-Thompson,http://morse.net/,Bahamas,Face-to-face well-modulated customer loyalty,1992,Primary / Secondary Education,4914
7,219233e8aFF1BC3,Hansen-Everett,https://www.kidd.org/,Pakistan,Seamless disintermediate collaboration,2018,Publishing Industry,7832
8,ccc93DCF81a31CD,Mcintosh-Mora,https://www.brooks.com/,Heard Island and McDonald Islands,Centralized attitude-oriented capability,1970,Import / Export,4389
9,0B4F93aA06ED03e,Carr Inc,http://ross.com/,Kuwait,Distributed impactful customer loyalty,1996,Plastics,8167
10,738b5aDe6B1C6A5,Gaines Inc,http://sandoval-hooper.com/,Uzbekistan,Multi-lateral scalable protocol,1997,Outsourcing / Offshoring,9698
11,AE61b8Ffebbc476,Kidd Group,http://www.lyons.com/,Bouvet Island (Bouvetoya),Proactive foreground paradigm,2001,Primary / Secondary Education,7473
12,eb3B7D06cCdD609,Crane-Clarke,https://www.sandoval.com/,Denmark,Front-line clear-thinking encryption,2014,Food / Beverages,9011
13,8D0c29189C9798B,"Keller, Campos and Black",https://www.garner.info/,Liberia,Ameliorated directional emulation,2020,Museums / Institutions,2862
14,D2c91cc03CA394c,Glover-Pope,http://www.silva.biz/,United Arab Emirates,Persevering contextually-based approach,2013,Medical Practice,9079
15,C8AC1eaf9C036F4,Pacheco-Spears,https://aguilar.com/,Sweden,Secured logistical synergy,1984,Maritime,769
16,b5D10A14f7a8AfE,Hodge-Ayers,http://www.archer-elliott.com/,Honduras,Future-proofed radical implementation,1990,Facilities Services,8508
17,68139b5C4De03B4,"Bowers, Guerra and Krause",http://www.carrillo-nicholson.com/,Uganda,De-engineered transitional strategy,1972,Primary / Secondary Education,6986
18,5c2EffEfdba2BdF,Mckenzie-Melton,http://montoya-thompson.com/,Hong Kong,Reverse-engineered heuristic alliance,1998,Investment Management / Hedge Fund / Private Equity,4589
19,ba179F19F7925f5,Branch-Mann,http://www.lozano.com/,Botswana,Adaptive intangible frame,1999,Architecture / Planning,7961
20,c1Ce9B350BAc66b,Weiss and Sons,https://barrett.com/,Korea,Sharable optimal functionalities,2011,Plastics,5984
21,8de40AC4e6EaCa4,"Velez, Payne and Coffey",http://burton.com/,Luxembourg,Mandatory coherent synergy,1986,Wholesale,5010
22,Aad86a4F0385F2d,Harrell LLC,http://www.frey-rosario.com/,Guadeloupe,Reverse-engineered mission-critical moratorium,2018,Construction,2185
23,22aC3FFd64fD703,"Eaton, Reynolds and Vargas",http://www.freeman.biz/,Monaco,Self-enabling multi-tasking process improvement,2014,Luxury Goods / Jewelry,8987
24,5Ec4C272bCf085c,Robbins-Cummings,http://donaldson-wilkins.com/,Belgium,Organic non-volatile hierarchy,1991,Pharmaceuticals,5038
25,5fDBeA8BB91a000,Jenkins Inc,http://www.kirk.biz/,South Africa,Front-line systematic help-desk,2002,Insurance,1215
26,dFfD6a6F9AC2d9C,"Greene, Benjamin and Novak",http://www.kent.net/,Romania,Centralized leadingedge moratorium,2012,Museums / Institutions,4941
27,4B217cC5a0674C5,"Dickson, Richmond and Clay",http://everett.com/,Czech Republic,Team-oriented tangible complexity,1980,Real Estate / Mortgage,3122
28,88b1f1cDcf59a37,Prince-David,http://thompson.com/,Christmas Island,Virtual holistic methodology,1970,Banking / Mortgage,1046
29,f9F7bBCAEeC360F,Ayala LLC,http://www.zhang.com/,Philippines,Open-source zero administration hierarchy,2021,Legal Services,7664
30,7Cb3AeFcE4Ba31e,Rivas Group,https://hebert.org/,Australia,Open-architected well-modulated capacity,1998,Logistics / Procurement,4155
31,ccBcC32adcbc530,"Sloan, Mays and Whitehead",http://lawson.com/,Chad,Face-to-face high-level conglomeration,1997,Civil Engineering,365
32,f5afd686b3d05F5,"Durham, Allen and Barnes",http://chan-stafford.org/,Zimbabwe,Synergistic web-enabled framework,1993,Mechanical or Industrial Engineering,6135
33,38C6cfC5074Fa5e,Fritz-Franklin,http://www.lambert.com/,Nepal,Automated 4thgeneration website,1972,Hospitality,4516
34,5Cd7efccCcba38f,Burch-Ewing,http://cline.net/,Taiwan,User-centric 4thgeneration system engine,1981,Venture Capital / VC,7443
35,9E6Acb51e3F9d6F,"Glass, Barrera and Turner",https://dunlap.com/,Kyrgyz Republic,Multi-channeled 3rdgeneration open system,2020,Utilities,2610
36,4D4d7E18321eaeC,Pineda-Cox,http://aguilar.org/,Bolivia,Fundamental asynchronous capability,2010,Human Resources / HR,1312
37,485f5d06B938F2b,"Baker, Mccann and Macdonald",http://www.anderson-barker.com/,Kenya,Cross-group user-facing focus group,2013,Legislative Office,1638
38,19E3a5Bf6dBDc4F,Cuevas-Moss,https://dodson-castaneda.net/,Guatemala,Extended human-resource intranet,1994,Music,9995
39,6883A965c7b68F7,Hahn PLC,http://newman.com/,Belarus,Organic logistical leverage,2012,Electrical / Electronic Manufacturing,3715
40,AC5B7AA74Aa4A2E,"Valentine, Ferguson and Kramer",http://stuart.net/,Jersey,Centralized secondary time-frame,1997,Non - Profit / Volunteering,3585
41,decab0D5027CA6a,Arroyo Inc,https://www.turner.com/,Grenada,Managed demand-driven website,2006,Writing / Editing,9067
42,dF084FbBb613eea,Walls LLC,http://www.reese-vasquez.biz/,Cape Verde,Self-enabling fresh-thinking installation,1989,Investment Management / Hedge Fund / Private Equity,1678
43,A2D89Ab9bCcAd4e,"Mitchell, Warren and Schneider",https://fox.biz/,Trinidad and Tobago,Enhanced intangible time-frame,2021,Capital Markets / Hedge Fund / Private Equity,3816
44,77aDc905434a49f,Prince PLC,https://www.watts.com/,Sweden,Profit-focused coherent installation,2016,Individual / Family Services,7645
45,235fdEFE2cfDa5F,Brock-Blackwell,http://www.small.com/,Benin,Secured foreground emulation,1986,Online Publishing,7034
46,1eD64cFe986BBbE,Walton-Barnett,https://ashley-schaefer.com/,Western Sahara,Right-sized clear-thinking flexibility,2001,Luxury Goods / Jewelry,1746
47,CbBbFcdd0eaE2cF,Bartlett-Arroyo,https://cruz.com/,Northern Mariana Islands,Realigned didactic function,1976,Civic / Social Organization,3987
48,49aECbDaE6aBD53,"Wallace, Madden and Morris",http://www.blevins-fernandez.biz/,Germany,Persistent real-time customer loyalty,2016,Pharmaceuticals,9443
49,7b3fe6e7E72bFa4,Berg-Sparks,https://cisneros-love.com/,Canada,Stand-alone static implementation,1974,Arts / Crafts,2073
50,c6DedA82A8aef7E,Gonzales Ltd,http://bird.com/,Tonga,Managed human-resource policy,1988,Consumer Goods,9069
51,7D9FBF85cdC3871,Lawson and Sons,https://www.wong.com/,French Southern Territories,Compatible analyzing intranet,2021,Arts / Crafts,3527
52,7dd18Fb7cB07b65,"Mcguire, Mcconnell and Olsen",https://melton-briggs.com/,Korea,Profound client-server frame,1988,Printing,8445
53,EF5B55FadccB8Fe,Charles-Phillips,https://bowman.com/,Cote d'Ivoire,Monitored client-server implementation,2012,Mental Health Care,3450
54,f8D4B99e11fAF5D,Odom Ltd,https://www.humphrey-hess.com/,Cote d'Ivoire,Advanced static process improvement,2012,Management Consulting,1825
55,e24D21BFd3bF1E5,Richard PLC,https://holden-coleman.net/,Mayotte,Object-based optimizing model,1971,Broadcast Media,4942
56,B9BdfEB6D3Ca44E,Sampson Ltd,https://blevins.com/,Cayman Islands,Intuitive local adapter,2005,Farming,1418
57,2a74D6f3D3B268e,"Cherry, Le and Callahan",https://waller-delacruz.biz/,Nigeria,Universal human-resource collaboration,2017,Entertainment / Movie Production,7202
58,Bf3F3f62c8aBC33,Cherry PLC,https://www.avila.info/,Marshall Islands,Persistent tertiary website,1980,Plastics,8245
59,aeBe26B80a7a23c,Melton-Nichols,https://kennedy.com/,Palau,User-friendly clear-thinking productivity,2021,Legislative Office,8741
60,aAeb29ad43886C6,Potter-Walsh,http://thomas-french.org/,Turkey,Optional non-volatile open system,2008,Human Resources / HR,6923
61,bD1bc6bB6d1FeD3,Freeman-Chen,https://mathis.com/,Timor-Leste,Phased next generation adapter,1973,International Trade / Development,346
62,EB9f456e8b7022a,Soto Group,https://norris.info/,Vietnam,Enterprise-wide executive installation,1988,Business Supplies / Equipment,9097
63,Dfef38C51D8DAe3,"Poole, Cruz and Whitney",https://reed.info/,Reunion,Balanced analyzing groupware,1978,Marketing / Advertising / Sales,2992
64,055ffEfB2Dd95B0,Riley Ltd,http://wiley.com/,Brazil,Optional exuding superstructure,1986,Textiles,9315
65,cBfe4dbAE1699da,"Erickson, Andrews and Bailey",https://www.hobbs-grant.com/,Eritrea,Vision-oriented secondary project,2014,Consumer Electronics,7829
66,fdFbecbadcdCdf1,"Wilkinson, Charles and Arroyo",http://hunter-mcfarland.com/,United States Virgin Islands,Assimilated 24/7 archive,1996,Building Materials,602
67,5DCb8A5a5ca03c0,Floyd Ltd,http://www.whitney.com/,Falkland Islands (Malvinas),Function-based fault-tolerant concept,2017,Public Relations / PR,2911
68,ce57DCbcFD6d618,Newman-Galloway,https://www.scott.com/,Luxembourg,Enhanced foreground collaboration,1987,Information Technology / IT,3934
69,5aaD187dc929371,Frazier-Butler,https://www.daugherty-farley.info/,Northern Mariana Islands,Persistent interactive circuit,1972,Outsourcing / Offshoring,5130
70,902D7Ac8b6d476b,Newton Inc,https://www.richmond-manning.info/,Netherlands Antilles,Fundamental stable info-mediaries,1976,Military Industry,563
71,32BB9Ff4d939788,Duffy-Levy,https://www.potter.com/,Guernsey,Diverse exuding installation,1982,Wireless,6146
72,adcB0afbE58bAe3,Wagner LLC,https://decker-esparza.com/,Uruguay,Reactive attitude-oriented toolset,1987,International Affairs,6874
73,dfcA1c84AdB61Ac,Mccall-Holmes,http://www.dean.com/,Benin,Object-based value-added database,2009,Legal Services,696
74,208044AC2fe52F3,Massey LLC,https://frazier.biz/,Suriname,Configurable zero administration Graphical User Interface,1986,Accounting,5004
75,f3C365f0c1A0623,Hicks LLC,http://alvarez.biz/,Pakistan,Quality-focused client-server Graphical User Interface,1970,Computer Software / Engineering,8480
76,ec5Bdd3CBAfaB93,"Cole, Russell and Avery",http://www.blankenship.com/,Mongolia,De-engineered fault-tolerant challenge,2000,Law Enforcement,7012
77,DDB19Be7eeB56B4,Cummings-Rojas,https://simon-pearson.com/,Svalbard & Jan Mayen Islands,User-centric modular customer loyalty,2012,Financial Services,7529
78,dd6CA3d0bc3cAfc,"Beasley, Greene and Mahoney",http://www.petersen-lawrence.com/,Togo,Extended content-based methodology,1976,Religious Institutions,869
79,A0B9d56e61070e3,"Beasley, Sims and Allison",http://burke.info/,Latvia,Secured zero tolerance hub,1972,Facilities Services,6182
80,cBa7EFe5D05Adaf,Crawford-Rivera,https://black-ramirez.org/,Cuba,Persevering exuding budgetary management,1999,Online Publishing,7805
81,Ea3f6D52Ec73563,Montes-Hensley,https://krueger.org/,Liechtenstein,Multi-tiered secondary productivity,2009,Printing,8433
82,bC0CEd48A8000E0,Velazquez-Odom,https://stokes.com/,Djibouti,Streamlined 6thgeneration function,2002,Alternative Dispute Resolution,4044
83,c89b9b59BC4baa1,Eaton-Morales,https://www.reeves-graham.com/,Micronesia,Customer-focused explicit frame,1990,Capital Markets / Hedge Fund / Private Equity,7013
84,FEC51bce8421a7b,"Roberson, Pennington and Palmer",http://www.keith-fisher.com/,Cameroon,Adaptive bi-directional hierarchy,1993,Telecommunications,5571
85,e0E8e27eAc9CAd5,"George, Russo and Guerra",https://drake.com/,Sweden,Centralized non-volatile capability,1989,Military Industry,2880
86,B97a6CF9bf5983C,Davila Inc,https://mcconnell.info/,Cocos (Keeling) Islands,Profit-focused dedicated frame,2017,Consumer Electronics,2215
87,a0a6f9b3DbcBEb5,Mays-Preston,http://www.browning-key.com/,Mali,User-centric heuristic focus group,2006,Military Industry,5786
88,8cC1bDa330a5871,Pineda-Morton,https://www.carr.com/,United States Virgin Islands,Grass-roots methodical info-mediaries,1991,Printing,6168
89,ED889CB2FE9cbd3,Huang and Sons,https://www.bolton.com/,Eritrea,Re-contextualized dynamic hierarchy,1981,Semiconductors,7484
90,F4Dc1417BC6cb8f,Gilbert-Simon,https://www.bradford.biz/,Burundi,Grass-roots radical parallelism,1973,Newspapers / Journalism,1927
91,7ABc3c7ecA03B34,Sampson-Griffith,http://hendricks.org/,Benin,Multi-layered composite paradigm,1972,Textiles,3881
92,4e0719FBE38e0aB,Miles-Dominguez,http://www.turner.com/,Gibraltar,Organized empowering forecast,1996,Civic / Social Organization,897
93,dEbDAAeDfaed00A,Rowe and Sons,https://www.simpson.org/,El Salvador,Balanced multimedia knowledgebase,1978,Facilities Services,8172
94,61BDeCfeFD0cEF5,"Valenzuela, Holmes and Rowland",https://www.dorsey.net/,Taiwan,Persistent tertiary focus group,1999,Transportation,1483
95,4e91eD25f486110,"Best, Wade and Shepard",https://zimmerman.com/,Zimbabwe,Innovative background definition,1991,Gambling / Casinos,4873
96,0a0bfFbBbB8eC7c,Holmes Group,https://mcdowell.org/,Ethiopia,Right-sized zero tolerance focus group,1975,Photography,2988
97,BA6Cd9Dae2Efd62,Good Ltd,http://duffy.com/,Anguilla,Reverse-engineered composite moratorium,1971,Consumer Services,4292
98,E7df80C60Abd7f9,Clements-Espinoza,http://www.flowers.net/,Falkland Islands (Malvinas),Progressive modular hub,1991,Broadcast Media,236
99,AFc285dbE2fEd24,Mendez Inc,https://www.burke.net/,Kyrgyz Republic,User-friendly exuding migration,1993,Education Management,339
100,e9eB5A60Cef8354,Watkins-Kaiser,http://www.herring.com/,Togo,Synergistic background access,2009,Financial Services,2785
1 Index Organization Id Name Website Country Description Founded Industry Number of employees
2 1 FAB0d41d5b5d22c Ferrell LLC https://price.net/ Papua New Guinea Horizontal empowering knowledgebase 1990 Plastics 3498
3 2 6A7EdDEA9FaDC52 Mckinney, Riley and Day http://www.hall-buchanan.info/ Finland User-centric system-worthy leverage 2015 Glass / Ceramics / Concrete 4952
4 3 0bFED1ADAE4bcC1 Hester Ltd http://sullivan-reed.com/ China Switchable scalable moratorium 1971 Public Safety 5287
5 4 2bFC1Be8a4ce42f Holder-Sellers https://becker.com/ Turkmenistan De-engineered systemic artificial intelligence 2004 Automotive 921
6 5 9eE8A6a4Eb96C24 Mayer Group http://www.brewer.com/ Mauritius Synchronized needs-based challenge 1991 Transportation 7870
7 6 cC757116fe1C085 Henry-Thompson http://morse.net/ Bahamas Face-to-face well-modulated customer loyalty 1992 Primary / Secondary Education 4914
8 7 219233e8aFF1BC3 Hansen-Everett https://www.kidd.org/ Pakistan Seamless disintermediate collaboration 2018 Publishing Industry 7832
9 8 ccc93DCF81a31CD Mcintosh-Mora https://www.brooks.com/ Heard Island and McDonald Islands Centralized attitude-oriented capability 1970 Import / Export 4389
10 9 0B4F93aA06ED03e Carr Inc http://ross.com/ Kuwait Distributed impactful customer loyalty 1996 Plastics 8167
11 10 738b5aDe6B1C6A5 Gaines Inc http://sandoval-hooper.com/ Uzbekistan Multi-lateral scalable protocol 1997 Outsourcing / Offshoring 9698
12 11 AE61b8Ffebbc476 Kidd Group http://www.lyons.com/ Bouvet Island (Bouvetoya) Proactive foreground paradigm 2001 Primary / Secondary Education 7473
13 12 eb3B7D06cCdD609 Crane-Clarke https://www.sandoval.com/ Denmark Front-line clear-thinking encryption 2014 Food / Beverages 9011
14 13 8D0c29189C9798B Keller, Campos and Black https://www.garner.info/ Liberia Ameliorated directional emulation 2020 Museums / Institutions 2862
15 14 D2c91cc03CA394c Glover-Pope http://www.silva.biz/ United Arab Emirates Persevering contextually-based approach 2013 Medical Practice 9079
16 15 C8AC1eaf9C036F4 Pacheco-Spears https://aguilar.com/ Sweden Secured logistical synergy 1984 Maritime 769
17 16 b5D10A14f7a8AfE Hodge-Ayers http://www.archer-elliott.com/ Honduras Future-proofed radical implementation 1990 Facilities Services 8508
18 17 68139b5C4De03B4 Bowers, Guerra and Krause http://www.carrillo-nicholson.com/ Uganda De-engineered transitional strategy 1972 Primary / Secondary Education 6986
19 18 5c2EffEfdba2BdF Mckenzie-Melton http://montoya-thompson.com/ Hong Kong Reverse-engineered heuristic alliance 1998 Investment Management / Hedge Fund / Private Equity 4589
20 19 ba179F19F7925f5 Branch-Mann http://www.lozano.com/ Botswana Adaptive intangible frame 1999 Architecture / Planning 7961
21 20 c1Ce9B350BAc66b Weiss and Sons https://barrett.com/ Korea Sharable optimal functionalities 2011 Plastics 5984
22 21 8de40AC4e6EaCa4 Velez, Payne and Coffey http://burton.com/ Luxembourg Mandatory coherent synergy 1986 Wholesale 5010
23 22 Aad86a4F0385F2d Harrell LLC http://www.frey-rosario.com/ Guadeloupe Reverse-engineered mission-critical moratorium 2018 Construction 2185
24 23 22aC3FFd64fD703 Eaton, Reynolds and Vargas http://www.freeman.biz/ Monaco Self-enabling multi-tasking process improvement 2014 Luxury Goods / Jewelry 8987
25 24 5Ec4C272bCf085c Robbins-Cummings http://donaldson-wilkins.com/ Belgium Organic non-volatile hierarchy 1991 Pharmaceuticals 5038
26 25 5fDBeA8BB91a000 Jenkins Inc http://www.kirk.biz/ South Africa Front-line systematic help-desk 2002 Insurance 1215
27 26 dFfD6a6F9AC2d9C Greene, Benjamin and Novak http://www.kent.net/ Romania Centralized leadingedge moratorium 2012 Museums / Institutions 4941
28 27 4B217cC5a0674C5 Dickson, Richmond and Clay http://everett.com/ Czech Republic Team-oriented tangible complexity 1980 Real Estate / Mortgage 3122
29 28 88b1f1cDcf59a37 Prince-David http://thompson.com/ Christmas Island Virtual holistic methodology 1970 Banking / Mortgage 1046
30 29 f9F7bBCAEeC360F Ayala LLC http://www.zhang.com/ Philippines Open-source zero administration hierarchy 2021 Legal Services 7664
31 30 7Cb3AeFcE4Ba31e Rivas Group https://hebert.org/ Australia Open-architected well-modulated capacity 1998 Logistics / Procurement 4155
32 31 ccBcC32adcbc530 Sloan, Mays and Whitehead http://lawson.com/ Chad Face-to-face high-level conglomeration 1997 Civil Engineering 365
33 32 f5afd686b3d05F5 Durham, Allen and Barnes http://chan-stafford.org/ Zimbabwe Synergistic web-enabled framework 1993 Mechanical or Industrial Engineering 6135
34 33 38C6cfC5074Fa5e Fritz-Franklin http://www.lambert.com/ Nepal Automated 4thgeneration website 1972 Hospitality 4516
35 34 5Cd7efccCcba38f Burch-Ewing http://cline.net/ Taiwan User-centric 4thgeneration system engine 1981 Venture Capital / VC 7443
36 35 9E6Acb51e3F9d6F Glass, Barrera and Turner https://dunlap.com/ Kyrgyz Republic Multi-channeled 3rdgeneration open system 2020 Utilities 2610
37 36 4D4d7E18321eaeC Pineda-Cox http://aguilar.org/ Bolivia Fundamental asynchronous capability 2010 Human Resources / HR 1312
38 37 485f5d06B938F2b Baker, Mccann and Macdonald http://www.anderson-barker.com/ Kenya Cross-group user-facing focus group 2013 Legislative Office 1638
39 38 19E3a5Bf6dBDc4F Cuevas-Moss https://dodson-castaneda.net/ Guatemala Extended human-resource intranet 1994 Music 9995
40 39 6883A965c7b68F7 Hahn PLC http://newman.com/ Belarus Organic logistical leverage 2012 Electrical / Electronic Manufacturing 3715
41 40 AC5B7AA74Aa4A2E Valentine, Ferguson and Kramer http://stuart.net/ Jersey Centralized secondary time-frame 1997 Non - Profit / Volunteering 3585
42 41 decab0D5027CA6a Arroyo Inc https://www.turner.com/ Grenada Managed demand-driven website 2006 Writing / Editing 9067
43 42 dF084FbBb613eea Walls LLC http://www.reese-vasquez.biz/ Cape Verde Self-enabling fresh-thinking installation 1989 Investment Management / Hedge Fund / Private Equity 1678
44 43 A2D89Ab9bCcAd4e Mitchell, Warren and Schneider https://fox.biz/ Trinidad and Tobago Enhanced intangible time-frame 2021 Capital Markets / Hedge Fund / Private Equity 3816
45 44 77aDc905434a49f Prince PLC https://www.watts.com/ Sweden Profit-focused coherent installation 2016 Individual / Family Services 7645
46 45 235fdEFE2cfDa5F Brock-Blackwell http://www.small.com/ Benin Secured foreground emulation 1986 Online Publishing 7034
47 46 1eD64cFe986BBbE Walton-Barnett https://ashley-schaefer.com/ Western Sahara Right-sized clear-thinking flexibility 2001 Luxury Goods / Jewelry 1746
48 47 CbBbFcdd0eaE2cF Bartlett-Arroyo https://cruz.com/ Northern Mariana Islands Realigned didactic function 1976 Civic / Social Organization 3987
49 48 49aECbDaE6aBD53 Wallace, Madden and Morris http://www.blevins-fernandez.biz/ Germany Persistent real-time customer loyalty 2016 Pharmaceuticals 9443
50 49 7b3fe6e7E72bFa4 Berg-Sparks https://cisneros-love.com/ Canada Stand-alone static implementation 1974 Arts / Crafts 2073
51 50 c6DedA82A8aef7E Gonzales Ltd http://bird.com/ Tonga Managed human-resource policy 1988 Consumer Goods 9069
52 51 7D9FBF85cdC3871 Lawson and Sons https://www.wong.com/ French Southern Territories Compatible analyzing intranet 2021 Arts / Crafts 3527
53 52 7dd18Fb7cB07b65 Mcguire, Mcconnell and Olsen https://melton-briggs.com/ Korea Profound client-server frame 1988 Printing 8445
54 53 EF5B55FadccB8Fe Charles-Phillips https://bowman.com/ Cote d'Ivoire Monitored client-server implementation 2012 Mental Health Care 3450
55 54 f8D4B99e11fAF5D Odom Ltd https://www.humphrey-hess.com/ Cote d'Ivoire Advanced static process improvement 2012 Management Consulting 1825
56 55 e24D21BFd3bF1E5 Richard PLC https://holden-coleman.net/ Mayotte Object-based optimizing model 1971 Broadcast Media 4942
57 56 B9BdfEB6D3Ca44E Sampson Ltd https://blevins.com/ Cayman Islands Intuitive local adapter 2005 Farming 1418
58 57 2a74D6f3D3B268e Cherry, Le and Callahan https://waller-delacruz.biz/ Nigeria Universal human-resource collaboration 2017 Entertainment / Movie Production 7202
59 58 Bf3F3f62c8aBC33 Cherry PLC https://www.avila.info/ Marshall Islands Persistent tertiary website 1980 Plastics 8245
60 59 aeBe26B80a7a23c Melton-Nichols https://kennedy.com/ Palau User-friendly clear-thinking productivity 2021 Legislative Office 8741
61 60 aAeb29ad43886C6 Potter-Walsh http://thomas-french.org/ Turkey Optional non-volatile open system 2008 Human Resources / HR 6923
62 61 bD1bc6bB6d1FeD3 Freeman-Chen https://mathis.com/ Timor-Leste Phased next generation adapter 1973 International Trade / Development 346
63 62 EB9f456e8b7022a Soto Group https://norris.info/ Vietnam Enterprise-wide executive installation 1988 Business Supplies / Equipment 9097
64 63 Dfef38C51D8DAe3 Poole, Cruz and Whitney https://reed.info/ Reunion Balanced analyzing groupware 1978 Marketing / Advertising / Sales 2992
65 64 055ffEfB2Dd95B0 Riley Ltd http://wiley.com/ Brazil Optional exuding superstructure 1986 Textiles 9315
66 65 cBfe4dbAE1699da Erickson, Andrews and Bailey https://www.hobbs-grant.com/ Eritrea Vision-oriented secondary project 2014 Consumer Electronics 7829
67 66 fdFbecbadcdCdf1 Wilkinson, Charles and Arroyo http://hunter-mcfarland.com/ United States Virgin Islands Assimilated 24/7 archive 1996 Building Materials 602
68 67 5DCb8A5a5ca03c0 Floyd Ltd http://www.whitney.com/ Falkland Islands (Malvinas) Function-based fault-tolerant concept 2017 Public Relations / PR 2911
69 68 ce57DCbcFD6d618 Newman-Galloway https://www.scott.com/ Luxembourg Enhanced foreground collaboration 1987 Information Technology / IT 3934
70 69 5aaD187dc929371 Frazier-Butler https://www.daugherty-farley.info/ Northern Mariana Islands Persistent interactive circuit 1972 Outsourcing / Offshoring 5130
71 70 902D7Ac8b6d476b Newton Inc https://www.richmond-manning.info/ Netherlands Antilles Fundamental stable info-mediaries 1976 Military Industry 563
72 71 32BB9Ff4d939788 Duffy-Levy https://www.potter.com/ Guernsey Diverse exuding installation 1982 Wireless 6146
73 72 adcB0afbE58bAe3 Wagner LLC https://decker-esparza.com/ Uruguay Reactive attitude-oriented toolset 1987 International Affairs 6874
74 73 dfcA1c84AdB61Ac Mccall-Holmes http://www.dean.com/ Benin Object-based value-added database 2009 Legal Services 696
75 74 208044AC2fe52F3 Massey LLC https://frazier.biz/ Suriname Configurable zero administration Graphical User Interface 1986 Accounting 5004
76 75 f3C365f0c1A0623 Hicks LLC http://alvarez.biz/ Pakistan Quality-focused client-server Graphical User Interface 1970 Computer Software / Engineering 8480
77 76 ec5Bdd3CBAfaB93 Cole, Russell and Avery http://www.blankenship.com/ Mongolia De-engineered fault-tolerant challenge 2000 Law Enforcement 7012
78 77 DDB19Be7eeB56B4 Cummings-Rojas https://simon-pearson.com/ Svalbard & Jan Mayen Islands User-centric modular customer loyalty 2012 Financial Services 7529
79 78 dd6CA3d0bc3cAfc Beasley, Greene and Mahoney http://www.petersen-lawrence.com/ Togo Extended content-based methodology 1976 Religious Institutions 869
80 79 A0B9d56e61070e3 Beasley, Sims and Allison http://burke.info/ Latvia Secured zero tolerance hub 1972 Facilities Services 6182
81 80 cBa7EFe5D05Adaf Crawford-Rivera https://black-ramirez.org/ Cuba Persevering exuding budgetary management 1999 Online Publishing 7805
82 81 Ea3f6D52Ec73563 Montes-Hensley https://krueger.org/ Liechtenstein Multi-tiered secondary productivity 2009 Printing 8433
83 82 bC0CEd48A8000E0 Velazquez-Odom https://stokes.com/ Djibouti Streamlined 6thgeneration function 2002 Alternative Dispute Resolution 4044
84 83 c89b9b59BC4baa1 Eaton-Morales https://www.reeves-graham.com/ Micronesia Customer-focused explicit frame 1990 Capital Markets / Hedge Fund / Private Equity 7013
85 84 FEC51bce8421a7b Roberson, Pennington and Palmer http://www.keith-fisher.com/ Cameroon Adaptive bi-directional hierarchy 1993 Telecommunications 5571
86 85 e0E8e27eAc9CAd5 George, Russo and Guerra https://drake.com/ Sweden Centralized non-volatile capability 1989 Military Industry 2880
87 86 B97a6CF9bf5983C Davila Inc https://mcconnell.info/ Cocos (Keeling) Islands Profit-focused dedicated frame 2017 Consumer Electronics 2215
88 87 a0a6f9b3DbcBEb5 Mays-Preston http://www.browning-key.com/ Mali User-centric heuristic focus group 2006 Military Industry 5786
89 88 8cC1bDa330a5871 Pineda-Morton https://www.carr.com/ United States Virgin Islands Grass-roots methodical info-mediaries 1991 Printing 6168
90 89 ED889CB2FE9cbd3 Huang and Sons https://www.bolton.com/ Eritrea Re-contextualized dynamic hierarchy 1981 Semiconductors 7484
91 90 F4Dc1417BC6cb8f Gilbert-Simon https://www.bradford.biz/ Burundi Grass-roots radical parallelism 1973 Newspapers / Journalism 1927
92 91 7ABc3c7ecA03B34 Sampson-Griffith http://hendricks.org/ Benin Multi-layered composite paradigm 1972 Textiles 3881
93 92 4e0719FBE38e0aB Miles-Dominguez http://www.turner.com/ Gibraltar Organized empowering forecast 1996 Civic / Social Organization 897
94 93 dEbDAAeDfaed00A Rowe and Sons https://www.simpson.org/ El Salvador Balanced multimedia knowledgebase 1978 Facilities Services 8172
95 94 61BDeCfeFD0cEF5 Valenzuela, Holmes and Rowland https://www.dorsey.net/ Taiwan Persistent tertiary focus group 1999 Transportation 1483
96 95 4e91eD25f486110 Best, Wade and Shepard https://zimmerman.com/ Zimbabwe Innovative background definition 1991 Gambling / Casinos 4873
97 96 0a0bfFbBbB8eC7c Holmes Group https://mcdowell.org/ Ethiopia Right-sized zero tolerance focus group 1975 Photography 2988
98 97 BA6Cd9Dae2Efd62 Good Ltd http://duffy.com/ Anguilla Reverse-engineered composite moratorium 1971 Consumer Services 4292
99 98 E7df80C60Abd7f9 Clements-Espinoza http://www.flowers.net/ Falkland Islands (Malvinas) Progressive modular hub 1991 Broadcast Media 236
100 99 AFc285dbE2fEd24 Mendez Inc https://www.burke.net/ Kyrgyz Republic User-friendly exuding migration 1993 Education Management 339
101 100 e9eB5A60Cef8354 Watkins-Kaiser http://www.herring.com/ Togo Synergistic background access 2009 Financial Services 2785

Binary file not shown.

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,78 @@
# README Format File for Testing
![Alt text](./examples/diagram.png?raw=true "Fig.1. design of the document retrieval conversation system")
## Table of Contents
- [Table of Contents](#table-of-contents)
- [Install](#install)
- [How to Use](#how-to-use)
- Examples
- [Local Chinese Retrieval QA + Chat](examples/retrieval_conversation_zh.py)
- [Local English Retrieval QA + Chat](examples/retrieval_conversation_en.py)
- [Local Bi-lingual Retrieval QA + Chat](examples/retrieval_conversation_universal.py)
- [Experimental AI Agent Based on Chatgpt + Chat](examples/conversation_agent_chatgpt.py)
**As Colossal-AI is undergoing some major updates, this project will be actively maintained to stay in line with the Colossal-AI project.**
## Install
Install colossalqa
```bash
# python==3.8.17
cd ColossalAI/applications/ColossalQA
pip install -e .
```
To use the vllm server, please refer to the official guide [here](https://github.com/vllm-project/vllm/tree/main) for installation instruction. Simply run the following command from another terminal.
```bash
cd ./vllm/entrypoints
python api_server.py --host localhost --port $PORT_NUMBER --model $PATH_TO_MODEL --swap-space $SWAP_SPACE_IN_GB
```
## How to use
### Collect your data
For ChatGPT based Agent we support document retrieval and simple sql search.
If you want to run the demo locally, we provided document retrieval based conversation system built upon langchain. It accept a wide range of documents.
Read comments under ./colossalqa/data_loader for more detail
### Serving
Currently use vllm will replace with colossal inference when ready. Please refer class VllmLLM.
### Run the script
We provided scripts for Chinese document retrieval based conversation system, English document retrieval based conversation system, Bi-lingual document retrieval based conversation system and an experimental AI agent with document retrieval and SQL query functionality.
To run the bi-lingual scripts, set the following environmental variables before running the script.
```bash
export ZH_MODEL_PATH=XXX
export ZH_MODEL_NAME: chatglm2
export EN_MODEL_PATH: XXX
export EN_MODEL_NAME: llama
python retrieval_conversation_universal.py
```
To run retrieval_conversation_en.py. set the following environmental variables.
```bash
export EN_MODEL_PATH=XXX
export EN_MODEL_NAME: llama
python retrieval_conversation_en.py
```
To run retrieval_conversation_zh.py. set the following environmental variables.
```bash
export ZH_MODEL_PATH=XXX
export ZH_MODEL_NAME: chatglm2
python retrieval_conversation_en.py
```
It will ask you to provide the path to your data during the execution of the script. You can also pass a glob path to load multiple files at once. If csv files are provided, please use ',' as delimiter and '"' as quotation mark. There are no other formatting constraints for loading documents type files. For loading table type files, we use pandas, please refer to [Pandas-Input/Output](https://pandas.pydata.org/pandas-docs/stable/reference/io.html) for file format details.
## The Plan
- [x] build document retrieval QA tool
- [x] Add long + short term memory
- [x] Add demo for AI agent with SQL query
- [x] Add customer retriever for fast construction and retrieving (with incremental mode)

View File

@@ -0,0 +1,38 @@
Your Name
Lorem ipsum dolor sit amet, consectetuer adipiscing elit
123 Your Street
Your City, ST 12345
(123) 456-7890
no_reply@example.com
EXPERIENCE
Company, Location — Job Title
MONTH 20XX - PRESENT
Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh.
Company, Location — Job Title
MONTH 20XX - MONTH 20XX
Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh.
Company, Location — Job Title
MONTH 20XX - MONTH 20XX
Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh.
EDUCATION
School Name, Location — Degree
MONTH 20XX - MONTH 20XX
Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore.
School Name, Location — Degree
MONTH 20XX - MONTH 20XX
Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam.
PROJECTS
Project Name — Detail
Lorem ipsum dolor sit amet, consectetuer adipiscing elit.
SKILLS
* Lorem ipsum dolor sit amet.
* Consectetuer adipiscing elit.
* Sed diam nonummy nibh euismod tincidunt.
* Laoreet dolore magna aliquam erat volutpat.
AWARDS
Lorem ipsum dolor sit amet Consectetuer adipiscing elit, Sed diam nonummy
Nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.
Lorem ipsum dolor sit amet Consectetuer adipiscing elit, Sed diam nonummy
Nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.
LANGUAGES
Lorem ipsum, Dolor sit amet, Consectetuer