mirror of
https://github.com/hwchase17/langchain.git
synced 2026-02-21 14:43:07 +00:00
- Updates chat few shot prompt tutorial to show off a more cohesive example - Fix async Chromium loader guide - Fix Excel loader install instructions - Reformat Html2Text page - Add install instructions to Azure OpenAI embeddings page - Add missing dep install to SQL QA tutorial @baskaryan
176 lines
5.1 KiB
Plaintext
176 lines
5.1 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "fe6e5c82",
|
|
"metadata": {},
|
|
"source": [
|
|
"# HTML to text\n",
|
|
"\n",
|
|
">[html2text](https://github.com/Alir3z4/html2text/) is a Python package that converts a page of `HTML` into clean, easy-to-read plain `ASCII text`. \n",
|
|
"\n",
|
|
"The ASCII also happens to be a valid `Markdown` (a text-to-HTML format)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "ce77e0cb",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"%pip install --upgrade --quiet html2text"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"id": "8ca0974b",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stderr",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"USER_AGENT environment variable not set, consider setting it to identify your requests.\n",
|
|
"Fetching pages: 100%|##########| 2/2 [00:00<00:00, 14.74it/s]\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"from langchain_community.document_loaders import AsyncHtmlLoader\n",
|
|
"\n",
|
|
"urls = [\"https://www.espn.com\", \"https://lilianweng.github.io/posts/2023-06-23-agent/\"]\n",
|
|
"loader = AsyncHtmlLoader(urls)\n",
|
|
"docs = loader.load()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"id": "a95a928c",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"## Fantasy\n",
|
|
"\n",
|
|
" * Football\n",
|
|
"\n",
|
|
" * Baseball\n",
|
|
"\n",
|
|
" * Basketball\n",
|
|
"\n",
|
|
" * Hockey\n",
|
|
"\n",
|
|
"## ESPN Sites\n",
|
|
"\n",
|
|
" * ESPN Deportes\n",
|
|
"\n",
|
|
" * Andscape\n",
|
|
"\n",
|
|
" * espnW\n",
|
|
"\n",
|
|
" * ESPNFC\n",
|
|
"\n",
|
|
" * X Games\n",
|
|
"\n",
|
|
" * SEC Network\n",
|
|
"\n",
|
|
"## ESPN Apps\n",
|
|
"\n",
|
|
" * ESPN\n",
|
|
"\n",
|
|
" * ESPN Fantasy\n",
|
|
"\n",
|
|
" * Tournament Challenge\n",
|
|
"\n",
|
|
"## Follow ESPN\n",
|
|
"\n",
|
|
" * Facebook\n",
|
|
"\n",
|
|
" * X/Twitter\n",
|
|
"\n",
|
|
" * Instagram\n",
|
|
"\n",
|
|
" * Snapchat\n",
|
|
"\n",
|
|
" * TikTok\n",
|
|
"\n",
|
|
" * YouTube\n",
|
|
"\n",
|
|
"## Fresh updates to our NBA mock draft: Everything we're hearing hours before\n",
|
|
"Round 1\n",
|
|
"\n",
|
|
"With hours until Round 1 begins (8 p.m. ET on ESPN and ABC), ESPN draft\n",
|
|
"insiders Jonathan Givony and Jeremy Woo have new intel on lottery picks and\n",
|
|
"more.\n",
|
|
"\n",
|
|
"2hJonathan Givony and Jeremy Woo\n",
|
|
"\n",
|
|
"Illustration by ESPN\n",
|
|
"\n",
|
|
"## From No. 1 to 100: Ranking the 2024 NBA draft prospects\n",
|
|
"\n",
|
|
"Who's No. 1? Where do the Kentucky, Duke and UConn players rank? Here's our\n",
|
|
"final Top 100 Big Board.\n",
|
|
"\n",
|
|
"6hJonathan Givony and Jeremy Woo\n",
|
|
"\n",
|
|
" * Full draft order: All 58 picks over two rounds\n",
|
|
" * Trade tracker: Details for all deals\n",
|
|
"\n",
|
|
" * Betting buzz: Lakers favorites to draft Bronny\n",
|
|
" * Use our NBA draft simu\n",
|
|
"ent system, LLM functions as the agent's brain,\n",
|
|
"complemented by several key components:\n",
|
|
"\n",
|
|
" * **Planning**\n",
|
|
" * Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\n",
|
|
" * Reflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\n",
|
|
" * **Memory**\n",
|
|
" * Short-term memory: I would consider all the in-context learning (See Prompt Engineering) as utilizing short-term memory of the model to learn.\n",
|
|
" * Long-term memory: This provides the agent with the capability to retain and recall (infinite) information over extended periods, often by leveraging an external vector store and fast retrieval.\n",
|
|
" * **Tool use**\n",
|
|
" * The agent learns to call external APIs for extra information that is missing from the model weights (often hard to change after pre-training), including \n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"from langchain_community.document_transformers import Html2TextTransformer\n",
|
|
"\n",
|
|
"urls = [\"https://www.espn.com\", \"https://lilianweng.github.io/posts/2023-06-23-agent/\"]\n",
|
|
"html2text = Html2TextTransformer()\n",
|
|
"docs_transformed = html2text.transform_documents(docs)\n",
|
|
"\n",
|
|
"print(docs_transformed[0].page_content[1000:2000])\n",
|
|
"\n",
|
|
"print(docs_transformed[1].page_content[1000:2000])"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.10.5"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
}
|