mirror of https://github.com/hwchase17/langchain.git (synced 2025-06-28 17:38:36 +00:00)
docs: Add question answering over a website to web scraping (#10637)
**Description:** I've added a new use-case to the Web scraping docs. I also fixed some typos in the existing text.

Co-authored-by: davidjohnbarton <41335923+davidjohnbarton@users.noreply.github.com>
parent 976a18c1d5
commit 75c04f0833
@@ -453,11 +453,11 @@
"\n",
"Related to scraping, we may want to answer specific questions using searched content.\n",
"\n",
-"We can automate the process of [web research](https://blog.langchain.dev/automating-web-research/) using a retriver, such as the `WebResearchRetriever` ([docs](https://python.langchain.com/docs/modules/data_connection/retrievers/web_research)).\n",
+"We can automate the process of [web research](https://blog.langchain.dev/automating-web-research/) using a retriever, such as the `WebResearchRetriever` ([docs](https://python.langchain.com/docs/modules/data_connection/retrievers/web_research)).\n",
"\n",
"\n",
"\n",
-"Copy requirments [from here](https://github.com/langchain-ai/web-explorer/blob/main/requirements.txt):\n",
+"Copy requirements [from here](https://github.com/langchain-ai/web-explorer/blob/main/requirements.txt):\n",
"\n",
"`pip install -r requirements.txt`\n",
" \n",
@@ -573,13 +573,70 @@
},
{
"cell_type": "markdown",
"id": "ff62e5f5",
"metadata": {},
"source": [
"### Going deeper \n",
"\n",
"* Here's an [app](https://github.com/langchain-ai/web-explorer/tree/main) that wraps this retriever with a lightweight UI."
]
},
{
"cell_type": "markdown",
"id": "312c399e",
"metadata": {},
"source": [
"## Question answering over a website\n",
"\n",
"To answer questions over a specific website, you can use Apify's [Website Content Crawler](https://apify.com/apify/website-content-crawler) Actor, which can deeply crawl websites such as documentation, knowledge bases, help centers, or blogs,\n",
"and extract text content from the web pages.\n",
"\n",
"In the example below, we will deeply crawl the Python documentation of LangChain's chat models and answer a question over it.\n",
"\n",
"First, install the requirements:\n",
"`pip install apify-client openai langchain chromadb tiktoken`\n",
" \n",
"Next, set `OPENAI_API_KEY` and `APIFY_API_TOKEN` in your environment variables.\n",
"\n",
"The full code follows:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "9b08da5e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Yes, LangChain offers integration with OpenAI chat models. You can use the ChatOpenAI class to interact with OpenAI models.\n"
]
}
],
"source": [
"from langchain.docstore.document import Document\n",
"from langchain.indexes import VectorstoreIndexCreator\n",
"from langchain.utilities import ApifyWrapper\n",
"\n",
"apify = ApifyWrapper()\n",
"# Call the Actor to obtain text from the crawled webpages\n",
"loader = apify.call_actor(\n",
"    actor_id=\"apify/website-content-crawler\",\n",
"    run_input={\"startUrls\": [{\"url\": \"https://python.langchain.com/docs/integrations/chat/\"}]},\n",
"    dataset_mapping_function=lambda item: Document(\n",
"        page_content=item[\"text\"] or \"\", metadata={\"source\": item[\"url\"]}\n",
"    ),\n",
")\n",
"\n",
"# Create a vector store based on the crawled data\n",
"index = VectorstoreIndexCreator().from_loaders([loader])\n",
"\n",
"# Query the vector store\n",
"query = \"Are any OpenAI chat models integrated in LangChain?\"\n",
"result = index.query(query)\n",
"print(result)"
]
}
],
"metadata": {
@@ -598,7 +655,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.9.1"
+"version": "3.9.16"
}
},
"nbformat": 4,
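The `dataset_mapping_function` in the added code cell turns each crawled dataset item into a document and falls back to an empty string when an item has no text. The sketch below reproduces that mapping offline so the fallback can be checked without calling Apify. `Doc` is a hypothetical stand-in for LangChain's `Document`, `map_item` is an illustrative name, and the sample items are made up; real crawler output may carry additional fields.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for LangChain's Document so the sketch runs offline."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def map_item(item: Dict[str, Any]) -> Doc:
    # Same shape as the lambda in the notebook cell:
    # use "" when the item's "text" is None or empty.
    return Doc(page_content=item["text"] or "", metadata={"source": item["url"]})


# Made-up dataset items (assumption: each item has "url" and "text" keys).
sample_items: List[Dict[str, Any]] = [
    {"url": "https://example.com/a", "text": "Chat models overview"},
    {"url": "https://example.com/b", "text": None},
]

docs = [map_item(item) for item in sample_items]
for doc in docs:
    print(repr(doc.page_content), doc.metadata["source"])
```

Without the `or ""` fallback, an item whose `text` is `None` would hand a non-string value to the document constructor, which is presumably why the notebook guards against it.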
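The new markdown cell asks the reader to set `OPENAI_API_KEY` and `APIFY_API_TOKEN` before running the code. A minimal preflight check, assuming only the standard library, could look like the following; `missing_vars` is a hypothetical helper and is not part of LangChain or the Apify client.

```python
import os
from typing import List, Mapping, Sequence

# The two variable names come from the notebook text.
REQUIRED_VARS = ("OPENAI_API_KEY", "APIFY_API_TOKEN")


def missing_vars(env: Mapping[str, str], required: Sequence[str] = REQUIRED_VARS) -> List[str]:
    # A variable counts as missing when it is unset or set to an empty string.
    return [name for name in required if not env.get(name)]


missing = missing_vars(os.environ)
if missing:
    print("Set these before running the notebook cell:", ", ".join(missing))
else:
    print("Both credentials are set.")
```

Running this before the crawl reports a missing credential early, before any Actor run is started.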