mirror of
https://github.com/hwchase17/langchain.git
synced 2025-09-03 20:16:52 +00:00
Added new use case docs for Web Scraping, Chromium loader, BS4 transformer (#8732)
- Description: Added a new use case category called "Web Scraping", and a tutorial to scrape websites using OpenAI Functions Extraction chain to the docs.
- Tag maintainer: @baskaryan @hwchase17
- Twitter handle: https://www.linkedin.com/in/haiphunghiem/ (I'm on LinkedIn mostly)

---------

Co-authored-by: Lance Martin <lance@langchain.dev>
9
docs/docs_skeleton/docs/use_cases/web_scraping/index.mdx
Normal file
@@ -0,0 +1,9 @@
---
sidebar_position: 3
---

# Web Scraping

Web scraping has historically been challenging because website structures change constantly, which makes scraping scripts tedious to maintain. Traditional methods often rely on specific HTML tags and patterns; when those change, data extraction breaks.

Enter the LLM-based method for parsing HTML. By leveraging LLMs, and especially OpenAI Functions in LangChain's extraction chain, developers can instruct the model to extract only the desired data in a specified format. This streamlines extraction and reduces the time spent on manual debugging and script modifications. Because the model works from a description of the desired fields rather than from specific tags, extraction tends to stay consistent even when a website's design changes significantly. That resilience translates to less maintenance, lower cost, and higher-quality extracted data. Compared with its predecessors, the LLM-based approach turns a historically cumbersome task into a more automated and efficient process.
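To make the contrast above concrete, here is a minimal sketch rather than the tutorial's actual code. The first half uses only the standard library to show how a tag-specific rule stops matching after a redesign. The second half shows the kind of field schema one might pass to LangChain's extraction chain instead. The sample pages, the field names, and the helpers `scrape_titles` and `extract_with_llm` are hypothetical. The LangChain imports reflect the API around the time of this commit and may have changed in later releases. Running that part also requires an OpenAI API key.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect text inside <h2 class="headline"> (a brittle, tag-specific rule)."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "h2" and ("class", "headline") in attrs:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._inside = False

    def handle_data(self, data):
        if self._inside and data.strip():
            self.titles.append(data.strip())


def scrape_titles(html: str) -> list:
    """Hypothetical helper: run the tag-based rule over one page."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.titles


# Hypothetical pages: same content before and after a site redesign.
OLD_PAGE = '<div><h2 class="headline">Markets rally</h2><p>Stocks rose.</p></div>'
NEW_PAGE = '<article><h3 class="story-title">Markets rally</h3><p>Stocks rose.</p></article>'

# The LLM-based alternative describes WHAT to extract, not WHERE it sits.
# Field names here are illustrative assumptions.
SCHEMA = {
    "properties": {
        "news_article_title": {"type": "string"},
        "news_article_summary": {"type": "string"},
    },
    "required": ["news_article_title", "news_article_summary"],
}


def extract_with_llm(page_text: str, schema: dict = SCHEMA):
    """Sketch only: needs langchain installed and OPENAI_API_KEY set."""
    # Imported lazily so the rest of the example runs without langchain.
    from langchain.chains import create_extraction_chain
    from langchain.chat_models import ChatOpenAI

    llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")
    return create_extraction_chain(schema=schema, llm=llm).run(page_text)


if __name__ == "__main__":
    print(scrape_titles(OLD_PAGE))  # the rule matches the old markup
    print(scrape_titles(NEW_PAGE))  # the same rule finds nothing after the redesign
```

The tag-based rule has to be rewritten whenever the markup changes, whereas the schema stays the same. Whether the model actually returns correct values for a given page still has to be checked.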
BIN
docs/docs_skeleton/static/img/web_research.png
Normal file
Binary file not shown.
Size: 152 KiB
BIN
docs/docs_skeleton/static/img/web_scraping.png
Normal file
Binary file not shown.
Size: 172 KiB
BIN
docs/docs_skeleton/static/img/wsj_page.png
Normal file
Binary file not shown.
Size: 716 KiB