mirror of
https://github.com/hwchase17/langchain.git
synced 2025-12-24 00:17:05 +00:00
Vwp/docs improved document loaders (#4006)
Huge thanks to @leo-gan for improving the document loaders notebooks --------- Co-authored-by: Leonid Ganeline <leo.gan.57@gmail.com>
This commit is contained in:
@@ -6,14 +6,19 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Hacker News\n",
|
||||
"How to pull page data and comments from Hacker News"
|
||||
"\n",
|
||||
">[Hacker News](https://en.wikipedia.org/wiki/Hacker_News) (sometimes abbreviated as HN) is a social news website focusing on computer science and entrepreneurship. It is run by the investment fund and startup incubator Y Combinator. In general, content that can be submitted is defined as \"anything that gratifies one's intellectual curiosity.\"\n",
|
||||
"\n",
|
||||
"This notebook covers how to pull page data and comments from [Hacker News](https://news.ycombinator.com/)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "ff49b177",
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from langchain.document_loaders import HNLoader"
|
||||
@@ -23,7 +28,9 @@
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "849a8d52",
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"loader = HNLoader(\"https://news.ycombinator.com/item?id=34817881\")"
|
||||
@@ -33,7 +40,9 @@
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "c2826836",
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"data = loader.load()"
|
||||
@@ -43,15 +52,14 @@
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "fefa2adc",
|
||||
"metadata": {},
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[Document(page_content=\"delta_p_delta_x 18 hours ago \\n | next [–] \\n\\nAstrophysical and cosmological simulations are often insightful. They're also very cross-disciplinary; besides the obvious astrophysics, there's networking and sysadmin, parallel computing and algorithm theory (so that the simulation programs are actually fast but still accurate), systems design, and even a bit of graphic design for the visualisations.Some of my favourite simulation projects:- IllustrisTNG: https://www.tng-project.org/- SWIFT: https://swift.dur.ac.uk/- CO5BOLD: https://www.astro.uu.se/~bf/co5bold_main.html (which produced these animations of a red-giant star: https://www.astro.uu.se/~bf/movie/AGBmovie.html)- AbacusSummit: https://abacussummit.readthedocs.io/en/latest/And I can add the simulations in the article, too.\\n \\nreply\", lookup_str='', metadata={'source': 'https://news.ycombinator.com/item?id=34817881', 'title': 'What Lights the Universe’s Standard Candles?'}, lookup_index=0),\n",
|
||||
" Document(page_content=\"andrewflnr 19 hours ago \\n | prev | next [–] \\n\\nWhoa. I didn't know the accretion theory of Ia supernovae was dead, much less that it had been since 2011.\\n \\nreply\", lookup_str='', metadata={'source': 'https://news.ycombinator.com/item?id=34817881', 'title': 'What Lights the Universe’s Standard Candles?'}, lookup_index=0),\n",
|
||||
" Document(page_content='andreareina 18 hours ago \\n | prev | next [–] \\n\\nThis seems to be the paper https://academic.oup.com/mnras/article/517/4/5260/6779709\\n \\nreply', lookup_str='', metadata={'source': 'https://news.ycombinator.com/item?id=34817881', 'title': 'What Lights the Universe’s Standard Candles?'}, lookup_index=0),\n",
|
||||
" Document(page_content=\"andreareina 18 hours ago \\n | prev [–] \\n\\nWouldn't double detonation show up as variance in the brightness?\\n \\nreply\", lookup_str='', metadata={'source': 'https://news.ycombinator.com/item?id=34817881', 'title': 'What Lights the Universe’s Standard Candles?'}, lookup_index=0)]"
|
||||
"\"delta_p_delta_x 73 days ago \\n | next [–] \\n\\nAstrophysical and cosmological simulations are often insightful. They're also very cross-disciplinary; besides the obvious astrophysics, there's networking and sysadmin, parallel computing and algorithm theory (so that the simulation programs a\""
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
@@ -60,16 +68,32 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"data"
|
||||
"data[0].page_content[:300]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"execution_count": 5,
|
||||
"id": "938ff4ee",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'source': 'https://news.ycombinator.com/item?id=34817881',\n",
|
||||
" 'title': 'What Lights the Universe’s Standard Candles?'}"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"data[0].metadata"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
@@ -88,7 +112,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
"version": "3.10.6"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
|
||||
Reference in New Issue
Block a user