sources/academy/webscraping/scraping_basics_python/12_framework.md (+8 −6)
@@ -175,7 +175,7 @@ In the final statistics, you can see that we made 25 requests (1 listing page +
 
 ## Extracting data
 
-The BeautifulSoup crawler provides handlers with the `context.soup` attribute, where we can find the parsed HTML of the handled page. This is the same as the `soup` we had in our previous program. Let's locate and extract the same data as before:
+The BeautifulSoup crawler provides handlers with the `context.soup` attribute, which contains the parsed HTML of the handled page. This is the same `soup` object we used in our previous program. Let's locate and extract the same data as before:
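As an aside, `context.soup` is an ordinary BeautifulSoup object, so everything we did with `soup` earlier still works. A minimal illustration (the HTML fragment and the `.product-meta__title` selector are made-up assumptions for this sketch, not verbatim course code):

```python
from bs4 import BeautifulSoup

# Simulate what a handler receives: `context.soup` is a regular
# BeautifulSoup object, the same kind we built manually before.
html = '<h1 class="product-meta__title">Sample Product</h1>'
soup = BeautifulSoup(html, "html.parser")

# Locate and extract data exactly as in the previous program:
title = soup.select_one(".product-meta__title").text.strip()
print(title)  # -> Sample Product
```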
-Now the price. We won't be inventing anything new here—let's add `Decimal` import and copy-paste code from our old scraper.
+Now for the price. We're not doing anything new here—just import `Decimal` and copy-paste the code from our old scraper.
 
-The only change will be in the selector. In `main.py`, we were looking for `.price` inside a `product_soup` representing a product card. Now we're looking for `.price` inside the whole product detail page. It's safer to be more specific so that we won't match another price on the same page:
+The only change will be in the selector. In `main.py`, we looked for `.price` within a `product_soup` object representing a product card. Now, we're looking for `.price` within the entire product detail page. It's better to be more specific so we don't accidentally match another price on the same page:
 
-Finally, variants. We can reuse the `parse_variant()` function as it is, and even the handler code will look similar to what we already had. The whole program will look like this:
+Finally, the variants. We can reuse the `parse_variant()` function as-is, and in the handler we'll again take inspiration from what we had in `main.py`. The full program will look like this:
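The `Decimal` price parsing mentioned above could be sketched roughly like this (the `parse_price` helper name and the exact cleaning rules are assumptions for illustration, not the course's verbatim code):

```python
from decimal import Decimal
import re

def parse_price(text: str) -> Decimal:
    # Keep only digits and the decimal point, then build an exact
    # Decimal value -- floats would risk rounding errors with money.
    cleaned = re.sub(r"[^0-9.]", "", text)
    return Decimal(cleaned)

print(parse_price("$1,998.00"))  # -> 1998.00
```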
 
 ```py
 import asyncio
@@ -266,12 +266,14 @@ if __name__ == '__main__':
 asyncio.run(main())
 ```
 
-If you run this scraper, you should see the same data about the 24 products as before. Crawlee has saved us a lot of work with downloading, parsing, logging, and parallelization. The code is also easier to follow with the two handlers separated and labeled.
+If you run this scraper, you should get the same data for the 24 products as before. Crawlee has saved us a lot of effort by managing downloading, parsing, logging, and parallelization. The code is also cleaner, with two separate and labeled handlers.
 
-Crawlee doesn't help much with locating and extracting the data—that code is almost identical with or without framework. That's because the detective work involved, and taking care of the extraction, are the main added value of custom-made scrapers. With Crawlee, you can focus on just that, and let the framework take care of the rest.
+Crawlee doesn't do much to help with locating and extracting the data—that part of the code remains almost the same, framework or not. This is because the detective work of finding and extracting the right data is the core value of custom scrapers. With Crawlee, you can focus on just that while letting the framework take care of everything else.
 
 ## Saving data
 
+While we're at _letting the framework take care of everything else_, let's take a look at what it can do about saving data.
+
 :::danger Work in progress
 
 This course is incomplete. As we work on adding new lessons, we would love to hear your feedback. You can comment right here under each page or [file a GitHub Issue](https://github.com/apify/apify-docs/issues) to discuss a problem.