
feat: add lesson about using the platform #1424


Open
honzajavorek wants to merge 15 commits into master from honzajavorek/platform

Conversation

Collaborator

@honzajavorek honzajavorek commented Jan 22, 2025

Introducing the final lesson of the course, about deploying to the platform. This was quite challenging, as with every other sentence I grappled with bugs or behavior that wasn't really intuitive to me. Along the way I filed these:

I explored several approaches that turned out to be dead ends. The lesson now takes an approach where it starts a new project from a template and replaces parts of the template with the original scraper. That completely avoids `apify init` and should be robust with regard to possible future changes, such as migrating to uv.
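Concretely, the flow is roughly this (the `apify create` command and template name are the ones quoted from the lesson further down in this PR; the last step is paraphrased):

```text
$ apify create warehouse-watchdog --template=python-crawlee-beautifulsoup
$ cd warehouse-watchdog
# then replace the template's sample source files with the scraper built earlier in the course
```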

I find the UI of the Apify Console rather confusing and super complex, especially the navigation, even as a user who has visited the interface regularly for the past year. The UI also seems to remember my last location or something like that, so every time I open it, it defaults to a different tab: once it's Input, another time it's Last run, and so on.

I'm no UX designer, so I can't help with that; I'm just sharing it here as feedback and as a fact that I took into account when creating the lesson. The only mitigation that came to my mind was to provide as many screenshots as possible. I also didn't dare to rely on where the student might land, so I make sure to reiterate which screen and which tab they should be on.

The lesson intentionally goes through updating the Actor, so that the student knows how to do it and how to push new changes and build and run the Actor again and again. I opted to keep the student using the Input tab as the place from which they start the Actor, even though in reality they could press the Start button from other tabs too. I feel that this way it's less confusing, makes the most sense, and they won't get distracted as much by all the other options.
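In CLI terms, the loop the lesson repeats is roughly this (both commands appear in the CLI output quoted in the review below; starting the Actor itself then happens from the Input tab in Console):

```text
$ apify run    # verify the change locally first
$ apify push   # upload the changed code to the platform, where a new build is made
```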

I did my best to structure the lesson so that it leads from stating shortcomings of the current solution to understanding how the platform helps to solve them, because I think that's the most honest way to "sell" the platform.

Let me know what you think!

@honzajavorek honzajavorek force-pushed the honzajavorek/platform branch from a308047 to d44c772 on March 14, 2025 14:15
@honzajavorek honzajavorek marked this pull request as ready for review March 14, 2025 14:35
Member

@metalwarrior665 metalwarrior665 left a comment


Thanks. Was there a discussion about where this content should live before? There is quite a lot of duplication with both https://docs.apify.com/platform and https://docs.apify.com/academy/apify-platform. The approach in JS was to have the scraping tutorial separate from the platform.

I'm not against having the whole thing follow in the Python course (as it can specialize to Python devs' needs), but then we will have to maintain duplicate content, which tends to be a bit annoying.

@honzajavorek
Collaborator Author

@metalwarrior665 The discussion happened here: #1015 (comment). I don't want duplicate content, but this is a logical ending of the course:

  1. basics in DevTools
  2. basics in Python
  3. use a framework to simplify your code and get some other benefits (Crawlee)
  4. deploy to a platform to get some other benefits (Apify)

The lesson is specific to the scraper we're building over the course of the lessons. You could say the same about the previous lesson about Crawlee, where the same content could be covered by Crawlee docs.

Contributor

@vdusek vdusek left a comment


Couldn't we use webp instead of png? The images would be about 5x smaller. Changes to the Python code are okay.

@honzajavorek
Collaborator Author

I don't mind using webp, but then it's only about the size of this repo. The site optimizes the images automatically anyway, or at least that's what I remember @B4nan saying somewhere else in the comments.

@B4nan
Member

B4nan commented Mar 18, 2025

The size of the repo is also important; if the difference is 5x, let's just go with webp. We use them pretty much exclusively in the Crawlee blog posts too, for the same reason.
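For the conversion itself, something like `cwebp` from the libwebp tools should do the trick (the quality value and path are just examples):

```text
$ cwebp -q 80 images/screenshot.png -o images/screenshot.webp
```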

@honzajavorek
Collaborator Author

honzajavorek commented Mar 18, 2025 via email

@honzajavorek
Collaborator Author

I'm moving the discussion about images to a separate issue: #1549. Regarding this particular PR, the images here are technically already part of this git repo now, so converting them would only add size, but if you want me to change them to webp, I'll do it.

@TC-MO
Contributor

TC-MO commented Apr 28, 2025

I think the structure @honzajavorek proposed is the correct way of creating this course and integrating it with the Academy. It aligns especially well with the current work on trimming down Apify Platform content from the Academy.

Contributor

@TC-MO TC-MO left a comment


A few questions & suggested changes, otherwise LGTM.

Comment on lines +15 to +18
- **User-operated:** We have to run the scraper ourselves. If we're tracking price trends, we'd need to remember to run it daily. And if we want alerts for big discounts, manually running the program isn't much better than just checking the site in a browser every day.
- **No monitoring:** If we have a spare server or a Raspberry Pi lying around, we could use [cron](https://en.wikipedia.org/wiki/Cron) to schedule it. But even then, we'd have little insight into whether it ran successfully, what errors or warnings occurred, how long it took, or what resources it used.
- **Manual data management:** Tracking prices over time means figuring out how to organize the exported data ourselves. Processing the data could also be tricky since different analysis tools often require different formats.
- **Anti-scraping risks:** If the target website detects our scraper, they can rate-limit or block us. Sure, we could run it from a coffee shop's Wi-Fi, but eventually, they'd block that too—risking seriously annoying the barista.
Contributor


Can we change this from bold to emphasis? We usually use bold across the docs to highlight UI elements, so I would prefer to keep it consistent here.


Scraping platforms come in many varieties, offering a wide range of tools and approaches. As the course authors, we're obviously a bit biased toward Apify—we think it's both powerful and complete.

That said, the main goal of this lesson is to show how deploying to **any platform** can make life easier. Plus, everything we cover here fits within [Apify's free tier](https://apify.com/pricing).
Contributor


Same here, could we change it to emphasis?


## Registering

First, let's [create a new Apify account](https://console.apify.com/sign-up). You'll go through a few checks to confirm you're human and your email is valid—annoying but necessary to prevent abuse of the platform.
Contributor


Might be an opportunity to link to our docs? (Though it should be trivial to sign up, so I'll just leave it up to your consideration.) ¯\_(ツ)_/¯

```text
$ apify login
...
Success: You are logged in to Apify as user1234!
```
Contributor


We usually used `<YOUR_XYZ>` as placeholders; not sure what your experience with that is. Is a mock username in this format better received by users?

Comment on lines +63 to +73
Change to a directory where you start new projects in your terminal. Then, run the following command—it will create a new subdirectory called `warehouse-watchdog` for the new project, containing all the necessary files:

```text
$ apify create warehouse-watchdog --template=python-crawlee-beautifulsoup
Info: Python version 0.0.0 detected.
Info: Creating a virtual environment in ...
...
Success: Actor 'warehouse-watchdog' was created. To run it, run "cd warehouse-watchdog" and "apify run".
Info: To run your code in the cloud, run "apify push" and deploy your code to Apify Console.
Info: To install additional Python packages, you need to activate the virtual environment in the ".venv" folder in the actor directory.
```
Contributor


Shouldn't this be broken up??

Suggested change
Change to a directory where you start new projects in your terminal. Then, run the following command—it will create a new subdirectory called `warehouse-watchdog` for the new project, containing all the necessary files:
```text
$ apify create warehouse-watchdog --template=python-crawlee-beautifulsoup
Info: Python version 0.0.0 detected.
Info: Creating a virtual environment in ...
...
Success: Actor 'warehouse-watchdog' was created. To run it, run "cd warehouse-watchdog" and "apify run".
Info: To run your code in the cloud, run "apify push" and deploy your code to Apify Console.
Info: To install additional Python packages, you need to activate the virtual environment in the ".venv" folder in the actor directory.
```
Change to a directory where you start new projects in your terminal. Then, run the following command:
`$ apify create warehouse-watchdog --template=python-crawlee-beautifulsoup`
it will create a new subdirectory called `warehouse-watchdog` for the new project, containing all the necessary files:
```text
Info: Python version 0.0.0 detected.
Info: Creating a virtual environment in ...
...
Success: Actor 'warehouse-watchdog' was created. To run it, run "cd warehouse-watchdog" and "apify run".
Info: To run your code in the cloud, run "apify push" and deploy your code to Apify Console.
Info: To install additional Python packages, you need to activate the virtual environment in the ".venv" folder in the actor directory.
```

@B4nan
Member

B4nan commented Apr 28, 2025

> the images here are technically already a part of this git repo now

They are part of your branch only, which will be wiped after we merge.

Labels
t-academy Issues related to Web Scraping and Apify academies.

5 participants