The Problem: Finding New Jobs Efficiently
As a student, I need to keep track of job opportunities, especially when I’m looking for internships or part-time work. I wanted to automate this so I wouldn’t have to keep opening the website to check whether I’d missed a posting.
So I built a job scraping service that checks for new student IT job listings and emails me whenever new opportunities become available.
I wanted a system that could:
- Automatically scrape job listings from a website (in this case, studentski-poslovi.hr, an IT job portal in Croatia).
- Parse and filter these listings for new job opportunities.
- Send me an email notification if there were any new jobs that hadn’t been seen before.
And the best part? I wanted all of this to run automatically every day, without me having to worry about it.
This blog post will walk you through how I built the entire system using Node.js, GitHub Actions, and Gmail for email notifications.
Step 1: Scraping Job Listings with Node.js
The first step was to set up a Node.js project to scrape job listings from the website. I used a few packages to make the process smoother and created a main.js script that:
- Sends an HTTP request to the website with Axios to fetch the job data.
- Parses the HTML with Cheerio, extracting details like job title, company, location, salary, and the job link.
- Compares the current jobs to the previously saved jobs to find new listings.
- Sends an email with the new job listings using Nodemailer.
Here’s an example of the core logic (the full code can be found in my GitHub repository):
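A minimal sketch of the scraping step, assuming hypothetical CSS selectors (.job-listing, .job-title, and so on; the real ones depend on the site’s markup):

```js
// main.js -- fetch the listings page and extract structured job data.
const axios = require('axios');
const cheerio = require('cheerio');

async function scrapeJobs() {
  // Fetch the raw HTML of the listings page.
  const { data: html } = await axios.get('https://www.studentski-poslovi.hr/');
  const $ = cheerio.load(html);

  // NOTE: these selectors are placeholders; inspect the real markup.
  const jobs = [];
  $('.job-listing').each((_, el) => {
    jobs.push({
      title: $(el).find('.job-title').text().trim(),
      company: $(el).find('.job-company').text().trim(),
      location: $(el).find('.job-location').text().trim(),
      salary: $(el).find('.job-salary').text().trim() || null,
      link: $(el).find('a').attr('href'),
    });
  });
  return jobs;
}
```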
Step 2: Handling New Jobs
The main challenge was finding new jobs that hadn’t been seen before. Here’s how I handled it:
- Every time the script runs, it scrapes the page and stores the current list of jobs.
- I compare this new list of jobs with the previously saved list (previous_jobs.json).
- If there are new jobs that aren’t in the previous list, I send an email with those job listings.
This ensures that I only get notified about new opportunities that I haven’t already seen.
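Here’s roughly how that comparison can look, matching jobs by their link (assumed to be unique per listing):

```js
// Compare the freshly scraped list against previous_jobs.json.
const fs = require('fs');

function findNewJobs(currentJobs) {
  let previousJobs = [];
  if (fs.existsSync('previous_jobs.json')) {
    previousJobs = JSON.parse(fs.readFileSync('previous_jobs.json', 'utf8'));
  }

  // A job is "new" if its link wasn't in the previous snapshot.
  const seenLinks = new Set(previousJobs.map((job) => job.link));
  const newJobs = currentJobs.filter((job) => !seenLinks.has(job.link));

  // Persist the current snapshot for the next run.
  fs.writeFileSync('previous_jobs.json', JSON.stringify(currentJobs, null, 2));
  return newJobs;
}
```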
Step 3: Sending Email Notifications
Once the script identifies new job listings, it sends me an email with the details. I used Nodemailer to create and send the email, making sure that the email includes the following:
- Job title
- Company name
- Salary (if available)
- Location
- A link to view the job
Here’s the basic structure of the email:
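A sketch using Nodemailer with Gmail; EMAIL_USER and EMAIL_PASS are illustrative variable names, and with Gmail the password should be an app password rather than the account password:

```js
const nodemailer = require('nodemailer');

async function sendNotification(newJobs) {
  const transporter = nodemailer.createTransport({
    service: 'gmail',
    auth: { user: process.env.EMAIL_USER, pass: process.env.EMAIL_PASS },
  });

  // Render each job as a list item with the details above.
  const jobList = newJobs
    .map((job) =>
      `<li><strong>${job.title}</strong> at ${job.company} (${job.location})` +
      `${job.salary ? ` - ${job.salary}` : ''} - <a href="${job.link}">View job</a></li>`
    )
    .join('');

  await transporter.sendMail({
    from: process.env.EMAIL_USER,
    to: process.env.EMAIL_USER,
    subject: `${newJobs.length} new student IT job(s) found`,
    html: `<ul>${jobList}</ul>`,
  });
}
```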
This way, I receive an email every time there are new jobs that match the criteria.
Also make sure your .env values are available to GitHub Actions so the workflow can send notifications:
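Locally these values live in a .env file (loaded with a package like dotenv); in GitHub Actions they are added as repository secrets under Settings → Secrets and variables → Actions and injected into the workflow, as shown in the next step. The variable names here are just the ones assumed above:

```
# .env (local development only; never commit this file)
EMAIL_USER=your.address@gmail.com
EMAIL_PASS=your-app-password
```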
Step 4: Automating the Process with GitHub Actions
The most important part was automating the entire process so that I didn’t have to run the script manually every day. This is where GitHub Actions came in handy.
I set up a cron job in GitHub Actions that triggers every 6 hours. This cron syntax ensures that the job scraping script runs at 00:00, 06:00, 12:00, and 18:00 UTC every day, fetching the latest job listings.
The GitHub Actions workflow file (scrape-jobs.yml) looks like this:
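A sketch of what that file can look like; the action versions and step names are illustrative, and the commit step for previous_jobs.json is covered in Step 5:

```yaml
name: Scrape Jobs

on:
  schedule:
    - cron: '0 */6 * * *'  # runs at 00:00, 06:00, 12:00, 18:00 UTC

jobs:
  scrape:
    runs-on: ubuntu-latest
    permissions:
      contents: write  # lets the workflow push previous_jobs.json back

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: 20

      - run: npm install

      - name: Run scraper
        run: node main.js
        continue-on-error: true  # don't fail the workflow if no new jobs
        env:
          EMAIL_USER: ${{ secrets.EMAIL_USER }}
          EMAIL_PASS: ${{ secrets.EMAIL_PASS }}
```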
Step 5: Storing Job Data
Since GitHub Actions runs in a clean environment each time, the previous_jobs.json file is not automatically preserved between runs.
To solve this, I added a step in my GitHub Actions workflow that commits and pushes the previous_jobs.json file back to the repository. This way, the file is updated with each run and remains available in the repository.
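One way to write that step, appended to the workflow above (the identity shown is the standard github-actions bot noreply address):

```yaml
      - name: Commit updated job list
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add previous_jobs.json
          git commit -m "Update previous_jobs.json"
          git push
```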
Handling Errors and Ensuring Smooth Operation
I made sure the workflow continues without failing even when no new jobs are found, using the continue-on-error: true flag on the script’s run step.
Another problem was granting the GitHub Actions bot permission to push and edit files automatically. This was solved by setting the workflow permissions in the repository settings to Read & Write and adding GITHUB_TOKEN to the environment.
I also added a check before committing and pushing changes: the script only commits if there are actual changes (i.e., new jobs), avoiding unnecessary commits.
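That check can be a one-line shell guard in the commit step, for example:

```bash
# Commit and push only when the staged file actually changed.
git diff --staged --quiet || (git commit -m "Update previous_jobs.json" && git push)
```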