The learning objectives are:
Abacus has multiple feature group operations that can be accessed using the “Actions” button.
Instead of writing API or SQL code to execute a task, you can use one of the predefined tasks available under the “Actions” button.
Example tasks are:
Each feature group operation or task creates a new feature group that references the original feature group.
As an example, let’s see how we could create a web crawler using a feature group operation.
The steps are:
Click on “Actions” > “Feature Group Operation”.
Choose the “Crawler” feature group operation.
Here is what your dataset should look like:
There are two columns here:
- URL: The URL that the crawler will extract data from.
- Depth: Whether the crawler extracts data only from the given URL or also from the URLs linked within that page.
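As a rough illustration of the two-column layout described above, the following sketch builds such a dataset as CSV text using only the Python standard library. The column names `URL` and `Depth` come from the description; the example URLs, the helper name `build_crawler_csv`, and the use of small integers for `Depth` are assumptions made for illustration, so check the platform for the values it actually expects.

```python
import csv
import io

# Hypothetical rows. The column names follow the description above, but the
# integer encoding of Depth is an assumption, not a documented format.
rows = [
    {"URL": "https://example.com/docs", "Depth": 1},
    {"URL": "https://example.com/blog", "Depth": 2},
]


def build_crawler_csv(rows):
    """Serialize the rows into CSV text with a URL,Depth header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["URL", "Depth"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = build_crawler_csv(rows)
print(csv_text)
```

The resulting text can be saved to a file and uploaded as a dataset in whatever way you normally create one.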
There are also several advanced options that can help when crawling websites that require a user agent. Please note that we honor the settings in each site's `robots.txt` file and won’t crawl websites that disallow web crawling.
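To see how `robots.txt` rules affect whether a URL may be crawled, you can check them yourself before adding a URL to the dataset. The sketch below uses Python's standard `urllib.robotparser` module. It illustrates general `robots.txt` semantics only, not how the crawler is implemented internally; the sample rules, the user agent string `example-crawler`, and the helper name `is_allowed` are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content (hypothetical): every user agent is barred
# from paths under /private/ and may fetch everything else.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


allowed_public = is_allowed(
    ROBOTS_TXT, "example-crawler", "https://example.com/docs/"
)
allowed_private = is_allowed(
    ROBOTS_TXT, "example-crawler", "https://example.com/private/page"
)
print(allowed_public, allowed_private)
```

For a live site, `RobotFileParser.set_url` followed by `read` fetches the real `robots.txt` instead of parsing a hard-coded string.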