Headless Web Scraper Bot in Python with Requests and BS4 on Amazon EC2 Free Tier With Ubuntu Server 16.04 Tutorial

originally posted on MI Python

Introduction

I thought I would share my method for building a basic web scraping bot in Python that runs on a free tier Amazon EC2 instance. The bot needs no GUI, so it is very resource friendly on small free tier instances, and it runs entirely in the terminal. I'm choosing Ubuntu Server 16.04 as the operating system: without a GUI, Ubuntu is fairly lightweight, and for this application the less bloat the better. The Python libraries we will use are Requests and Beautiful Soup 4. We will scrape a Wikipedia page and then save the parsed data to a local file on the EC2 instance running Ubuntu Server 16.04.

 

What you will learn from this lesson / tutorial

In today's lesson / tutorial you will learn:

1.) Setting up and configuring your Amazon EC2 instance running Ubuntu Server 16.04

2.) Installing Python 3 and pip3 from the command line

3.) Installing the Beautiful Soup 4 and Requests Python libraries with pip

4.) Writing your very own headless web scraping bot in Python that saves a parsed web page source to a file
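As a rough preview of item 4, here is a minimal sketch of what such a bot might look like. It is not the exact script from the tutorial: the URL, the output filename, the User-Agent string and the function names are all assumptions picked for illustration. It avoids f-strings so that it also runs on the Python 3.5 that ships with Ubuntu 16.04.

```python
import requests
from bs4 import BeautifulSoup

# Assumptions for illustration only: any Wikipedia page and filename would do.
URL = "https://en.wikipedia.org/wiki/Web_scraping"
OUTPUT_FILE = "scraped_page.txt"


def fetch_html(url):
    """Download a page and return its HTML source as text."""
    # A descriptive User-Agent is polite; this value is made up for the example.
    headers = {"User-Agent": "tutorial-scraper-bot/0.1 (educational example)"}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # raise an exception on 4xx / 5xx responses
    return response.text


def parse_paragraphs(html):
    """Return the text of every non-empty <p> element in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    texts = [p.get_text().strip() for p in soup.find_all("p")]
    return [t for t in texts if t]


def save_lines(lines, path):
    """Write the parsed lines to a local file, one per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))


def run_bot(url=URL, path=OUTPUT_FILE):
    """Fetch the page, parse it, and save the result to disk."""
    paragraphs = parse_paragraphs(fetch_html(url))
    save_lines(paragraphs, path)
    print("Saved {} paragraphs to {}".format(len(paragraphs), path))
```

To actually scrape the page, add a call to run_bot() at the bottom of the script or call it from a python3 prompt. Splitting fetching, parsing and saving into separate functions makes it possible to try out the parsing step on a small HTML string without touching the network.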

Now on to the tutorial below.

Headless Web Scraper Bot in Python and Requests on Amazon EC2 With Ubuntu Server 16.04 Tutorial