Do It Yourself – Tutorials – Intro to Web Scraping with Python and Beautiful Soup

Apr 3, 2020

Web scraping is a valuable skill for any data professional: it lets you treat pages on the public web as a data source. In this tutorial we show you how to parse a web page into a CSV data file using a Python package called Beautiful Soup.

In this example, we scrape graphics card listings from NewEgg.com.
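To make the idea concrete, here is a minimal sketch of turning listing markup into CSV rows with Beautiful Soup. It is not the code from the video: the HTML sample, the class names (`item-container`, `item-brand`, `item-title`, `price-ship`), and the helper names `extract_products` and `to_csv` are all hypothetical stand-ins, and a real page needs selectors found by inspecting its own HTML.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup standing in for a product listing page. Real pages use
# their own tag and class names, which you find by reading the raw HTML.
SAMPLE_HTML = """
<div class="item-container">
  <a class="item-brand" title="BrandA"></a>
  <a class="item-title">Example Graphics Card 8GB</a>
  <li class="price-ship">Free Shipping</li>
</div>
<div class="item-container">
  <a class="item-brand" title="BrandB"></a>
  <a class="item-title">Another Graphics Card, 4GB</a>
  <li class="price-ship">$4.99 Shipping</li>
</div>
"""

FIELDNAMES = ["brand", "product_name", "shipping"]


def extract_products(html):
    """Return one dict per listing container found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # find_all returns every <div> whose class list includes "item-container".
    for container in soup.find_all("div", class_="item-container"):
        brand = container.find("a", class_="item-brand")["title"]
        name = container.find("a", class_="item-title").get_text(strip=True)
        shipping = container.find("li", class_="price-ship").get_text(strip=True)
        rows.append({"brand": brand, "product_name": name, "shipping": shipping})
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text; the csv module quotes embedded commas."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


products = extract_products(SAMPLE_HTML)
csv_text = to_csv(products)
print(csv_text)
```

Using the `csv` module instead of joining strings by hand means a product name that contains a comma is quoted automatically rather than splitting into an extra column. `find_all` is the current Beautiful Soup 4 spelling of the `findAll` method.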

Python Code:
https://code.datasciencedojo.com/datasciencedojo/tutorials/tree/master/Web%20Scraping%20with%20Python%20and%20BeautifulSoup

Sublime:
https://www.sublimetext.com/3

Anaconda:

Anaconda Python/R Distribution

JavaScript beautifier:
https://beautifier.io/

If you do not see the option to open a command window on Windows 10, follow this tutorial:
https://www.tenforums.com/tutorials/72024-open-command-window-here-add-windows-10-a.html

Table of Contents:
0:00 – Introduction
1:28 – Setting up Anaconda
3:00 – Installing Beautiful Soup
3:43 – Setting up urllib
6:07 – Retrieving the Web Page
10:47 – Evaluating the Web Page
11:27 – Converting Listings into Line Items
16:13 – Using the JavaScript Beautifier
16:31 – Reading Raw HTML for Items to Scrape
18:34 – Building the Scraper
22:11 – Using the “findAll” Function
27:26 – Testing the Scraper
29:07 – Creating the .csv File
32:18 – End Result
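
The retrieval step in the chapter list above can be sketched with the standard library alone. This is an illustration under assumptions, not the code shown in the video: the function name `fetch_html`, the 10-second timeout, and the UTF-8 fallback are our own choices.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10.0):
    """Download a page and return its body as text.

    The name, timeout, and decoding fallback are illustrative assumptions.
    """
    # The context manager closes the connection even if reading fails.
    with urlopen(url, timeout=timeout) as response:
        raw_bytes = response.read()
        # Prefer the charset declared in the response headers, else UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
    # Replace undecodable bytes instead of raising on a stray character.
    return raw_bytes.decode(charset, errors="replace")
```

The returned string can be passed to `BeautifulSoup(html, "html.parser")` for parsing. Some sites reject requests that carry urllib's default User-Agent header, so a real scraper may need to send a `urllib.request.Request` with custom headers, and it should respect the site's terms of use.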

Learn more about Data Science Dojo here:
https://datasciencedojo.com/data-science-bootcamp/

Watch the latest video tutorials here:

Data Science Tutorials

See what our past attendees are saying here:
https://datasciencedojo.com/bootcamp/reviews/#videos

Like Us: https://www.facebook.com/datasciencedojo
Follow Us: https://twitter.com/DataScienceDojo
Connect with Us: https://www.linkedin.com/company/datasciencedojo

Also find us on:
Instagram: https://www.instagram.com/data_science_dojo
Vimeo: https://vimeo.com/datasciencedojo

#webscraping #python #pythontutorial