UNIVERSITY OF SOUTH CAROLINA

Department of Computer Science and Engineering

CSCE 590 sec 2: Web Scraping

Manton M. Matthews


General Information

DESCRIPTION:

Instructor

    Manton M. Matthews
    3A53 Swearingen
    Phone: 777-3285
    Office Hours: TH 11:30-1:00PM, others by appointment
     Email: mm at sc in the domain edu

Teaching Assistant

  • None

Course Description: This course covers Web scraping with Python: POSIX regular expressions, the Beautiful Soup and Natural Language Toolkit (NLTK) libraries, Selenium (WebDriver) for automating interaction with browsers, and finally the Scrapy library for creating web crawlers.

Main text and References

    Required: Web Scraping with Python - Collecting Data from the Modern Web by Ryan Mitchell, O'Reilly 2015.

    Resources: Websites or texts with online versions or substitutes.

  1. Python 3.5 documentation - https://docs.python.org/3.5/
    1. Tutorial (Required) https://docs.python.org/3.5/tutorial/index.html
    2. The Standard Library https://docs.python.org/3.5/library/index.html
  2. Natural Language Toolkit - for Python 3.x http://www.nltk.org/book/ and for Python 2.x http://www.nltk.org/book_1ed/
  3. Scrapy - https://doc.scrapy.org/en/latest/intro/tutorial.html
  4. Requests -

Learning Outcomes : At the end of the course the students should have demonstrated the ability to:

  1. Open URLs and extract desired information using several tools, including regular expressions, BeautifulSoup, Requests, and the Natural Language Toolkit (NLTK).
  2. Clean data and save the data gathered in databases.
  3. Automatically drive browsers with Selenium for testing websites and for gathering of data.
  4. Log in to websites to gather information, including handling CAPTCHAs.
  5. Create web crawlers by hand and with Scrapy.
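
As a rough illustration of outcomes 1 and 2, the sketch below extracts links from a page, cleans the text with a regular expression, and saves the result in a database. It is not course material: the HTML sample, the `LinkCollector` class, and the `links` table are hypothetical, and the standard library's `html.parser` stands in for BeautifulSoup so the example runs without extra packages.

```python
import re
import sqlite3
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this instead.
SAMPLE_HTML = """
<html><body>
  <a href="/lectures/01.html">  Lecture 1:   Intro </a>
  <a href="/lectures/02.html">Lecture 2: Regular   Expressions</a>
  <p>Contact: someone@example.com</p>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...> tag.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def clean(text):
    """Collapse runs of whitespace and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def scrape(html):
    """Return a list of (href, cleaned link text) tuples."""
    parser = LinkCollector()
    parser.feed(html)
    return [(href, clean(text)) for href, text in parser.links]


def save(rows, conn):
    """Store the scraped rows in a small SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS links (href TEXT, title TEXT)")
    conn.executemany("INSERT INTO links VALUES (?, ?)", rows)
    conn.commit()


rows = scrape(SAMPLE_HTML)
# A deliberately simple e-mail pattern, good enough for this sample only.
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", SAMPLE_HTML)
conn = sqlite3.connect(":memory:")
save(rows, conn)
```

In practice the HTML would come from a live request rather than a string, and BeautifulSoup would typically replace the hand-written parser.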

Date                          Significance
February 16                   Test 1
Thursday, March 2             Last day to withdraw without WF
April 6                       Test 2
Tuesday, May 2 @ 9:00 a.m.    Final Exam

Link to the Exam Schedule for Spring 2017

Policies

Homework:
Homework is submitted through the "dropbox" system on the CSE secure site. All homework is to be turned in as ASCII text files, i.e., no Word documents. No late homework or projects will be accepted. All homework is expected to be individual work unless explicitly specified otherwise.

Academic Integrity: You are expected to practice the highest possible standards of academic integrity. Any deviation from this expectation will result in a minimum academic penalty of your failing the assignment, and will result in additional disciplinary measures including referring you to the Office of Academic Integrity. Violations of the University's Honor Code include, but are not limited to, improper citation of sources, using another student's work, and any other form of academic misrepresentation. For more information, please see the Honor Code.

Accommodating Disabilities: Reasonable accommodations are available for students with a documented disability. If you have a disability and may need accommodations to fully participate in this class, contact the Office of Student Disability Services: 777-6142, TDD 777-6744, email sasds@mailbox.sc.edu, or stop by LeConte College Room 112A. All accommodations must be approved through the Office of Student Disability Services.

Amending the Syllabus/Rules: Amendments and changes to the syllabus, including evaluation and grading mechanisms, are possible. The instructor must initiate any changes. Changes to the grading and evaluation scheme will be voted on by the entire class and approved only with unanimous vote of all students present in class on the day the issue is decided. The lecture schedule and reading assignments (daily schedule) will not require a vote and may be altered at the instructor's discretion. Once approved, amendments will be distributed in writing to all students.

Grading policy:
The final grade will be based on two midterms, assignments and the final exam, according to the following weights:

  • Assignments and Quizzes: 35%
  • Two Tests: 20% each
  • Final: 25%
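
The weights above can be combined as a simple weighted average. The sketch below assumes every component is scored on a 0-100 scale; the dictionary keys and the example scores are hypothetical, and the syllabus does not say how the result maps to letter grades.

```python
# Weights taken from the grading policy; the key names are illustrative.
WEIGHTS = {
    "assignments_quizzes": 0.35,
    "test1": 0.20,
    "test2": 0.20,
    "final": 0.25,
}


def final_grade(scores):
    """Return the weighted course average for scores on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"assignments_quizzes": 90, "test1": 80, "test2": 70, "final": 85}
print(round(final_grade(example), 2))
```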