CSCE 587 Home Page Syllabus Homework/Projects

CSCE 587 Big Data Analytics

Solution to Practice Midterm Exam

Lecture Materials

Jan. 16 Introduction and Introduction to R

Jan. 18 VM Terminal Sessions (or how to change your password and download datasets on the virtual machine)
Introduction to R and some wave data.

Jan. 23 Continuation of the introduction to R and the wave data we expect to use.

Jan. 25 Conclusion of Basic R followed by Intermediate R: data classes and type coercion considerations.

Jan. 30 Intermediate R: Strings, Matrices, and Arrays, followed by Functions. Starting from slide 34, we expect to use this CSV data. Note: the contents of this CSV file differ from the example in the slides.

Feb. 1 Conclusion of Intermediate R

Feb. 6 We will start with a generic presentation on clustering (first ~20 slides), then take a closer look at K-means clustering.
Time permitting, we will begin the K-means lab along with some illustrative economic data and the "classical" iris data set.

Feb. 8 A few last words on K-means clustering, then a brief discussion of how to submit HW via CSE dropbox.
We will then turn our attention to association analysis.

Feb. 13 Continuation of association analysis, followed by speeding up association rule discovery.

Feb. 15 We will hold off on the Association Rule Lab for a bit. Instead, we will shift our focus to simple linear regression.

Feb. 20 We will start with a lab on association rules. This will be followed by the linear regression lab and some real estate data. Time permitting, we will continue with Logistic Regression.

Feb. 22 We will start with a discussion of Logistic Regression. Time permitting, we will also have a logistic regression lab.

Feb. 27 We will begin with a discussion of hypothesis testing and then go over an example from the web. We will continue with a Naive Bayes lecture that includes a general discussion of classification accuracy, holdout estimation, and ROC analysis.

Mar. 1 We will continue our discussion of Naive Bayes, followed by a general discussion on classification accuracy, holdout estimation, and ROC analysis. Finally, the Naive Bayes lab starts on slide 56. Data for Naive Bayes Lab.

Mar. 6 We will start class with the Naive Bayes lab (beginning on slide 56), then continue with a discussion of Decision Trees.

Mar. 8 Continuation of the discussion of Decision Trees, followed by a Decision Tree Lab. Data for Decision Tree Lab. Time permitting, we will discuss Principal Component Analysis.

Mar. 20 Review for Midterm Exam. Example midterm from Fall 2017.

Class Meeting Times

Section            Days     Time               Room
Lecture/Lab 001    T, Th    10:05am-11:20am    1D15


Prof. John Rose
Office: M. Bert Storey Engineering and Innovation Center 2257
Office Phone: 777-2405
Office Hours: T 3:30pm-5pm, W 2:30pm-4pm, and by appointment

Teaching Assistant

Ms. Xianshan Qu
Office: M. Bert Storey Engineering and Innovation Center 2242
Office Phone: 777-XXXX
Office Hours: MW 2:00pm-3:30pm


No textbook will be required. Required readings will come from the big data/data mining/data analytics literature.


Ability to think

Grade breakdown

Homework & Projects: 30%
In-Class Labs: 20%

Grade ranges

A: 90-100
B+: 86-89
B: 80-85
C+: 76-79
C: 70-75
D+: 66-69
D: 60-65
F: below 60
Grades will not be curved. You will receive the grade that you have earned. N.B. If you want to receive a passing grade, then you must earn it during the semester.

Resources you may find useful

R download for Linux, Mac OS, and Windows
RStudio download for Linux, Mac OS, and Windows

Cheating Policy

Cheating is defined as giving or receiving unauthorized aid on an assignment, quiz, test or project, or not documenting an outside source of information should one be used. It is unacceptable and will not be tolerated. All offenses will be reported to the dean in accordance with the Carolina Community student handbook.

Academic sanctions are as follows. For a first offense, the student will be docked twice the number of points the assignment is worth. For example, a student who cheats on a 10-point quiz will receive -20 as the grade for that quiz. For a second offense, the student will receive an F for the course.
Computer Science and Engineering University of South Carolina

If you have any questions or comments, please send me e-mail at:
