Spring 2025 Introduction to Data Science
Lectures: Tuesday and Thursday 2:00 PM - 3:20 PM in BE 100 (Livingston)
Instructor: Ruixiang Tang
Recitation:
Section 05: Wednesday 2:00 PM - 3:20 PM LSH-B117, Student Instructor: Wujiang Xu <wujiang.xu@rutgers.edu>
Section 06: Wednesday 3:50 PM - 5:10 PM LSH-B117, Student Instructor: Mingyu Jin <mj939@scarletmail.rutgers.edu>
Section 07: Wednesday 7:30 PM - 8:50 PM LSH-B117, Student Instructor: Sasank Chindirala <sc2767@scarletmail.rutgers.edu>
Section 08: Thursday 12:10 PM - 1:30 PM LSH-A143, Student Instructor: Yashshree Kirad <ypk13@scarletmail.rutgers.edu>
Office Hours:
Section 05: Tuesday 9am-10am (see Zoom link on Canvas)
Section 06: Tuesday 10am-11am (see Zoom link on Canvas)
Section 07: Wednesday 10am-11am (see Zoom link on Canvas)
Section 08: Tuesday 10am-11am (see Zoom link on Canvas)
Image generated by DALL.E 3 with prompt "Introduction to Data Science"
Course Overview
CS 439 Spring 2025 introduces foundational concepts and practical techniques in data science, equipping students with skills in data manipulation, visualization, and advanced machine learning methods. The course covers key topics such as data preprocessing, statistical analysis, regression, classification, clustering, recommender systems, and an introduction to deep learning and large language models (LLMs). Students will gain hands-on experience with tools like Python, Pandas, Seaborn, and TensorFlow through five labs, five quizzes, and two major projects. With a focus on real-world applications, this course emphasizes understanding mathematical foundations, dimensionality reduction, anomaly detection, and modern AI techniques, preparing students for practical and research-driven careers in data science.
Grading
5 Quizzes
5 Labs
3 Research Papers (also research paper-based quizzes)
Midterm Exam
Final Semester Project
Course Schedule (tentative)
Week #
Title
Topics
Notes
Week 1
Introduction to Data Science
Course Overview and Introduction
Environment Setup and Tools
Introduction to Python for Data Science
Tools: Pandas, NumPy
Lab: Environment setup and basic Python
Resources: Pandas cheat sheet, Plot tutorial
Week 2
Data Fundamentals
Data Manipulation with Pandas
Data Collection and Web Scraping
Data Quality and Preparation
Tools: BeautifulSoup
Lab 1 Released
Quiz 1
Recitation: Python and Pandas fundamentals
Week 3
Data Processing and Text Analysis
Advanced Data Collection
Text Data Processing
Data Preprocessing Techniques
Lab 1 Due
Text analysis techniques
Data transformation methods
Recitation: Data manipulation practice
Week 4
Data Visualization and Analysis
Data Types and Visualization Techniques
Exploratory Data Analysis
Basic Statistical Analysis
Lab 2 Released
Quiz 2
Tools: Seaborn, Matplotlib
Recitation: Visualization techniques
Week 5
Mathematical Foundations
Visualization with Kernel Density Estimators
Linear Algebra for Data Science
Statistical Foundations
Lab 2 Due
Mathematical concepts for ML
Recitation: Linear algebra applications
Week 6
Dimensionality Reduction
Matrix Decompositions
SVD and PCA
Feature Engineering Basics
Lab 3 Released
Advanced mathematical concepts
Recitation: SVD and PCA practice
Week 7
Probability and Statistics
Probability Fundamentals
Distributions and MLE
Naïve Bayes Classification
Quiz 3
Statistical modeling concepts
Recitation: Probability and statistics practice
Week 8
Review and Assessment
Course Review
Midterm Examination
Midterm Exam
Review of key concepts
Recitation: Exam preparation
Week 9
Regression Analysis
Linear Regression
Gradient Descent
Feature Engineering
Lab 3 Due
Mid-Project Released
Quiz 4
Recitation: Regression practice
Week 10
Classification Techniques
Classification Fundamentals
Logistic Regression
Advanced Classification Methods
Support Vector Machines
Neural Networks introduction
Recitation: Classification practice
Week 11
Advanced Classification
Multi-class Classification
Ensemble Methods
Class Imbalance Problems
Mid-Project Due
Lab 4 Released
Advanced classification techniques
Recitation: Advanced classification practice
Week 12
Unsupervised Learning
Clustering Analysis
Dimensionality Reduction
Anomaly Detection
Clustering algorithms
Outlier detection methods
Pattern recognition
Week 13
Recommender Systems
Recommendation Algorithms
Collaborative Filtering
Content-Based Filtering
Lab 4 Due
Lab 5 Released
Quiz 5
Recitation: Recommender systems practice
Week 14
Deep Learning and LLMs
Introduction to Deep Learning
Neural Networks
Large Language Models in Data Science
Applications in data labeling
Data generation techniques
Final project discussions