Spring 2025 Introduction to Data Science

Lectures: Tuesday and Thursday 2:00 PM - 3:20 PM in BE 100 (Livingston)

Instructor: Ruixiang Tang

Recitation: 

Section 05: Wednesday 2:00 PM - 3:20 PM LSH-B117, Student Instructor: Wujiang Xu <wujiang.xu@rutgers.edu>

Section 06: Wednesday 3:50 PM - 5:10 PM LSH-B117, Student Instructor: Mingyu Jin <mj939@scarletmail.rutgers.edu>

Section 07: Wednesday 7:30 PM - 8:50 PM LSH-B117, Student Instructor: Sasank Chindirala <sc2767@scarletmail.rutgers.edu>

Section 08: Thursday 12:10 PM - 1:30 PM LSH-A143, Student Instructor: Yashshree Kirad <ypk13@scarletmail.rutgers.edu>

Office Hours:

Section 05: Tuesday 9:00 AM - 10:00 AM (see Zoom link on Canvas)

Section 06: Tuesday 10:00 AM - 11:00 AM (see Zoom link on Canvas)

Section 07: Wednesday 10:00 AM - 11:00 AM (see Zoom link on Canvas)

Section 08: Tuesday 10:00 AM - 11:00 AM (see Zoom link on Canvas)

Image generated by DALL·E 3 with the prompt "Introduction to Data Science"

Course Overview

CS 439 Spring 2025 introduces foundational concepts and practical techniques in data science, equipping students with skills in data manipulation, visualization, and advanced machine learning methods. The course covers key topics such as data preprocessing, statistical analysis, regression, classification, clustering, recommender systems, and an introduction to deep learning and large language models (LLMs). Students will gain hands-on experience with tools like Python, Pandas, Seaborn, and TensorFlow through five labs, five quizzes, and two major projects. With a focus on real-world applications, this course emphasizes understanding mathematical foundations, dimensionality reduction, anomaly detection, and modern AI techniques, preparing students for practical and research-driven careers in data science.

Grading


Learning Resources

Course Videos on CUbits: https://www.cubits.ai/collections/48/ 


Course Schedule (tentative)

Week#

Title

Topics

Notes

Week 1

Introduction to Data Science

Course Overview and Introduction

Environment Setup and Tools

Introduction to Python for Data Science

Tools: Pandas, NumPy

Lab: Environment setup and basic Python

Resources: Pandas cheat sheet, Plot tutorial

Week 2

Data Fundamentals

Data Manipulation with Pandas

Data Collection and Web Scraping

Data Quality and Preparation

Tools: BeautifulSoup 

Lab 1 Released 

Quiz 1 

Recitation: Python and Pandas fundamentals

Week 3

Data Processing and Text Analysis

Advanced Data Collection

Text Data Processing

Data Preprocessing Techniques

Lab 1 Due

Text analysis techniques

Data transformation methods

Recitation: Data manipulation practice

Week 4

Data Visualization and Analysis

Data Types and Visualization Techniques

Exploratory Data Analysis

Basic Statistical Analysis

Lab 2 Released

Quiz 2

Tools: Seaborn, Matplotlib

Recitation: Visualization techniques

Week 5

Mathematical Foundations

Visualization with Kernel Density Estimators

Linear Algebra for Data Science

Statistical Foundations

Lab 2 Due

Mathematical concepts for ML

Recitation: Linear algebra applications

Week 6

Dimensionality Reduction

Matrix Decompositions

SVD and PCA

Feature Engineering Basics

Lab 3 Released 

Advanced mathematical concepts 

Recitation: SVD and PCA practice

Week 7

Probability and Statistics

Probability Fundamentals

Distributions and MLE

Naïve Bayes Classification

Quiz 3 

Statistical modeling concepts

Recitation: Probability and statistics practice

Week 8

Review and Assessment

Course Review 

Midterm Examination

Midterm Exam 

Review of key concepts 

Recitation: Exam preparation

Week 9

Regression Analysis

Linear Regression 

Gradient Descent 

Feature Engineering

Lab 3 Due 

Mid-Project Released 

Quiz 4 

Recitation: Regression practice

Week 10

Classification Techniques

Classification Fundamentals

Logistic Regression

Advanced Classification Methods

Support Vector Machines 

Neural Networks introduction

Recitation: Classification practice

Week 11

Advanced Classification

Multi-class Classification

Ensemble Methods

Class Imbalance Problems

Mid-Project Due 

Lab 4 Released 

Advanced classification techniques 

Recitation: Advanced classification practice

Week 12

Unsupervised Learning

Clustering Analysis

Dimensionality Reduction

Anomaly Detection

Clustering algorithms

Outlier detection methods

Pattern recognition

Week 13

Recommender Systems

Recommendation Algorithms

Collaborative Filtering 

Content-Based Filtering

Lab 4 Due

Lab 5 Released

Quiz 5

Recitation: Recommender systems practice

Week 14

Deep Learning and LLMs

Introduction to Deep Learning

Neural Networks

Large Language Models in Data Science

Applications in data labeling 

Data generation techniques 

Final project discussions