Overview
Part 1 covers the foundations of machine learning using Scikit-Learn. It focuses on classical ML algorithms, data preprocessing, model evaluation, and building production-ready pipelines.
Chapters
Chapter 1: The Machine Learning Landscape
- What is Machine Learning?
- Types of ML Systems
- Main Challenges of ML
- Testing and Validating
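The "Testing and Validating" topic boils down to holding out data the model never sees during training. A minimal sketch, using a synthetic toy dataset (the data and split ratio here are illustrative assumptions, not from the book):

```python
# Hedged sketch: hold out a test set to validate a model.
# The dataset below is synthetic and purely illustrative.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))        # 100 samples, 3 features
y = (X[:, 0] > 0).astype(int)        # simple synthetic labels

# Reserve 20% of the data for final evaluation; never tune on it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```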
Chapter 2: End-to-End Machine Learning Project
- Working with Real Data
- Data Exploration and Visualization
- Preparing Data for ML Algorithms
- Selecting and Training a Model
- Fine-Tuning Your Model
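The data-preparation steps above typically get chained into a single Scikit-Learn `Pipeline` so that imputation, scaling, and the model travel together. A sketch on synthetic data (the dataset and the choice of estimators are my own assumptions, not the book's housing example):

```python
# Sketch of a preprocessing + model pipeline in the Chapter 2 spirit.
# Synthetic data stands in for the book's housing dataset.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X[::10, 0] = np.nan                                 # introduce missing values
y = 3 * np.nan_to_num(X[:, 0]) + rng.normal(scale=0.1, size=200)

pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),  # fill missing values
    ("scaler", StandardScaler()),                   # standardize features
    ("model", LinearRegression()),                  # final estimator
])
pipeline.fit(X, y)
score = pipeline.score(X, y)                        # R^2 on training data
```

Wrapping preprocessing in the pipeline also prevents test-set leakage: the imputer and scaler are fit only on whatever data `fit` sees.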
Chapter 3: Classification
- MNIST Dataset
- Training a Binary Classifier
- Performance Measures
- Multiclass Classification
- Error Analysis
- Multilabel and Multioutput Classification
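A quick sketch of the binary-classifier setup from this chapter, using Scikit-Learn's small built-in digits dataset as a stand-in for MNIST (the "is it a 5?" framing follows the book; the cross-validation settings are my choice):

```python
# Binary classification sketch: "is this digit a 5?" on the small
# digits dataset (a stand-in for MNIST).
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)
y_is_5 = (y == 5)                     # skewed binary target

clf = SGDClassifier(random_state=42)
# Plain accuracy can mislead on skewed classes; cross-validation at
# least gives a more honest estimate than a single split.
scores = cross_val_score(clf, X, y_is_5, cv=3, scoring="accuracy")
```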
Chapter 4: Training Models
- Linear Regression
- Gradient Descent
- Polynomial Regression
- Regularized Linear Models (Ridge, Lasso, Elastic Net)
- Logistic Regression
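Batch gradient descent for linear regression can be written in a few lines of NumPy, which makes the chapter's update rule concrete. A toy sketch (learning rate, iteration count, and the synthetic data are arbitrary choices of mine):

```python
# Toy batch gradient descent for linear regression (pure NumPy).
import numpy as np

rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X + rng.normal(scale=0.5, size=(100, 1))  # true params: 4, 3

X_b = np.c_[np.ones((100, 1)), X]    # add bias feature x0 = 1
theta = rng.normal(size=(2, 1))      # random initialization

eta, n_iterations = 0.1, 1000
for _ in range(n_iterations):
    # Gradient of the MSE cost with respect to theta.
    gradients = 2 / 100 * X_b.T @ (X_b @ theta - y)
    theta -= eta * gradients
# theta should now sit near the true parameters (4, 3).
```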
Chapter 5: Support Vector Machines
- Linear SVM Classification
- Nonlinear SVM Classification
- SVM Regression
- Under the Hood
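Nonlinear SVM classification in one small sketch: an RBF-kernel `SVC` on the two-moons toy dataset, scaled first because SVMs are sensitive to feature scales (the hyperparameter values are illustrative, not tuned):

```python
# Nonlinear SVM sketch: RBF-kernel SVC on the two-moons dataset.
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

# Scale features before the SVM; C and gamma below are untuned defaults.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X, y)
accuracy = clf.score(X, y)           # training accuracy
```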
Chapter 6: Decision Trees
- Training and Visualizing Decision Trees
- Making Predictions
- Estimating Class Probabilities
- CART Training Algorithm
- Regularization Hyperparameters
- Regression
- Instability
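A small sketch tying three of the bullets together: training a tree, capping `max_depth` as a regularization hyperparameter (CART is greedy and will otherwise grow until pure leaves), and reading class probabilities off a leaf:

```python
# Decision tree sketch: training, regularization via max_depth, and
# class-probability estimation, on the built-in iris dataset.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Limiting depth restricts how finely the tree can partition the
# feature space, which guards against overfitting.
tree = DecisionTreeClassifier(max_depth=2, random_state=42)
tree.fit(X, y)
proba = tree.predict_proba(X[:1])    # class probabilities for one sample
```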
Chapter 7: Ensemble Learning and Random Forests
- Voting Classifiers
- Bagging and Pasting
- Random Forests
- Boosting (AdaBoost, Gradient Boosting)
- Stacking
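The voting-classifier idea can be sketched in a few lines: combine several diverse models and take a majority vote (the choice of base models here is mine, not prescribed by the book):

```python
# Ensemble sketch: hard-voting classifier over three diverse models.
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.3, random_state=42)

voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(random_state=42)),
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(random_state=42)),
    ],
    voting="hard",                   # majority vote across predictions
)
voting_clf.fit(X, y)
accuracy = voting_clf.score(X, y)    # training accuracy
```

The ensemble tends to beat its weakest members as long as the base models make reasonably independent errors.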
Chapter 8: Dimensionality Reduction
- The Curse of Dimensionality
- Main Approaches for Dimensionality Reduction
- PCA
- Kernel PCA
- LLE, MDS, Isomap, t-SNE
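PCA in miniature: project the iris data onto its top two principal components and check how much variance survives the projection (dataset choice is mine; the book's examples differ):

```python
# PCA sketch: project iris down to 2 components and inspect the
# preserved variance ratio.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)          # project onto top 2 components
explained = pca.explained_variance_ratio_.sum()
```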
Chapter 9: Unsupervised Learning Techniques
- Clustering (K-Means, DBSCAN, etc.)
- Gaussian Mixtures
- Anomaly Detection
- Novelty Detection
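A minimal clustering sketch: K-Means on synthetic blobs. Here k is assumed known; in practice it would be chosen via inertia (elbow) or silhouette analysis, as the chapter covers:

```python
# Clustering sketch: K-Means on synthetic blobs with a known k.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)       # cluster assignment per sample
```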
Learning Goals
By the end of Part 1, I should be able to:
- Build end-to-end ML pipelines with Scikit-Learn
- Choose appropriate algorithms for different problem types
- Evaluate and fine-tune models effectively
- Handle common ML challenges (overfitting, underfitting, curse of dimensionality)
- Apply ensemble methods to improve predictions
- Perform dimensionality reduction and clustering
Practice Repository
All code implementations and experiments for Part 1 are tracked in:
This repo contains:
- Chapter-by-chapter implementations
- Extended experiments beyond book examples
- Custom datasets and challenges
- Notes on gotchas and best practices
Key Takeaways
(To be filled as I progress through the chapters)