📚 Practice · Medium · ML Concept

Regularization: L1, L2, and Beyond

machine-learning, regularization, overfitting, optimization
Updated Dec 20, 2025

Question

Implement and compare different regularization techniques to prevent overfitting in machine learning models.

Part 1: Implement L1 (Lasso) Regularization

import numpy as np

class LassoRegression:
    """
    Linear Regression with L1 (Lasso) Regularization
    
    Penalty: α * Σ|w|
    """
    def __init__(self, alpha=1.0, learning_rate=0.01, num_iterations=1000):
        """
        Args:
            alpha: Regularization strength (λ)
            learning_rate: Step size for gradient descent
            num_iterations: Number of training iterations
        """
        self.alpha = alpha
        self.learning_rate = learning_rate
        self.num_iterations = num_iterations
        self.weights = None
        self.bias = None
        
    def fit(self, X, y):
        """
        Train the model with L1 regularization
        
        TODO: Implement L1 regularized gradient descent
        - Loss = MSE + α * Σ|w|
        - Subgradient of the penalty: α * sign(w)
        """
        pass
    
    def predict(self, X):
        """Make predictions"""
        pass

Part 2: Implement L2 (Ridge) Regularization

class RidgeRegression:
    """
    Linear Regression with L2 (Ridge) Regularization
    
    Penalty: α * Σw²
    """
    def __init__(self, alpha=1.0, learning_rate=0.01, num_iterations=1000):
        self.alpha = alpha
        self.learning_rate = learning_rate
        self.num_iterations = num_iterations
        self.weights = None
        self.bias = None
        
    def fit(self, X, y):
        """
        Train the model with L2 regularization
        
        TODO: Implement L2 regularized gradient descent
        - Loss = MSE + α * Σw²
        - Gradient for L2: 2 * α * w
        """
        pass
    
    def predict(self, X):
        """Make predictions"""
        pass

Part 3: Implement Elastic Net (L1 + L2)

class ElasticNet:
    """
    Linear Regression with Elastic Net Regularization
    
    Penalty: α₁ * Σ|w| + α₂ * Σw²
    Combines L1 and L2
    """
    def __init__(self, alpha_l1=0.5, alpha_l2=0.5, learning_rate=0.01, num_iterations=1000):
        """
        Args:
            alpha_l1: L1 regularization strength
            alpha_l2: L2 regularization strength
        """
        self.alpha_l1 = alpha_l1
        self.alpha_l2 = alpha_l2
        self.learning_rate = learning_rate
        self.num_iterations = num_iterations
        self.weights = None
        self.bias = None
        
    def fit(self, X, y):
        """
        Train with both L1 and L2 regularization
        
        TODO: Implement Elastic Net gradient descent
        - Loss = MSE + α₁ * Σ|w| + α₂ * Σw²
        - Gradient of the penalty: α₁ * sign(w) for L1 + 2 * α₂ * w for L2
        """
        pass
    
    def predict(self, X):
        """Make predictions"""
        pass
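
Once the fit and predict methods above are filled in, all three models can be exercised on the same data. A minimal usage sketch (the synthetic data, random seed, and hyperparameter values below are illustrative assumptions, not part of the exercise):

import numpy as np

# Synthetic data: 100 samples, 10 features, only the first 3 weights are informative
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 10))
true_w = np.array([3.0, -2.0, 1.5] + [0.0] * 7)
y = X @ true_w + rng.normal(scale=0.1, size=100)

# Train each regularized model with the same learning settings
lasso = LassoRegression(alpha=0.1, learning_rate=0.01, num_iterations=1000)
ridge = RidgeRegression(alpha=0.1, learning_rate=0.01, num_iterations=1000)
enet = ElasticNet(alpha_l1=0.05, alpha_l2=0.05, learning_rate=0.01, num_iterations=1000)

for model in (lasso, ridge, enet):
    model.fit(X, y)
    print(type(model).__name__, model.predict(X[:5]))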

Part 4: Compare All Methods

def compare_regularization(X_train, y_train, X_test, y_test):
    """
    Compare different regularization methods
    
    TODO: Train all 3 models and compare:
    - Training error
    - Test error
    - Number of zero weights (sparsity)
    - Weight magnitudes
    """
    pass
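
For the sparsity and weight-magnitude comparisons, small helpers like the following can be useful. This is a sketch only; the 1e-6 tolerance is an assumed threshold for "effectively zero", not something specified by the exercise:

import numpy as np

def count_zero_weights(weights, tol=1e-6):
    """Number of coefficients whose magnitude is effectively zero."""
    return int(np.sum(np.abs(weights) < tol))

def weight_summary(weights):
    """L1 and L2 norms of the weight vector, useful for comparing shrinkage."""
    return {"l1_norm": float(np.sum(np.abs(weights))),
            "l2_norm": float(np.sqrt(np.sum(weights ** 2)))}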

Hints

Hint 1

L1 vs L2 Gradient:

L1 (Lasso):

Penalty gradient (subgradient) = α * sign(w), where

sign(w) = {
  +1 if w > 0
  -1 if w < 0
   0 if w = 0
}

L2 (Ridge):

Penalty gradient = 2 * α * w

Key difference: L1 pushes every weight toward zero by a constant amount (its sign), while L2 shrinks each weight in proportion to its value. This is why L1 tends to produce exact zeros (sparsity) while L2 only makes weights small.
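
In NumPy, the two penalty gradients for a weight vector w look like this (a sketch of the hint above; alpha is the regularization strength):

import numpy as np

def l1_penalty_grad(w, alpha):
    # Subgradient of alpha * sum(|w|): constant-magnitude push toward zero
    return alpha * np.sign(w)

def l2_penalty_grad(w, alpha):
    # Gradient of alpha * sum(w**2): shrinkage proportional to the weight
    return 2 * alpha * w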

Hint 2

Handling L1's Non-differentiability:

L1 is not differentiable at w = 0. Use soft thresholding or a subgradient (as in Hint 1):

# Soft thresholding approach
def soft_threshold(w, lambda_):
    if w > lambda_:
        return w - lambda_
    elif w < -lambda_:
        return w + lambda_
    else:
        return 0
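
The same operation can be applied to a whole weight vector at once; a vectorized NumPy equivalent of the scalar function above:

import numpy as np

def soft_threshold_vec(w, lambda_):
    # Shrink every weight toward zero by lambda_, clamping small weights to exactly 0
    return np.sign(w) * np.maximum(np.abs(w) - lambda_, 0.0)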

Hint 3

Weight Update Formula:

Standard gradient descent:

w = w - lr * gradient

With L1:

w = w - lr * (∂MSE/∂w + α * sign(w))

With L2:

w = w - lr * (∂MSE/∂w + 2 * α * w)
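
Putting Hint 1 and Hint 3 together, a single update step inside fit could look like the following. This is a minimal sketch, not the reference solution: ridge_step is an illustrative helper name, X is assumed to have shape (n_samples, n_features), and the MSE gradient is averaged over samples.

import numpy as np

def ridge_step(X, y, w, b, alpha, learning_rate):
    """One gradient-descent step with an L2 penalty.
    For L1, replace 2 * alpha * w with alpha * np.sign(w)."""
    n_samples = X.shape[0]
    error = X @ w + b - y
    grad_w = (2.0 / n_samples) * (X.T @ error) + 2 * alpha * w  # MSE gradient + penalty gradient
    grad_b = (2.0 / n_samples) * np.sum(error)                  # bias is typically not regularized
    return w - learning_rate * grad_w, b - learning_rate * grad_b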

Your Solution

Try solving the problem first before viewing the solution
