This notebook demonstrates how to perform kernel regression manually in Python. While Statsmodels provides a library for kernel regression, doing kernel regression by hand can help us better understand how we arrive at the final result.

First I will show how kernel regression is done using Statsmodels. Next I will show how it is done by hand, and finally I will overlay both plots to show that the results are the same.

To begin with, let's look at kernel regression using Statsmodels.

Kernel Regression by Statsmodels

We generate the y values using a lambda function. You can change the lambda function around to see what happens. The x values, i.e. the independent variable, are controlled by new_x, where we have displaced the x values to show that you can have irregularly spaced data.

Generating Fake Data
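Below is a minimal sketch of what the data generation might look like; the particular lambda function, the amount of jitter, and the noise level are all arbitrary choices you can change:

```python
import numpy as np

np.random.seed(42)  # make the fake data reproducible

# Underlying function we want to recover; change it to see what happens
f = lambda x: np.sin(x) + 0.3 * x

# Evenly spaced x values, displaced by random jitter so the
# independent variable is not on a perfectly regular grid
x = np.linspace(0, 10, 100)
new_x = x + np.random.normal(0, 0.1, size=x.shape)

# Noisy observations of f at the displaced points
y = f(new_x) + np.random.normal(0, 0.2, size=new_x.shape)
```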

Let us plot the data. We are going to be using Plotly Express throughout this article for all the plotting.
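A quick scatter plot of the generated data:

```python
import plotly.express as px

fig = px.scatter(x=new_x, y=y, title='Generated data')
fig.show()
```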

Our goal is to fit a curve to the above data points using regression. How can we go about it? Using Statsmodels, it is fairly simple.
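A minimal sketch of the fit, assuming the variable names from the data generation above. We pass reg_type='lc' so that Statsmodels uses the local-constant (Nadaraya-Watson) estimator, which is the one we will reproduce by hand:

```python
from statsmodels.nonparametric.kernel_regression import KernelReg

# var_type='c': the single independent variable is continuous
# reg_type='lc': local-constant (Nadaraya-Watson) regression
model = KernelReg(endog=y, exog=new_x, var_type='c', reg_type='lc')

# fit() returns two arrays: predicted values and marginal effects
y_pred_sm, marginal_effects = model.fit(new_x)
```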

Output of Kernel Regression

The output of kernel regression in the Statsmodels nonparametric regression module consists of two arrays:
1) The predicted y values
2) The marginal effects

The marginal effects are essentially the first derivative of the predicted values with respect to the independent variable for a univariate regression problem. More on marginal effects can be found in the Statsmodels documentation.

Kernel Regression by Hand in Python

To do kernel regression by hand, we need to understand a few things. First, here are some of the properties of the kernel.

1) The kernel is symmetric, i.e.

$$ K(x) = K(-x)$$

2) The area under the kernel function is equal to 1, meaning $$\int\limits_{-\infty}^{\infty} K(x) dx = 1 $$

We are going to use a Gaussian kernel to solve this problem. The Gaussian kernel has the form:

$$K(x) = \dfrac{1}{b \sqrt{2\pi}} e^{-\dfrac{(x - x_i)^2}{2b^2}}$$

Where $b$ is the bandwidth, $x_i$ are the points of the independent variable, and $x$ is the range of values over which we define the kernel function. In our case $x_i$ comes from new_x.
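As a quick numerical sanity check, we can verify both properties for this kernel (the bandwidth used here is an arbitrary choice):

```python
# Gaussian kernel centred at x_i with bandwidth b
def kernel(x, xi, b):
    return (1 / (b * np.sqrt(2 * np.pi))) * np.exp(-((x - xi) ** 2) / (2 * b ** 2))

b_check = 0.5  # arbitrary bandwidth, just for this check
grid = np.linspace(-10, 10, 10001)
dx = grid[1] - grid[0]

print(np.allclose(kernel(grid, 0, b_check), kernel(-grid, 0, b_check)))  # symmetry -> True
print((kernel(grid, 0, b_check) * dx).sum())                             # area -> ~1.0
```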

Step 1: Calculate the Kernel for a single input x point

We first calculate the kernel values for a single input point $x_i$ and display them in a dataframe.
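A minimal sketch of that step, reusing the kernel function defined above; the dataframe name is illustrative, and we take the bandwidth from model.bw so the by-hand results stay comparable to Statsmodels:

```python
import pandas as pd

xi = new_x[0]      # a single input point
b = model.bw[0]    # reuse the bandwidth statsmodels selected

# Kernel values over the x range for this one point
single_kernel_df = pd.DataFrame({'x': x, 'K(x)': kernel(x, xi, b)})
single_kernel_df.head()
```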

Visualizing the Kernels for all the input x points

We want to visualize the kernel $K(x)$ for each $x_i$. Below we calculate the kernel function values and store them in a dictionary called kernel_fns, which is converted to a dataframe kernels_df. We then use Plotly Express to plot each kernel function.
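A sketch of that computation and plot:

```python
# One kernel per input point, keyed by the point's value
kernel_fns = {f'x_i = {xi:.2f}': kernel(x, xi, b) for xi in new_x}
kernels_df = pd.DataFrame(kernel_fns, index=x)

# Each column of kernels_df becomes one curve
fig = px.line(kernels_df)
fig.show()
```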

Step 2: Calculate the weights for each input x value

We will need to calculate the weight for a single input. The weight is calculated using the expression below:

$$w(x, x_i) = \dfrac{K(x - x_i)}{\sum\limits_{j=1}^{n} K(x - x_j)}$$

The above equation gives the weights for the $i^{th}$ element of new_x, where $x$ ranges over all the elements of new_x. The denominator is summed over all the points in new_x. What is interesting to note here is that you use the kernels for all input points to calculate the weights. The equation above essentially normalizes the weights so that they lie between 0 and 1 and sum to 1.

The equation above has been implemented in the function weights, which takes a single input point and returns a row of weights. It does this by looping over all the input points while applying the above equation.
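A sketch of what the weights function might look like (the exact implementation is an assumption):

```python
def weights(xi, input_points, b):
    """Row of weights for a single input point xi."""
    # Kernel value of xi relative to every input point (the denominator terms)
    kernel_vals = np.array([kernel(xi, xj, b) for xj in input_points])
    # Normalize so the row sums to 1
    return kernel_vals / kernel_vals.sum()

print(weights(new_x[0], new_x, b).sum())  # -> 1.0
```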

Step 3: Calculate the y pred value for a single input point

We get the predicted value for the $i^{th}$ point from:

$$\hat{y}_{i} = y_1 w_{i1} + y_2 w_{i2} + y_3 w_{i3} +...+ y_n w_{in}$$

This equation is implemented in the function single_y_pred. We take a dot product of the row of weights we get from the weights function and the y values from our fake data. The equation above represents that dot product.
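A minimal sketch of single_y_pred:

```python
def single_y_pred(xi, input_points, y, b):
    """Predicted value at xi: dot product of its weight row with y."""
    return np.dot(weights(xi, input_points, b), y)
```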

Step 4: Calculate the y pred values for all the input points

The code below loops over all the input points, calculates the predicted values, and appends them to Y_pred. Once we have the predicted values, all we need to do is visualize them.
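A sketch of that loop:

```python
Y_pred = []
for xi in new_x:
    Y_pred.append(single_y_pred(xi, new_x, y, b))
Y_pred = np.array(Y_pred)
```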

Step 5: Visualize the difference between the two methods

Now that we have calculated the predicted values manually, we can compare our regression curve to the one we get from Statsmodels. We overlay the two fits and find that they match.
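A sketch of the overlay, using plotly.graph_objects to combine the traces; the curves coincide here because we reused the bandwidth that Statsmodels selected and the same local-constant estimator:

```python
import plotly.graph_objects as go

order = np.argsort(new_x)  # sort so the line traces are smooth

fig = go.Figure()
fig.add_trace(go.Scatter(x=new_x, y=y, mode='markers', name='data'))
fig.add_trace(go.Scatter(x=new_x[order], y=y_pred_sm[order], mode='lines', name='statsmodels'))
fig.add_trace(go.Scatter(x=new_x[order], y=Y_pred[order], mode='lines', name='by hand'))
fig.show()
```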

Conclusion

This article shows how we can understand the inner workings of the kernel regression algorithm through a simple example with generated data. If you learned something from this article, do like and share it.

Thank you for reading!
