Python, NumPy and Matplotlib: Curve Fitting Using polyfit and polyval

Prerequisites

For this we'll require numpy and matplotlib:

Python
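The original listing isn't preserved here, but a minimal set of imports, using the conventional np and plt aliases, would be:

import numpy as np
import matplotlib.pyplot as plt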

Raw Data

Let's create linearly spaced x data from 0 to 21 in steps of 2 using the arange function and from this create a quadratic (2nd order polynomial) with coefficients 0.1, 1.2 and 5:

Python
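A sketch of this step, reproducing the setup described above (the names x1 and y1 are the ones used in the rest of the post):

x1 = np.arange(0, 21, 2)         # 0, 2, 4, ..., 20
y1 = 0.1*x1**2 + 1.2*x1 + 5      # quadratic with coefficients 0.1, 1.2 and 5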

This can be done using the code above recalling that n*x is multiplication of n by x and x**n is x to the power of n.

Let's go ahead and plot this data as a green scatter plot:

Python
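Something along these lines would produce the scatter plot (the axis labels are assumptions):

plt.scatter(x1, y1, color='green')   # raw data as a green scatter plot
plt.xlabel('x1')
plt.ylabel('y1')
plt.show()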

Fitting a Polynomial using the Function polyfit

Let's try and fit this data to a 1st order polynomial otherwise known as a straight line. The equation for a straight line is:

y = a_0 x + a_1

For a 1st order polynomial we require two coefficients: the first coefficient a0 is called the gradient and the second coefficient a1 is called the constant (intercept).

To fit this we can use the function polyfit, which is called from the numpy library. Its output is the array of polynomial coefficients pn, ordered from the highest power down to the constant, where n is the order of the fit:

Python
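As a sketch of the general call:

# coefficients = np.polyfit(x, y, n)
# x and y are the data arrays and n is the polynomial order;
# the returned coefficients run from the highest power down to the constant.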

In our case:

Python
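For a 1st order fit of the data above, assigning the coefficient vector to a:

a = np.polyfit(x1, y1, 1)
print(a)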
[ 3.2 -1. ]

We can also explicitly specify the individual output coefficients using square bracket unpacking:

Python
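Unpacking the two coefficients individually reproduces the output below:

[a0, a1] = np.polyfit(x1, y1, 1)
print(a0)   # gradient
print(a1)   # constant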
3.1999999999999997

-1.0000000000000078

Now let's try and fit using a 2nd order polynomial otherwise known as a quadratic function. Here the equation is:

y = b_0 x^2 + b_1 x + b_2

This time we require 3 polynomial coefficients, one for the second order term, one for the first order term and one for the constant:

Python
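A 2nd order fit, assigning the coefficient vector to b (an assumed name, matching the b coefficients in the equation above):

b = np.polyfit(x1, y1, 2)
print(b)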
[0.1 1.2 5. ]

Once again we can explicitly state the polynomial coefficients as outputs:

Python
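One possible listing, printing the constant first to match the output below (the names b0, b1 and b2 follow the equation above):

[b0, b1, b2] = np.polyfit(x1, y1, 2)
print(b2)   # constant
print(b1)   # 1st order term
print(b0)   # 2nd order term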
5.000000000000009

1.1999999999999986

0.10000000000000006

We can go a step further and fit to a 3rd order polynomial also known as a cubic function. Here we require 4 coefficients, one for the third order term, one for the second order term, one for the first order term and one for the constant:

y = c_0 x^3 + c_1 x^2 + c_2 x + c_3

Python
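A 3rd order fit, with the coefficient vector assigned to c (again an assumed name):

c = np.polyfit(x1, y1, 3)
print(c)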
[1.77360529e-17 1.00000000e-01 1.20000000e+00 5.00000000e+00]

Note the extremely low magnitude of the 3rd order coefficient, meaning this data has effectively been fitted to a 2nd order polynomial as opposed to a 3rd order polynomial.

Evaluating a Polynomial at a Data Point Using the Function polyval

We can evaluate the polynomial at, say, x=1 using the function polyval. The inputs to this function are the polynomial coefficients and the data point(s):

Python
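In general the call looks like:

# y = np.polyval(coefficients, x)
# coefficients is the vector returned by polyfit (highest power first) and
# x is the point, or array of points, at which to evaluate the polynomial.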

We can try the linear fit with an x value of 1:

Python
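Evaluating the linear fit coefficients at x=1 gives the value below:

print(np.polyval([a0, a1], 1))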
2.199999999999992

This point can be plotted:

Python
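A possible plot, overlaying the evaluated point on the original data (the colours are assumptions):

plt.scatter(x1, y1, color='green')                     # original data
plt.scatter(1, np.polyval([a0, a1], 1), color='red')   # linear fit evaluated at x=1
plt.show()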

As we can see it is quite far out: the quadratic at x=1 gives 0.1 + 1.2 + 5 = 6.3, whereas the linear fit gives only 2.2.

If we want multiple points we can input a vector:

Python
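Passing a vector of x values returns a vector of evaluated points; printing its elements gives the two values below (the array name pts is hypothetical):

pts = np.polyval([a0, a1], [1, 3])
print(pts[0])
print(pts[1])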
2.199999999999992

8.599999999999993

Once again we can plot these datapoints:

Python
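And plotting both points against the original data:

plt.scatter(x1, y1, color='green')                               # original data
plt.scatter([1, 3], np.polyval([a0, a1], [1, 3]), color='red')   # linear fit at x=1 and x=3
plt.show()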

As we can see the first point is quite far out but the second point is closer to what is expected (the quadratic gives 9.5 at x=3, versus 8.6 from the linear fit).

Evaluating a Polynomial at a Series of Data Points Using the Function polyval

For larger datasets it is easier if we don't explicitly state the polynomial coefficients individually. Recall that when specifying the output of polyfit without square brackets we got a vector a, as opposed to the individual a0 and a1 when we used square brackets. We can input this vector a as the first input to polyval, i.e. it can replace [a0, a1].

Python
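For example, the following call is equivalent to passing [a0, a1] explicitly (the name ylin1 is hypothetical):

ylin1 = np.polyval(a, 1)   # a is the coefficient vector returned by polyfit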

Next instead of explicitly stating the x values, we can create a vector of odd values using:

Python
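A sketch of this, giving the odd values 1, 3, ..., 19:

x2 = np.arange(1, 21, 2)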

Now we can calculate the vector ylin using the vector x2 as an input argument:

Python
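For example:

ylin = np.polyval(a, x2)   # linear fit evaluated at each odd x value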

We can once again plot these:

Python
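Plotting the linear-fit values alongside the original data:

plt.scatter(x1, y1, color='green')   # original quadratic data
plt.scatter(x2, ylin, color='red')   # linear fit evaluated at x2
plt.show()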

Now with the linear fit we can see the values don't exactly match up; this is expected, as we are under-fitting a quadratic function (2nd order polynomial) with a linear fit (1st order polynomial).

We can now try the quadratic (second order polynomial) coefficients using the same odd x values x2:

Python
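Along the lines of (yquad is an assumed name):

yquad = np.polyval(b, x2)   # quadratic fit evaluated at each odd x value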

We can once again plot this:

Python
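And the corresponding plot:

plt.scatter(x1, y1, color='green')    # original quadratic data
plt.scatter(x2, yquad, color='red')   # quadratic fit evaluated at x2
plt.show()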

Subplots of Multiple Fits

In general we would plot the fit as a line and not as a scatter. Let's create a figure with subplots comparing linear (1st order), quadratic (2nd order), cubic (3rd order) and 4th order polynomial fits.

Python
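The exact figure code isn't preserved; a minimal sketch of a 2x2 subplot layout covering the four fit orders might look like this (styling and titles are assumptions):

xfine = np.linspace(0, 20, 100)            # dense x values for smooth fit lines
plt.figure()
for i, deg in enumerate([1, 2, 3, 4], start=1):
    p = np.polyfit(x1, y1, deg)            # fit of the given order
    plt.subplot(2, 2, i)
    plt.scatter(x1, y1, color='green')     # raw data
    plt.plot(xfine, np.polyval(p, xfine), color='red')   # fitted polynomial as a line
    plt.title('order ' + str(deg))
plt.tight_layout()
plt.show()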

Using a DataFrame

It can also be quite useful to store the data and fitted data in a dataframe (table):

Python
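One way this might look, using pandas (the column names are assumptions; the x1 and y1 columns are what the statsmodels section below relies on):

import pandas as pd

df = pd.DataFrame({'x1': x1,
                   'y1': y1,
                   'ylin': np.polyval(a, x1),     # linear fit values at the original x
                   'yquad': np.polyval(b, x1)})   # quadratic fit values at the original x
print(df)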

Evaluating your Fit using statsmodels

So far nothing much has been said about the goodness of fit of the polynomial. To get more information, we can install a third party library called statsmodels. Close down Spyder and open the Anaconda Powershell Prompt. Type in:

Python
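The command at the prompt would be along the lines of:

conda install statsmodels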

Press y to proceed:

To use statsmodels you will need to import the statsmodels formula API as smf (this is normally done at the top of your script):

Python
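As described above:

import statsmodels.formula.api as smf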

We need the original data DataFrame from above as well as the polynomial coefficients:

Python
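Restated here for completeness (assuming the names used earlier):

df = pd.DataFrame({'x1': x1, 'y1': y1})   # original data
a = np.polyfit(x1, y1, 1)                 # 1st order polynomial coefficients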

Now we need to create a polynomial function from the coefficients obtained using polyfit, in this case f, and load this formula into smf.ols (statsmodels formula, ordinary least squares), specifying that the output y1 is this function f of x1. We need to specify the data as a DataFrame and then specify that we want to fit.

In the background this will use the original x1 data to evaluate the polynomial using the coefficients given by the 1st order polyfit of the x1, y1 data. It will then use least squares to compare the calculated y values with the original y values:

Python
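A sketch of how this might be written; the regression term is labelled model(x1) in the summary below, so a helper function named model is assumed here, built from the linear coefficients a:

def model(x):
    return np.polyval(a, x)   # evaluate the 1st order fit

lin_results = smf.ols('y1 ~ model(x1)', data=df).fit()
print(lin_results.summary())
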
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                     y1   R-squared:                       0.970
Model:                            OLS   Adj. R-squared:                  0.967
Method:                 Least Squares   F-statistic:                     295.4
Date:                Wed, 07 Aug 2019   Prob (F-statistic):           3.44e-08
Time:                        21:54:52   Log-Likelihood:                -29.491
No. Observations:                  11   AIC:                             62.98
Df Residuals:                       9   BIC:                             63.78
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept  -1.776e-14      2.154  -8.25e-15      1.000      -4.873       4.873
model(x1)      1.0000      0.058     17.187      0.000       0.868       1.132
==============================================================================
Omnibus:                        1.593   Durbin-Watson:                   0.385
Prob(Omnibus):                  0.451   Jarque-Bera (JB):                1.107
Skew:                           0.566   Prob(JB):                        0.575
Kurtosis:                       1.936   Cond. No.                         67.8

Here the adjusted R squared is 0.967, which is close to 1, indicating a good fit to the linear coefficients; however, we can try the second order polynomial instead. Unsurprisingly the adjusted R squared is then exactly 1.000, as the underlying data was generated from a polynomial with these exact coefficients.

Python
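The same idea with the 2nd order coefficients b; the summary below labels the term f(x1), so the helper is named f here:

def f(x):
    return np.polyval(b, x)   # evaluate the 2nd order fit

quad_results = smf.ols('y1 ~ f(x1)', data=df).fit()
print(quad_results.summary())
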
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                     y1   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 8.684e+31
Date:                Wed, 07 Aug 2019   Prob (F-statistic):          9.61e-141
Time:                        22:08:36   Log-Likelihood:                 343.54
No. Observations:                  11   AIC:                            -683.1
Df Residuals:                       9   BIC:                            -682.3
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept  -3.553e-15   3.99e-15     -0.890      0.397   -1.26e-14    5.48e-15
f(x1)          1.0000   1.07e-16   9.32e+15      0.000       1.000       1.000
==============================================================================
Omnibus:                        2.870   Durbin-Watson:                   1.359
Prob(Omnibus):                  0.238   Jarque-Bera (JB):                0.466
Skew:                          -0.123   Prob(JB):                        0.792
Kurtosis:                       3.977   Cond. No.                         67.4
==============================================================================

Note that over-fitting will also give an adjusted R squared of 1:

Python
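Presumably the 3rd order (cubic) coefficients c were used for the first of the over-fitted summaries below:

def f(x):
    return np.polyval(c, x)   # evaluate the 3rd order fit

cubic_results = smf.ols('y1 ~ f(x1)', data=df).fit()
print(cubic_results.summary())
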
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                     y1   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 3.292e+31
Date:                Wed, 07 Aug 2019   Prob (F-statistic):          7.56e-139
Time:                        22:11:41   Log-Likelihood:                 338.20
No. Observations:                  11   AIC:                            -672.4
Df Residuals:                       9   BIC:                            -671.6
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   1.421e-14   6.48e-15      2.192      0.056   -4.52e-16    2.89e-14
f(x1)          1.0000   1.74e-16   5.74e+15      0.000       1.000       1.000
==============================================================================
Omnibus:                       15.376   Durbin-Watson:                   0.888
Prob(Omnibus):                  0.000   Jarque-Bera (JB):                9.005
Skew:                           1.784   Prob(JB):                       0.0111
Kurtosis:                       5.629   Cond. No.                         67.4
Python
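And presumably a 4th order fit for the second one (the coefficient vector d is not shown earlier in the post, so this is an assumption):

d = np.polyfit(x1, y1, 4)     # 4th order polynomial coefficients

def f(x):
    return np.polyval(d, x)   # evaluate the 4th order fit

quart_results = smf.ols('y1 ~ f(x1)', data=df).fit()
print(quart_results.summary())
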
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                     y1   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 4.932e+31
Date:                Wed, 07 Aug 2019   Prob (F-statistic):          1.23e-139
Time:                        22:12:28   Log-Likelihood:                 340.43
No. Observations:                  11   AIC:                            -676.9
Df Residuals:                       9   BIC:                            -676.1
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   1.421e-14    5.3e-15      2.684      0.025    2.23e-15    2.62e-14
f(x1)          1.0000   1.42e-16   7.02e+15      0.000       1.000       1.000
==============================================================================
Omnibus:                        0.908   Durbin-Watson:                   0.593
Prob(Omnibus):                  0.635   Jarque-Bera (JB):                0.629
Skew:                          -0.057   Prob(JB):                        0.730
Kurtosis:                       1.834   Cond. No.                         67.4
==============================================================================
