Python, NumPy and MatPlotLib: Curve Fitting Using PolyFit and PolyVal

Table of contents

Perquisites
Raw Data

Fitting a Polynomial using the Function polyfit
Evaluating a Polynomial at a DataPoint Using the Function polyval
Evaluating a Polynomial at a Series of DataPoints Using the Function polyval

Subplots of Multiple Fits
Using a DataFrame
Evaluating your Fit using statsmodels

Perquisites

For this we'll require, numpy and matplot lib:

import numpy as np
import matplotlib.pyplot as plt
import scipy as sp
import pandas as pd

Raw Data

Let's create linearly spaced x data from 0 to 21 in steps of 2 using the arange function and from this create a quadratic (2nd order polynomial) with coefficients 0.1, 1.2 and 5:

x1=np.arange(start=0,stop=21,step=2)
y1=0.1*x1**2+1.2*x1+5

This can be done using the code above recalling that n*x is multiplication of n by x and x**n is x to the power of n.

Lets go ahead and plot this data as a green scatter plot:

plt.close('all')
plt.figure(1)
plt.grid(axis='both',which='major',color=[166/255,166/255,166/255], linestyle='-', linewidth=1)
plt.minorticks_on()
plt.grid(axis='both',which='minor',color=[166/255,166/255,166/255,0.4], linestyle=':', linewidth=0.5,label='original data')
plt.scatter(x1,y1,150,color=[0/255,176/255,80/255,0.4])
plt.xlabel('x')
plt.ylabel('y')
plt.legend()

Fitting a Polynomial using the Function polyfit

Let's try and fit this data to a 1st order polynomial otherwise known as a straight line. The equation for a straight line is:

$\displaystyle y={{a}_{0}}x+{{a}_{1}}$

For a first polynomial, we require two coefficients, the first coefficient a0 is called the gradient and the second coefficient a1 is called the constant.

To fit this we can use the function polyfit which is called from the numpy library. The output arguments are the polynomial coefficients pn and n is the order:

pn=np.polyfit(x,y,n)

In our case:

a=np.polyfit(x1,y1,1)
print(a)

[ 3.2 -1. ]

We can also explicitly specify our output arguments using:

[a0,a1]=np.polyfit(x1,y1,1)
print(a0)
print(a1)

3.1999999999999997

-1.0000000000000078

Now let's try and fit using a 2nd order polynomial otherwise known as a quadratic function. Here the equation is:

$\displaystyle y={{b}_{0}}{{x}^{2}}+{{b}_{1}}x+{{b}_{2}}$

This time we require 3 polynomial coefficients, one for the second order term, one for the first order term and one for the constant:

b=np.polyfit(x1,y1,2)
print(b)

[0.1 1.2 5. ]

Once again we can explicitly state the polynomial coefficients as outputs:

[b0,b1,b2]=np.polyfit(x1,y1,2)
print(b0)
print(b1)
print(b2)

5.000000000000009

1.1999999999999986

0.10000000000000006

We can go a step further and fit to a 3rd order polynomial also known as a cubic function. Here we require 4 coefficients, one for the third order term, one for the second order term, one for the first order term and one for the constant:

$\displaystyle y={{c}_{0}}{{x}^{3}}+{{c}_{1}}{{x}^{2}}+{{c}_{2}}x+{{c}_{3}}$

c=np.polyfit(x1,y1,3)
print(c)

[1.77360529e-17 1.00000000e-01 1.20000000e+00 5.00000000e+00]

Note the extremely low magnitude in the 3rd order term, meaning this data has effectively fitted to a 2nd order polynomial opposed to a 3rd order polynomial.

Evaluating a Polynomial at a DataPoint Using the Function polyval

We can evaluate the point at say x=1 using the function polyval. The inputs to this function are the polynomial coefficients as the datapoints:

newy=polyval(pn,newx);

We can try the linear fit with an x value of 1

[yat1lin]=np.polyval([a0,a1],1)
print(yat1lin)

2.199999999999992

This point can be plotted:

plt.figure(2)
plt.grid(axis='both',which='major',color=[166/255,166/255,166/255], linestyle='-', linewidth=1)
plt.minorticks_on()
plt.grid(axis='both',which='minor',color=[166/255,166/255,166/255,0.4], linestyle=':', linewidth=0.5)
plt.scatter(x1,y1,150,color=[0/255,176/255,80/255,0.4],label='original data')
plt.scatter(1,2.199999999999992,150,color=[112/255,48/255,160/255,0.4],label='ylin at x=1')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()

As we can see it is quite far out:

If we want multiple points we can input a vector:

[yat1lin,yat3lin]=np.polyval([a0,a1],[1,3])
print(yat1lin)
print(yat3lin)

2.199999999999992

8.599999999999993

Once again we can plot these datapoints:

plt.figure(3)
plt.grid(axis='both',which='major',color=[166/255,166/255,166/255], linestyle='-', linewidth=1)
plt.minorticks_on()
plt.grid(axis='both',which='minor',color=[166/255,166/255,166/255,0.4], linestyle=':', linewidth=0.5)
plt.scatter(x1,y1,150,color=[0/255,176/255,80/255,0.4],label='original data')
plt.scatter(1,2.199999999999992,150,color=[112/255,48/255,160/255,0.4],label='ylin at x=1')
plt.scatter(3,8.599999999999993,150,color=[112/255,48/255,160/255,0.4],label='ylin at x=3')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()

As we can see the first point is quite far out but the second point is closer to what is expected:

Evaluating a Polynomial at a Series of DataPoints Using the Function polyval

For larger datasets it is easier if we don't explicitly state the polynomial coefficients individually. Recall when specifying the output of polyfit without square brackets we got a vector a opposed to the individual a0 and a1 when we used square brackets. We can input this value a as the first input to polyval i.e. can replace [a0,a1].

[yat1lin,yat3lin]=np.polyval(a,[1,3])

Next instead of explicitly stating the x values, we can create a vector of odd values using:

x2=np.arange(start=1,stop=21,step=2)

Now we can calculate the vector ylin using the vector x2 as an input argument:

ylin=np.polyval(a,x2)

We can once again plot these:

plt.figure(4)
plt.grid(axis='both',which='major',color=[166/255,166/255,166/255], linestyle='-', linewidth=1)
plt.minorticks_on()
plt.grid(axis='both',which='minor',color=[166/255,166/255,166/255,0.4], linestyle=':', linewidth=0.5)
plt.scatter(x1,y1,150,color=[0/255,176/255,80/255,0.4],label='original data')
plt.scatter(x2,ylin,150,color=[112/255,48/255,160/255,0.4],label='linear fit')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()

Now with the linear fit we can see the values don't exactly match up, this is expected as we are under-fitting a quadratic function (2nd order polynomial) with a linear fit (first order polynomial).

We can now try the quadratic (second order polynomial) coefficients using the same odd x values x2:

yquad=np.polyval(b,x2)

We can once again plot this:

plt.figure(5)
plt.grid(axis='both',which='major',color=[166/255,166/255,166/255], linestyle='-', linewidth=1)
plt.minorticks_on()
plt.grid(axis='both',which='minor',color=[166/255,166/255,166/255,0.4], linestyle=':', linewidth=0.5)
plt.scatter(x1,y1,150,color=[0/255,176/255,80/255,0.4],label='original data')
plt.scatter(x2,yquad,150,color=[112/255,48/255,160/255,0.4],label='quadratic fit')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()

Subplots of Multiple Fits

In general we would plot the fit as a line and not as a scatter. Let's create a plot with a subplot comparing linear (1st order), quadratic (2nd order), cubic (3rd order) and 4th order polynomial fits.

#Create Data
x1=np.arange(start=0,stop=21,step=2)
y1=0.1*x1**2+1.2*x1+5
#New X Data
x2=np.arange(start=1,stop=21,step=2)
#Finding Coefficients
a=np.polyfit(x1,y1,1)
b=np.polyfit(x1,y1,2)
c=np.polyfit(x1,y1,3)
d=np.polyfit(x1,y1,4)
#Evaluating New ydata
ylin=np.polyval(a,x2)
yquad=np.polyval(b,x2)
ycubic=np.polyval(c,x2)
y4power=np.polyval(d,x2)

#Fit Equations
ylineqn='ylin=('+str(round(a[0],3))+')*x+('+str(round(a[1],3))+')'
yquadeqn='yquad=('+str(round(b[0],3))+')*x^2+('+str(round(b[1],3))+')*x+('+str(round(b[2],3))+')'
ycubiceqn='ycubic=('+str(round(c[0],3))+')*x^3+('+str(round(c[1],3))+')*x^2+('+str(round(c[2],3))+')*x+('+str(round(c[3],3))+')'
ypoly4eqn='ypoly4=('+str(round(d[0],3))+')*x^4+('+str(round(d[1],3))+')*x^3+('+str(round(d[2],3))+')*x^2+('+str(round(d[3],3))+')*x+('+str(round(d[4],3))+')'

#Plotting
plt.figure(6)
#Subplot1
plt.subplot(2,2,1)
plt.grid(axis='both',which='major',color=[166/255,166/255,166/255], linestyle='-', linewidth=1)
plt.minorticks_on()
plt.grid(axis='both',which='minor',color=[166/255,166/255,166/255,0.4], linestyle=':', linewidth=0.5)
plt.scatter(x1,y1,150,color=[0/255,176/255,80/255,0.4],label='original data')
plt.plot(x2,ylin,'r:',label=ylineqn)
plt.xlabel('x')
plt.ylabel('y')
plt.legend(fontsize=4)

#Subplot2
plt.subplot(2,2,2)
plt.grid(axis='both',which='major',color=[166/255,166/255,166/255], linestyle='-', linewidth=1)
plt.minorticks_on()
plt.grid(axis='both',which='minor',color=[166/255,166/255,166/255,0.4], linestyle=':', linewidth=0.5)
plt.scatter(x1,y1,150,color=[0/255,176/255,80/255,0.4],label='original data')
plt.plot(x2,yquad,'r:',label=yquadeqn)
plt.xlabel('x')
plt.ylabel('y')
plt.legend(fontsize=4)

#Subplot3
plt.subplot(2,2,3)
plt.grid(axis='both',which='major',color=[166/255,166/255,166/255], linestyle='-', linewidth=1)
plt.minorticks_on()
plt.grid(axis='both',which='minor',color=[166/255,166/255,166/255,0.4], linestyle=':', linewidth=0.5)
plt.scatter(x1,y1,150,color=[0/255,176/255,80/255,0.4],label='original data')
plt.plot(x2,ycubic,'r:',label=ycubiceqn)
plt.xlabel('x')
plt.ylabel('y')
plt.legend(fontsize=4)

#Subplot4
plt.subplot(2,2,4)
plt.grid(axis='both',which='major',color=[166/255,166/255,166/255], linestyle='-', linewidth=1)
plt.minorticks_on()
plt.grid(axis='both',which='minor',color=[166/255,166/255,166/255,0.4], linestyle=':', linewidth=0.5)
plt.scatter(x1,y1,150,color=[0/255,176/255,80/255,0.4],label='original data')
plt.plot(x2,y4power,'r:',label=ypoly4eqn)
plt.xlabel('x')
plt.ylabel('y')
plt.legend(fontsize=4)

Using a DataFrame

It can also be quite useful to store the data and fitted data in a dataframe (table):

# Original Data
x1=np.arange(start=0,stop=21,step=2)
y1=0.1*x1**2+1.2*x1+5
# Original Data DataFrame
originaldata=pd.DataFrame(columns=['x1','y1'])
originaldata['x1']=x1
originaldata['y1']=y1

# Polyfit
a=np.polyfit(x1,y1,1)
b=np.polyfit(x1,y1,2)
c=np.polyfit(x1,y1,3)
d=np.polyfit(x1,y1,4)

# New x2 data
x2=np.arange(start=1,stop=21,step=2)

#Evaluating New ydata
ylin=np.polyval(a,x2)
yquad=np.polyval(b,x2)
ycubic=np.polyval(c,x2)
y4power=np.polyval(d,x2)

#New DataFrame
fitdata=pd.DataFrame(columns=['x2','ylin','yquad','ycubic','y4power'])
fitdata['x2']=x2
fitdata['ylin']=ylin
fitdata['yquad']=yquad
fitdata['ycubic']=ycubic
fitdata['y4power']=y4power

#Delete Old Variables
del x1
del y1
del a
del b
del c
del d
del x2
del ylin
del yquad
del ycubic
del y4power

Evaluating your Fit using statsmodels

So far nothing much is said about the goodness of fit of the polynomial. To get more information, we can install a third party library called statmodels. Close down Spyder and open the Anaconda Powershell Prompt. Type in:

conda install statmodels

Press y to proceed:

To use the stats models you will need to import the statsmodels formula api as smf (this is normally done at the top of your script):

import statsmodels.formula.api as smf

We need the Original Data DataFrame from above as well as the Polynomial Coefficients:

# Original Data
x1=np.arange(start=0,stop=21,step=2)
y1=0.1*x1**2+1.2*x1+5
# Original Data DataFrame
originaldata=pd.DataFrame(columns=['x1','y1'])
originaldata['x1']=x1
originaldata['y1']=y1

# Polyfit
a=np.polyfit(x1,y1,1)
b=np.polyfit(x1,y1,2)
c=np.polyfit(x1,y1,3)
d=np.polyfit(x1,y1,4)

Now we need to create a polynomial function from the coefficients measured using polyfit, in this case f and load this formula in smf.ols (stats models formula – ordinary least squares) specifying the output y1 being this function f of x1. We need to specify the data as a dataframe and then specify that we want to fit.

In the background this will use the original x1 data to fit the polynomial using the coefficients given from the 1st order polyfit of the x1, y1 data. It will then use least squares on the calculated y values comparing them with the ordinary original y values:

f=np.poly1d(a)
statistics1=smf.ols(formula='y1 ~ f(x1)', data=originaldata).fit()
print(statistics1.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                     y1   R-squared:                       0.970
Model:                            OLS   Adj. R-squared:                  0.967
Method:                 Least Squares   F-statistic:                     295.4
Date:                Wed, 07 Aug 2019   Prob (F-statistic):           3.44e-08
Time:                        21:54:52   Log-Likelihood:                -29.491
No. Observations:                  11   AIC:                             62.98
Df Residuals:                       9   BIC:                             63.78
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept  -1.776e-14      2.154  -8.25e-15      1.000      -4.873       4.873
model(x1)      1.0000      0.058     17.187      0.000       0.868       1.132
==============================================================================
Omnibus:                        1.593   Durbin-Watson:                   0.385
Prob(Omnibus):                  0.451   Jarque-Bera (JB):                1.107
Skew:                           0.566   Prob(JB):                        0.575
Kurtosis:                       1.936   Cond. No.                         67.8

Here the Adjusted R squared is 0.967 which is close to 1 indicating a good fit to the linear coefficients, however we can try the second order polynomial instead. Unsurprisingly the Adjusted R squared is now exactly 1.000 as this is a model polynomial with these exact coefficients.

f=np.poly1d(b)
statistics2=smf.ols(formula='y1 ~ f(x1)', data=originaldata).fit()
print(statistics2.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                     y1   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 8.684e+31
Date:                Wed, 07 Aug 2019   Prob (F-statistic):          9.61e-141
Time:                        22:08:36   Log-Likelihood:                 343.54
No. Observations:                  11   AIC:                            -683.1
Df Residuals:                       9   BIC:                            -682.3
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept  -3.553e-15   3.99e-15     -0.890      0.397   -1.26e-14    5.48e-15
f(x1)          1.0000   1.07e-16   9.32e+15      0.000       1.000       1.000
==============================================================================
Omnibus:                        2.870   Durbin-Watson:                   1.359
Prob(Omnibus):                  0.238   Jarque-Bera (JB):                0.466
Skew:                          -0.123   Prob(JB):                        0.792
Kurtosis:                       3.977   Cond. No.                         67.4
==============================================================================

Note over-fitting will also give an adjusted R squared of 1:

f=np.poly1d(c)
statistics3=smf.ols(formula='y1 ~ f(x1)', data=originaldata).fit()
print(statistics3.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                     y1   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 3.292e+31
Date:                Wed, 07 Aug 2019   Prob (F-statistic):          7.56e-139
Time:                        22:11:41   Log-Likelihood:                 338.20
No. Observations:                  11   AIC:                            -672.4
Df Residuals:                       9   BIC:                            -671.6
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   1.421e-14   6.48e-15      2.192      0.056   -4.52e-16    2.89e-14
f(x1)          1.0000   1.74e-16   5.74e+15      0.000       1.000       1.000
==============================================================================
Omnibus:                       15.376   Durbin-Watson:                   0.888
Prob(Omnibus):                  0.000   Jarque-Bera (JB):                9.005
Skew:                           1.784   Prob(JB):                       0.0111
Kurtosis:                       5.629   Cond. No.                         67.4

f=np.poly1d(d)
statistics4=smf.ols(formula='y1 ~ f(x1)', data=originaldata).fit()
print(statistics4.summary())

==============================================================================
Dep. Variable:                     y1   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 4.932e+31
Date:                Wed, 07 Aug 2019   Prob (F-statistic):          1.23e-139
Time:                        22:12:28   Log-Likelihood:                 340.43
No. Observations:                  11   AIC:                            -676.9
Df Residuals:                       9   BIC:                            -676.1
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   1.421e-14    5.3e-15      2.684      0.025    2.23e-15    2.62e-14
f(x1)          1.0000   1.42e-16   7.02e+15      0.000       1.000       1.000
==============================================================================
Omnibus:                        0.908   Durbin-Watson:                   0.593
Prob(Omnibus):                  0.635   Jarque-Bera (JB):                0.629
Skew:                          -0.057   Prob(JB):                        0.730
Kurtosis:                       1.834   Cond. No.                         67.4
==============================================================================

Python, NumPy and MatPlotLib: Curve Fitting Using PolyFit and PolyVal

Perquisites

Raw Data

Fitting a Polynomial using the Function polyfit

Evaluating a Polynomial at a DataPoint Using the Function polyval

Evaluating a Polynomial at a Series of DataPoints Using the Function polyval

Subplots of Multiple Fits

Using a DataFrame

Evaluating your Fit using statsmodels

Like this:

Leave a ReplyCancel reply

Perquisites

Raw Data

Fitting a Polynomial using the Function polyfit

Evaluating a Polynomial at a DataPoint Using the Function polyval

Evaluating a Polynomial at a Series of DataPoints Using the Function polyval

Subplots of Multiple Fits

Using a DataFrame

Evaluating your Fit using statsmodels

Share this:

Like this:

Leave a ReplyCancel reply