# Generating Perfect Model Data

Let’s generate some model x1 and y1 data. First we will create an independent variable x1 that is a column vector ranging from 0 to 20 in steps of 2 and then we can create y1 which is a polynomial with respect to x1. We will store this in a table called data:

x1=[0:2:20]'; y1=0.1*x1.^2+1.2*x1+5; data=table(x1,y1); clear x1; clear y1;

For this quadratic polynomial y1=0.1*x1.^2+1.2*x1+5 of the form y1=p2(1)*x1.^2+p2(2)*x1+p2(3) the three coefficients are:

p2(1)=0.1
p2(2)=1.2
p2(3)=5

p2(1) is the x.^2 coefficient p2(2) is the x coefficient and p2(3) is the constant.

# Scatter Plot

figure(1); scatter(data.x1,data.y1,100,'fill',... 'MarkerEdgeColor',[112/255,48/255,160/255],... 'MarkerFaceColor',[0/255,176/255,80/255],... 'LineWidth',1,...
'MarkerFaceAlpha',0.5,...
'MarkerEdgeAlpha',0.5)
set(gca,'FontSize',20)
xlabel('x')
ylabel('y')
title('Polynomial')
grid('minor')
legend('y1',... 'Location','NorthEast')

# Using polyfit to Find the Polynomial Coefficients

We can use polyfit to evaluate the polynomial coefficients. Polyfit has the form

[pn]=polyfit(x,y,n)

Where the input arguments are the x data, y data and n, the order of polynomial. The output arguments is a row vector pn of the polynomial coefficients. In most cases we would try fitting from a first order polynomial upwards however as we know our data is a second order polynomial, as it is created from a second order polynomial we can try fitting to a second order polynomial first.

[p2]=polyfit(data.x1,data.y1,2)

$\displaystyle \text{p2=}\left[ {\begin{array}{*{20}{c}} {0.100000} & {1.200000} & {5.000000} \end{array}} \right]$

These are the three polynomial coefficients for the quadratic equation:

$\displaystyle 0.100000{{x}^{2}}+1.200000x+5.000000$

The original polynomial was y1=0.1*x1.^2+1.2*x1+5 and the three coefficients returned match y1=p2(1)*x1.^2+p2(2)*x1+p2(3) as expected.

Which as expected is a perfect match to our original equation.

# Using polyval to Evaluate the Polynomial

We can use polyval to evaluate the polynomial at specified x values. Let’s first create a new variable x2 and assign it as a column in a new table called datafit.

x2=[1:2:19]'; datafit=table(x2); clear x2;

We can use polyval to evaluate the polynomial coefficients. polyval has the form

[newy]=polyval(pn,newx);

We can use the function

[datafit.y2]=polyval(p2,datafit.x2);

To find the value of the polynomial for each x2 value.

# Plotting the Fitted Data with Respect to the Raw Data

We can plot this data as a scatter plot alongside the original data:

figure(1); scatter(data.x1,data.y1,100,'fill',... 'MarkerEdgeColor',[112/255,48/255,160/255],... 'MarkerFaceColor',[0/255,176/255,80/255],... 'LineWidth',1,...
'MarkerFaceAlpha',0.5,...
'MarkerEdgeAlpha',0.5)
hold('on'); scatter(datafit.x2,datafit.y2,100,'fill',... 'MarkerEdgeColor',[255/255,0/255,0/255],... 'MarkerFaceColor',[0/255,112/255,92/255],... 'LineWidth',1,... 'MarkerFaceAlpha',0.5,... 'MarkerEdgeAlpha',0.5); hold('off'); set(gca,'FontSize',20)
xlabel('x')
ylabel('y')
title('Polynomial')
grid('minor')
legend('y1','y2',... 'Location','NorthEast')

It may be more common to plot the fitted curve as a line:

figure(1); scatter(data.x1,data.y1,100,'fill',... 'MarkerEdgeColor',[112/255,48/255,160/255],... 'MarkerFaceColor',[0/255,176/255,80/255],... 'LineWidth',1,...
'MarkerFaceAlpha',0.5,...
'MarkerEdgeAlpha',0.5)
hold('on'); plot(datafit.x2,datafit.y2,... 'color','r',... 'LineWidth',3); hold('off'); set(gca,'FontSize',20)
xlabel('x')
ylabel('y')
title('Polynomial')
grid('minor')
legend('y1','y2',... 'Location','NorthEast')

If we want to print the equation to the chart we can use:

equation=sprintf('y=(%f)x^2+(%f)x+(%f)',p2(1),p2(2),p2(3)); legend('data',equation,'location','northwest','FontSize',8);

The function sprintf has the same form as fprintf but prints to a string variable opposed to outputting in the Command Window.

# Underfitting – Fitting a 1st Order Polynomial to 2nd Order Polynomial Data

The code above can be modified to fit to a 1st order polynomial (i.e. a straight line). Here it is obvious by eye that it is an unsuitable fit:

The reader is recommended to try to use polyfit and polyval to try and fit the above curve to a 1st order polynomial (straight line).

x2=[1:2:19]'; [p1]=polyfit(data.x1,data.y1,1) datafit=table(x2); clear x2; [datafit.y2]=polyval(p1,datafit.x2); 

The user is recommended to plot the graph above, with the updated equation in the legend.

figure(1); scatter(data.x1,data.y1,100,'fill',... 'MarkerEdgeColor',[112/255,48/255,160/255],... 'MarkerFaceColor',[0/255,176/255,80/255],... 'LineWidth',1,...
'MarkerFaceAlpha',0.5,...
'MarkerEdgeAlpha',0.5)
hold('on'); plot(datafit.x2,datafit.y2,... 'color','r',... 'LineWidth',3); hold('off'); set(gca,'FontSize',20)
xlabel('x')
ylabel('y')
title('Polynomial')
grid('minor') equation=sprintf('y=(%f)x+(%f)',p2(1),p2(2)); legend('data',equation,'location','northwest','FontSize',8);

# Overfitting – Fitting a 3rd Order Polynomial to 2nd Order Polynomial Data

Here from the values of the coefficients i.e. the cubic term being extremely small 1.6509e-17 it is obvious that the polynomial is overfitted. This term can be rounded to 0 and we return with the quadratic fit. This is perfect model data so the fitting is relatively easy in real life it may be harder to distinguish data which has been over-fitted.

The reader is recommended to try to use polyfit and polyval to try and fit the above curve to a 3rd order polynomial.

x2=[1:2:19]'; [p3]=polyfit(data.x1,data.y1,3) datafit=table(x2); clear x2; [datafit.y2]=polyval(p3,datafit.x2); 

The user is recommended to plot the graph above, with the updated equation in the legend.

figure(1); scatter(data.x1,data.y1,100,'fill',... 'MarkerEdgeColor',[112/255,48/255,160/255],... 'MarkerFaceColor',[0/255,176/255,80/255],... 'LineWidth',1,...
'MarkerFaceAlpha',0.5,...
'MarkerEdgeAlpha',0.5)
hold('on'); plot(datafit.x2,datafit.y2,... 'color','r',... 'LineWidth',3); hold('off'); set(gca,'FontSize',20)
xlabel('x')
ylabel('y')
title('Polynomial')
grid('minor') equation=sprintf('y=(%f)x^3+(%f)x^2+(%f)x+(%f)',p3(1),p3(2),p3(3),p3(4)); legend('data',equation,'location','northwest','FontSize',8);

# Generating Model Data with Noise

rng('default'); x1=[0:2:20]'; [m,n]=size(x1); noise=rand(m,n); y1=0.1*x1.^2+1.2*x1+5+noise; data=table(x1,y1); clear x1; clear y1; clear noise; clear m; clear n;

We see a deviation from the starting polynomial with:

p2(1)=0.1
p2(2)=1.2
p2(3)=5

The deviation influences the constant term the most, followed by the x term.

If we times the noise by 5, we see a larger deviation:

If we times the noise by 10 we see more deviation:

And if we times the noise by 25 we see an even larger deviation again:

If we times the noise by 250, the original function is not recognisable: