Wednesday, August 11, 2021

General Linear Least-Squares

Problem 15.6.


Use multiple linear regression to derive a predictive equation for dissolved oxygen concentration as a function of temperature and chloride based on the data from Table P15.5. Use the equation to estimate the concentration of dissolved oxygen for a chloride concentration of 15 g/L at T = 12 °C. Note that the true value is 9.09 mg/L. Compute the percent relative error for your prediction. Explain possible causes for the discrepancy. 
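Table P15.5 Dissolved oxygen concentration (mg/L) in water as a function of temperature (°C) and chloride concentration (g/L)

T (°C)    c = 0 g/L    c = 10 g/L    c = 20 g/L
0         14.6         12.9          11.4
5         12.8         11.3          10.3
10        11.3         10.1          8.96
15        10.1         9.03          8.08
20        9.09         8.17          7.35
25        8.26         7.46          6.73
30        7.56         6.85          6.20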

Solution:


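The multiple linear regression model to fit is

y = a_0 + a_1 T + a_2 c

where y is the dissolved oxygen concentration (mg/L), T is the temperature (°C), and c is the chloride concentration (g/L).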

The [Z] and y matrices can be set up using MATLAB commands:
>> %enter the data T, c & y to be fit
>> T = [0 5 10 15 20 25 30 0 5 10 15 20 25 30 0 5 10 15 20 25 30]';
>> c = [0 0 0 0 0 0 0 10 10 10 10 10 10 10 20 20 20 20 20 20 20]';
>> y = [14.6 12.8 11.3 10.1 9.09 8.26 7.56 12.9 11.3 10.1 9.03 8.17 7.46 6.85 11.4 10.3 8.96 8.08 7.35 6.73 6.20]';
>> %create the [Z] matrix
>> Z = [ones(size(T)) T c];
>> %solve for the coefficients of the least-squares fit
>> a = (Z'*Z)\(Z'*y)

a =

  13.522142857142857
  -0.201238095238095
  -0.104928571428572
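
As a cross-check, the normal equations are not the only route to these coefficients in MATLAB. The sketch below assumes Z, y and a are still in the workspace from the commands above, and applies the backslash operator directly to the overdetermined system, which returns the least-squares solution. The name a_ls is only illustrative; the two coefficient sets should agree to within round-off.

>> %least-squares solution without forming Z'*Z explicitly
>> a_ls = Z\y;
>> %largest absolute difference between the two coefficient sets
>> max(abs(a_ls - a))

Avoiding the product Z'*Z is generally the better-conditioned choice, although for a small, well-scaled problem like this one either approach should be adequate.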

Thus, the best-fit multiple regression model is

y = 13.52214 - 0.201238 T - 0.104929 c

We can then evaluate the prediction at T = 12 °C and c = 15 g/L and compute the percent relative error as
>> yp = a(1)+a(2)*12+a(3)*15

yp =

   9.533357142857142

>> ea = abs((9.09-yp)/9.09)*100

ea =

   4.877416313059866

Thus, the model overestimates the true value of 9.09 mg/L by about 4.9%, which is a considerable error. This can be seen even better by generating predictions for all the data and then plotting the predictions versus the measured values. A one-to-one line is included to show how the predictions diverge from a perfect fit.

>> yp = a(1)+a(2).*T+a(3).*c;
>> ymin = min(y);ymax = max(y);
>> dy = (ymax - ymin)/100;
>> ymod = [ymin:dy:ymax];
>> plot(y,yp,'ko',ymod,ymod,'k-')
>> axis square,title('Plot of predicted vs measured values of dissolved oxygen concentration')
>> legend('model prediction','1:1 line','Location','Northwest')
>> xlabel('y measured'),ylabel('y predicted')
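
The scatter around the one-to-one line can also be summarized numerically. The following is a minimal sketch, assuming T, y, a and the vector yp from the commands above are still in the workspace. It computes the sum of squared residuals, the coefficient of determination and the standard error of the estimate, then plots the residuals against temperature. The names e, Sr, St, r2 and syx are illustrative choices, not part of the original solution.

>> %residuals of the fitted model
>> e = y - yp;
>> %sum of squared residuals and total sum of squares about the mean
>> Sr = sum(e.^2);
>> St = sum((y - mean(y)).^2);
>> %coefficient of determination
>> r2 = 1 - Sr/St
>> %standard error of the estimate (n data points, 3 fitted coefficients)
>> syx = sqrt(Sr/(length(y) - length(a)))
>> %residuals versus temperature
>> figure, plot(T,e,'ko'), xlabel('T (deg C)'), ylabel('residual (mg/L)')

A systematic curved pattern in the residual plot, rather than random scatter about zero, would indicate that the linear model is missing a nonlinear term.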
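The discrepancy arises because the dependence of oxygen concentration on the independent variables is significantly nonlinear, which a model that is linear in T and c cannot capture. This is particularly true of the dependence on temperature: at c = 0 g/L, for example, the measured concentration drops by 1.8 mg/L between 0 and 5 °C but by only 0.7 mg/L between 25 and 30 °C.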
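
One way to explore this within the general linear least-squares framework is to add a nonlinear basis function to [Z]. The sketch below is a hypothetical extension that goes beyond what the problem asks for: it assumes T, c and y are still in the workspace and adds a T^2 column, so the model becomes y = a_0 + a_1 T + a_2 c + a_3 T^2. The names Z2, b, yp2 and ea2 are illustrative.

>> %hypothetical extended model with a quadratic temperature term
>> Z2 = [ones(size(T)) T c T.^2];
>> b = (Z2'*Z2)\(Z2'*y);
>> %prediction and percent relative error at T = 12 and c = 15
>> yp2 = [1 12 15 12^2]*b
>> ea2 = abs((9.09-yp2)/9.09)*100

The model is still linear in its coefficients, so the same solution procedure applies. Comparing ea2 with ea shows whether the extra term reduces the error at this point.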
