Rogers Computational Chemistry Using the PC
.pdfCURVE FITTING |
|
|
73 |
||
SLOPE¼1.925152 , Y INTERCEPT¼2.100666 |
|||||
s |
|
p22 |
¼ |
: |
|
sb ¼ |
d11s2 |
0:194 |
|||
|
m ¼ p |
¼ |
|
|
|
|
|
d s2 |
|
0 078 |
|
TableCurve gives |
|
|
|
TableCurve Output 3-5 |
|
|
|
|
|
Rank 3 Eqn 1 y ¼ a þ bx |
|
|
|
|
|
r2 Coef Det |
DF Adj r2 |
Fit Std Err |
F-value |
|
|
0.9869616170 |
0.9832363647 |
0.2842291542 |
605.57301820 |
|
|
Parm Value |
Std Error |
t-value |
95% Confidence Limits |
|
|
a |
2.100666667 |
0.194165477 |
10.81895043 |
1.651357072 |
2.549976262 |
b |
1.925151515 |
0.078231500 |
24.60839325 |
1.744119520 |
2.106183510 |
The result of our analysis is that the data are fit by the straight line |
|
||||
y ¼ 2:10 0:19 þ 1:93 0:08 x |
|
|
ð3-36Þ |
Note that there are two equations with a higher rank than y ¼ a þ bx. They are the exponential and power equations y ¼ a þ b expð x=cÞ and y ¼ a þ bxc. There is little to choose among the r2 values of these three curve fits, 0.9871, 0.9871, and 0.9870, so we choose the simplest equation. Small differences in r2 values should not be counted too heavily, and we should be wary of r2 values that look impressive. Note that Fig. 3-2 shows substantial scatter despite the high r2 value. Use common sense and an aesthetic of simplicity in choosing the best curve fit. As usual, the number of significant figures in the final equation (3-36) is determined by the number of significant figures going into the calculation (Fig. 3-2).
COMPUTER PROJECT 3-1 j Linear Curve Fitting: KF Solvation
Linear extrapolation of the experimental behavior of a real gas to zero pressure or a solute to infinite dilution is often used as a technique to ‘‘get rid’’ of molecular or ionic interactions so as to study some property of the molecule or ion to which these interferences are considered extraneous. Emsley (1971) studied the heat (enthalpy) of solutions of potassium fluoride KF and the monosolvated species KF HOAc in glacial acetic acid at several concentrations. A known weight of the anhydrous salt KF was added to a known weight of glacial acetic acid in a Dewar flask fitted with a heating coil, a stirrer, and a sensitive thermometer. The temperature change on each addition was recorded. The heat capacity C of the flask and its contents was determined by supplying a known amount of electrical energy Q to the flask and noting the temperature rise T in kelvins (K)
Qð joulesÞ ¼ C T |
ð3-37Þ |
The experiment was repeated for the solvated salt KF HOAc, where the molecule of solvation is acetic acid, HOAc. Some experimental results calculated from the original paper are shown in Table 3-2:
74 |
COMPUTATIONAL CHEMISTRY USING THE PC |
|||
Table 3-2 Enthalpies of Solution sol’n H298 of KF and KF HOAc in Glacial |
||||
Acetic Acid at 298 K |
|
|
|
|
|
|
|
|
|
KF: C ¼ 4:168 kJ K 1 |
|
|
|
|
Molality |
0.194 |
0.590 |
0.821 |
1.208 |
Temperature change, K |
1.592 |
4.501 |
5.909 |
8.115 |
KF HOAc: C ¼ 4:203 kJ K 1 |
|
|
|
|
Molality |
0.280 |
0.504 |
0.910 |
1.190 |
Temperature Change, K |
0.227 |
0.432 |
0.866 |
1.189 |
Procedure. Calculate the heats of solution of the two species, KF and KF HOAc, at each of the four given molalities from a knowledge of the heat capacity. Calculate the enthalpy of solution per mole of solute sol’nH298 at each concentration. Find the least squares curve fit and calculate the uncertainties for the function
sol’n H298 ¼ sol’n H0298 sol’nH0298 þðSLOPEÞM SLOPE |
ð3-38Þ |
where M is the molality and sol’n H0298 is the enthalpy of solution at infinite dilution. Do this for each species, the anhydrous and the solvated fluoride. Record the slopes and intercepts of both functions, SLOPE1 for KF and SLOPE2 for
KF HOAc. Record the enthalpy changes and uncertainties, sol’n H0298 sol’nH0298 (KF) and sol’n H0298 sol’nH0298 (KF HOAc). What are the units of SLOPE?
Read the article on the original research (Emsley, 1971) and include a commentary on these results in your report for this experiment. Emsley claims that the enthalpy of solution
KFðsÞ þ HOAcðlÞ ! KF OAcHðsol’nÞ
is sol’n H0298 ¼ 38:5 kJ mol 1. What argument does he present to support this claim?
COMPUTER PROJECT 3-2 j The Boltzmann Constant
An interesting historical application of the Boltzmann equation involves examination of the number density of very small spherical globules of latex suspended in water. The particles are distributed in the potential gradient of the gravitational field. If an arbitrary point in the suspension is selected, the number of particles N at height h mm (1 mm ¼ 10 6 m) above the reference point can be counted with a magnifying lens. In one series of measurements, the number of particles per unit volume of the suspension as a function of h was as shown in Table 3-3.
The Boltzmann distribution gives
mgh |
ð3-39Þ |
N ¼ N0e kT |
CURVE FITTING |
|
|
|
|
|
75 |
|
Table 3-3 The Number of Latex Particles at Various Heights in mm Above a |
|
||||||
|
Reference Point |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
h=mm |
0 |
50 |
70 |
90 |
100 |
150 |
200 |
N |
977 |
453 |
293 |
219 |
176 |
69 |
28 |
|
|
|
|
|
|
|
|
at constant temperature T, where m is the effective mass of the particle corrected for the buoyancy of the supporting medium.
Statistical analysis of the data set is best done by ‘‘linearizing’’ the function (Jurs, 1996), that is, by transforming it to a straight line of the form y ¼ a þ bx. In the case of the Boltzmann distribution, because y ¼ aebx leads to ln y ¼ ln a þ bx, we can take logarithms of both sides,
ln N ¼ ln N0 |
mgh |
ð3-40Þ |
kT |
whereupon the slope of ln N vs. h is mghkT , where g is the acceleration due to the gravitational field and k is a universal constant now called Boltzmann’s constant and denoted kB.
The supporting medium was water at 298 K ðr ¼ 0:99727Þ, and the density of latex is 1.2049 g cm 3. The latex particles had an average radius of 2:12 10 4 mm; hence, their effective mass corrected for buoyancy is their volume 43p r3 times the density difference r between latex and the supporting medium, water
m ¼ nr ¼ ð4=3Þð2:12 10 7 mÞ3ð1:2049 0:99727Þ
This yields m ¼ 8:287 10 18 kg.
Procedure. Compute the slope of the function by a linear least squares procedure and obtain a value of Boltzmann’s constant. How many particles do you expect to find 125 mm above the reference point? Take the uncertainty you have calculated for the slope, as the uncertainty in kB. Is the modern value of kB ¼ 1:381 10 23 within these error limits?
Having determined kB and knowing that the gas constant R ¼ 8:314 J K 1 from macroscopic measurements on gases, determine Avogadro’s number L from the relationship
R ¼ kBL |
ð3-41Þ |
Calculate the % difference between L found by this method and the modern value of 6:022 1023. Does this support the idea that the Boltzmann constant is the gas constant per particle?
76 |
COMPUTATIONAL CHEMISTRY USING THE PC |
|
COMPUTER PROJECT 3-3 |
j |
The Ionization Energy of Hydrogen |
The ionization energy for hydrogen is the minimum amount of energy that is required to bring about the reaction
H ! Hþ þ e
The ionization energy for hydrogen (or other hydrogen-like systems) can be found from the Rydberg equation
|
1 |
¼ |
|
1 |
1 |
|
|
|
n ¼ l |
|
n12 |
n22 |
ð |
|
Þ |
||
|
|
|
R |
|
|
|
3-42 |
|
along with an accurate set of spectral data. Eq. (3-42) leads us to believe that both the slope R and the intercept R of the linear function
|
¼ |
R |
|
n22 |
|
ð |
3-43 |
Þ |
|
|
R 1 |
|
|
|
|||
n |
|
|
|
|
|
|
|
|
are equal to the ionization energy which has n22 ¼ 1.
Procedure. Use Mathcad, QLLSQ, or TableCurve (or, preferably, all three) to determine a value of the ionization energy of hydrogen from the wave numbers in Table 3-4 taken from spectroscopic studies of the Lyman series of the hydrogen spectrum where n1 ¼ 1.
Table 3-4 Spectral Wavenumbers v for the Lyman Series of Hydrogen
2 |
¼ |
2 |
3 |
4 |
5 |
6 |
7 |
n2 |
|
||||||
n2 |
¼ 4 |
9 |
16 |
25 |
36 |
49 |
|
v ¼ 82259 |
97492 |
102824 |
105292 |
106632 |
107440 |
Note that we are interested in n2, the atomic quantum number of the level to which the electron ‘‘jumps’’ in a spectroscopic excitation. Use the results of this data treatment to obtain a value of the Rydberg constant R. Compare the value you obtain with an accepted value. Quote the source of the accepted value you use for comparison in your report. What are the units of R? A conversion factor may be necessary to obtain unit consistency. Express your value for the ionization energy of H in units of hartrees (h), electron volts (eV), and kJ mol 1. We will need it later.
Reliability of Fitted Polynomial Parameters
The method of finding uncertainty limits for linear equations can be generalized to higher-order polynomials. The matrix method for finding the minimization
CURVE FITTING |
77 |
parameters for the polynomial of next higher order (second order), y ¼ ax þ bx2 is
0 |
a |
P xi |
P |
B Pxi2 |
P xi3 |
P |
|
|
xi |
xi2 |
P |
@ P |
P |
xi3 |
10 b 1 |
¼ |
0 Pxiyi |
1 |
ð3-44Þ |
xi2 |
a |
|
yi |
C |
|
xi4 |
C c |
|
B Pxi2yi |
|
|
|
A@ A |
|
@ P |
A |
|
with the solution vector |
Pxi2 |
P xi3 |
1 |
|
0 Pxiyi |
1 |
ð3-45Þ |
|||
0 b 1 |
¼ |
0 |
xi |
1 |
||||||
a |
|
|
a |
xi |
xi2 |
|
yi |
C |
|
|
c |
|
B Pxi2 |
P xi3 |
P xi4 |
C B Pxi2yi |
|
||||
@ A |
|
@ P |
P |
P |
A @ P |
A |
|
The quadratic curve fit leads to a number of residuals equal to the number of points in the data set. The sum of squares of residuals gives SSE by Eqs. (3-23) and MSE by Eq. (3-30), except that now the number of degrees of freedom for n points is n 3
SSE |
ð3-46Þ |
MSE ¼ n 3 |
p
MSE gives the standard deviation of the regression, s ¼ MSE.
The uncertainties of the minimization parameters are calculated just as they were for the linear case except that now there are three of them
sa ¼ |
p 22 |
|
3-47b |
|
|
d11s2 |
ð3-47aÞ |
||||
|
b ¼ p |
ð |
|
Þ |
|
s |
|
d s2 |
|
|
|
and |
c ¼ p |
|
|
|
|
|
ð |
|
Þ |
||
s |
|
d33s2 |
|
3-47c |
|
In contrast to the linear case, there are three degrees of freedom, but there is still only one standard deviation of the regression, s. The reader has the opportunity to try out these ideas in Computer Project 3-4.
COMPUTER PROJECT 3-4 j The Partial Molal Volume of ZnCl2
In general, the volume of a solution, say ZnCl2 in water, is dependent on the number of moles ni of each of the components. For a binary solution,
V ¼ f ðn1; n2Þ |
ð3-48Þ |
The change in volume dV on adding a small amount dn1 of water or dn2 of ZnCl2 is
dV ¼ |
qn1 dn1 |
þ |
qn2 dn2 |
ð3-49Þ |
||
|
|
qV |
|
|
qV |
|
78 |
COMPUTATIONAL CHEMISTRY USING THE PC |
where we stipulate that P and T are constant for the process and we adopt the usual subscript convention, 1 for solvent and 2 for solute. If we specify 1 kg as the amount of water, n2 is the molality of ZnCl2. We expect that the volume of the solution will be greater than 1000 cm3 by the volume taken up by the ZnCl2. It may seem reasonable to take the volume of one mole of ZnCl2 in the solid state Vm and add it to 1000 cm3 to get the volume of a 1 molal solution. One-half the molar volume of solute would, by this scheme, lead to the volume of a 0.5 molar solution and so on. This does not work. The volume of 1000 g of water in the solution is not exactly 1000 cm3, and it is dependent on the temperature. Nor are the volumes additive. Indeed, some solutes cause contraction of the solution to less than 1000 cm3.
Interactions at the molecular or ionic level cause an expansion or contraction of the solution so that, in general
V 6¼1000 þ Vm
We define a partial molar volume Vi such that
|
|
V ¼ n1V1 |
þ n2V2 |
for a binary solution or, in general,
|
N |
|
X |
V ¼ |
|
niVi |
|
|
i¼1 |
for a solution of N components.
It can be shown (Alberty, 1987) that
¼ qV Vi
qni nj
ð3-50Þ
ð3-51Þ
ð3-52Þ
ð3-53Þ
where the subscript nj indicates that all components in the solution other than i are held constant. If the solution is a binary solution of n2 moles of solute in 1 kg of
water, V2 is the partial molal volume of component 2. A partial molal volume is a special case of the partial molar volume for 1 kg of solvent.
Procedure. A study on the partial molal volume of ZnCl2 solutions gave the following data (Alberty, 1987)
g ZnCl2 per kg of H2O |
20:00 |
60:00 |
100:00 |
140:00 |
180:00 |
200:00 |
density |
1:0167 |
1:0532 |
1:0891 |
1:1275 |
1:1665 |
1:1866 |
Calculate the number of moles of ZnCl2 per kilogram of water in each solution (the molality m). Calculate the volume V of solution containing 1 kg of water at each solute concentration. Plot V vs. m. Use program Mathcad, QQLSQ, or TableCurve
CURVE FITTING |
79 |
(or all three, if available) to obtain a quadratic expression V ¼ a þ bm þ cm2. Obtain an expression for the slope dV/dm which is the same as ðqV=qn2Þn1 . This is the partial molal volume of ZnCl2. It is a partial volume because V varies with both nZnCl2 and the number of moles of water n1. What is the partial molal volume of ZnCl2 in water at 1.00 molal concentration? What is the partial molar volume of water at this concentration?
PROBLEMS
1.Select several values for m of the data set in Exercise 3-1 and calculate ðxi mÞ for each of them. Plot the curve of P ðxi mÞ2 as a function of the selected parameter m and locate the minimum visually. Compare with Solution 3-1.
2.Obtain Eq. (3-8) from Eq. (3-6) through Eq. (3-7).
3.An excess of porphobilinogen in the urine is associated with hepatic disorders and lead poisoning. Porphobilinogen can be separated from other porphyrins by ion exchange chromatography and treated with p-dimethylaminobenzalde- hyde (PDMA) to produce a red compound that absorbs light strongly at 550 nm. A set of standard solutions was made up with concentrations of 50.0, 75.0, 100.0, 125.0, 150.0, 175.0, 200.0, 225.0, and 250.0 mg/100 mL of porphobilinogen. Their absorbances A after treatment with PDMA were 0.039, 0.061, 0.087, 0.107, 0.119, 0.163, 0.179, 0.194, and 0.213. What is the spectrophotometric calibration curve A ¼ f (concentration) for this method? What are the units of slope? Three urine specimens treated by this method yielded absorbances A of 0.180, 0.162, and 0.213. What were the porphobilinogen concentrations of these three samples?
4.Expand the three determinants D, Db, and Dm for the least squares fit to a linear function not passing through the origin so as to obtain explicit algebraic expressions for b and m, the y-intercept and the slope of the best straight line representing the experimental data.
5.Set up the determinants D, Da, Db, and Dc for the least squares fit to a parabolic curve not passing through the origin.
6.Expand the four 3 3 determinants obtained in Problem 5.
7.Using the expanded determinants from Problem 6, write explicit algebraic expressions for the three minimization parameters a, b, and c for a parabolic curve fit.
8.Compare the solution for Problems 6 and 7 with the BASIC statements in Program QQLSQ.BAS or QLLSQ.tru. They should agree.
9.Emsley, in the same paper referred to in Computer Project 3-1, presents viscosity measurements Z for solutions of KF in HOAc as a function of molality m with the following results
m |
0:135 |
0:398 |
1:028 |
1:466 |
1:903 |
2:567 |
3:052 |
3:428 |
3:770 |
4:206 |
4:307 |
Z |
1:38 |
1:77 |
3:69 |
5:97 |
8:68 |
16:34 |
29:29 |
41:53 |
58:57 |
92:73 |
106:36 |
80 |
COMPUTATIONAL CHEMISTRY USING THE PC |
Carry out a statistical analysis of this data set including a fitting equation and all uncertainties.
10.Write the determinant for a sixth-degree curve fitting procedure.
11.The volume of ZnCl2 solutions containing 1000 g of water varies according to the quadratic equation (Computer Project 3-4)
V ¼ 999:71 þ 21:148 m þ 4:471 m2
Find the partial molal volume of ZnCl2 in these solutions at 0.5, 1.0, 1.5 and 2.0 molar concentrations.
Multivariate Least Squares Analysis
In multivariate least squares analysis, the dependent variable is a function of two or more independent variables. Because matrices are so conveniently handled by computer and because the mathematical formalism is simpler, multivariate analysis will be developed as a topic in matrix algebra rather than conventional algebra.
We have already seen the normal equations in matrix form. In the multivariate case, there are as many slope parameters as there are independent variables and there is one intercept. The simplest multivariate problem is that in which there are only two independent variables and the intercept is zero
y ¼ m1x1 þ m2x2 |
ð3-54Þ |
To simplify the algebra, the error in the x variable will be considered negligible relative to the error in the y variable, although this is not a necessary condition. Equation (3-54) describes a plane passing through the origin. Let us restrict it to positive values of x1, x2, and y.
One measurement of the dependent variable yields y1 for known values of the independent variables x11; x12
y1 |
¼ m1x11 |
þ m2x12 |
ð3-55aÞ |
and a second yields |
|
||
y2 |
¼ m1x21 |
þ m2x22 |
ð3-55bÞ |
for new values of the independent variables x21; x22.
The mathematical requirements for unique determination of the two slopes m1 and m2 are satisfied by these two measurements, provided that the second equation is not a linear combination of the first. In practice, however, because of experimental error, this is a minimum requirement and may be expected to yield the least reliable solution set for the system, just as establishing the slope of a straight line through the origin by one experimental point may be expected to yield the least reliable slope, inferior in this respect to the slope obtained from 2, 3, or p experimental points. In univariate problems, accepted practice dictates that we
CURVE FITTING |
81 |
obtain many experimental points and determine the ‘‘best’’ slope representing them by a suitable univariate regression procedure.
The analogous procedure for a multivariate problem is to obtain many experimental equations like Eqs. (3-55) and to extract the best slopes from them by regression. Optimal solution for n unknowns requires that the slope vector be obtained from p equations, where p is larger than n, preferably much larger. When there are more than the minimum number of equations from which the slope vector is to be extracted, we say that the equation set is an overdetermined set. Clearly, n equations can be selected from among the p available equations, but this is precisely what we do not wish to do because we must subjectively discard some of the experimental data that may have been gained at considerable expense in time and money.
Equation set (3-55) can be written in matrix form
y ¼ Xm |
ð3-56Þ |
where X is the matrix of (known) input variables
X ¼ |
x11 |
x12 |
ð3-57Þ |
x21 |
x22 |
y is the nonhomogeneous vector of dependent experimental measurements, and m is the slope vector, that is, the solution vector of the regression problem.
The deviations or residuals of y1 and y2 from the regression plane passing through the origin are
d1 |
¼ m1x11 |
þ m2x12 |
y1 |
|
|
|
|
ð3-58aÞ |
|
d2 |
¼ m1x21 |
þ m2x22 |
y2 |
|
|
|
|
ð3-58bÞ |
|
which are minimized by setting |
q |
|
P di2 and |
q |
|
P di2 equal to zero as shown in the |
|||
qm |
1 |
qm |
2 |
section on linear functions not passing through the origin. These minimizations yield
m1x112 þ m2x11x12 þ m1x212 þ m2x21x22 ¼ y1x11 |
þ y2x21 |
ð3-59aÞ |
m1x11x12 þ m2x122 þ m1x21x22 þ m2x222 ¼ y1x12 |
þ y2x22 |
ð3-59bÞ |
We are dealing with real numbers that commute; hence, it is evident that the right side of Equation set (3-59) is
x11 |
x21 |
y1 |
ð3-60Þ |
x12 |
x22 |
y2 |
|
or |
|
|
|
XTy |
|
|
ð3-61Þ |
where the matrix XT is the transpose of the input matrix X.
82 |
COMPUTATIONAL CHEMISTRY USING THE PC |
The left side of the normal equations can be seen to be a product including X, its transpose, and m. Matrix multiplication shows that
XTXm |
ð3-62Þ |
is the matrix representation of the left side of the normal equations (see Problems). Two important facts emerge here. First, the method is general and can be worked out for the n n case ðn > 2Þ as above but with added labor. Second, the input matrix need not be square. By the geometric nature of a rectangular matrix, it is always conformable for multiplication into its own inverse. The result is a square product matrix of the smaller of the two dimensions of the rectangular input matrix. Indeed, for the present treatment to be nontrivial, the input matrix must be rectangular; a square input matrix with X 1 defined (X nonsingular) represents the problem of n independent equations in n unknowns, which is the problem we said we do not want to solve. From this point on, envision X as a p n matrix with
p > n.
The normal equations are simultaneous equations
XTXm ¼ XTy |
ð3-63Þ |
in which XTy yields a vector of the smaller dimension of XT, the same dimension as the vector obtained as the product XTXm. The normal equations are often written
XTX m ¼ q |
ð3-64Þ |
|
|
|
y. |
where |
XTX emerges as the coefficient matrix of a simple set of simultaneous |
equations, m is the solution vector, and q is the nonhomogeneous vector XT Solution follows by the usual method of inverting the coefficient matrix and premultiplying it into both sides
XTX 1 XTX m ¼ XTX 1q |
|
or |
|
m ¼ XTX 1q |
ð3-65Þ |
This equation permits us to generate an n-fold slope vector m from a rectangular matrix X and a dependent variable vector y. The procedure is analogous to the univariate case of generating the slope of a calibration curve passing through the origin from p known values of xi and the corresponding measured values y1; . . . ; yi; . . . ; yp, with the intention of using m to determine unknown values of x (the concentration of an analyte perhaps) from future measurements of y.