Statistics can be made to prove anything -- even the truth if necessary.

# REGRESSION ANALYSIS BASICS -- The Right Way To Do It

Different regression techniques give different results for the regression equation. Simple or linear regression is the most common form used in petrophysical analysis, giving an equation of the form Y = A * X + B.

Multiple regression relates the dependent variable Y to a number of independent variables, for example Y = A1 * X1 + A2 * X2 + ... + B.

Non-linear or polynomial regression provides relationships that involve powers, roots, or other non-linear functions, such as logarithms or exponentials.

Excel and Lotus 1-2-3 offer some simple linear and non-linear regression models, but more sophisticated software is required for multiple regression. A good freeware package is Statcato (www.statcato.org), a Java based program.

The "Y-on-X" line is the one that results from use of spreadsheet software. Y is the dependent axis (the predicted variable) and X is the independent axis (the variable doing the predicting). The line minimizes the errors in the vertical direction (Y axis) using a least-squares solution.

The "X-on-Y" line reverses the roles of the two axes, minimizing the error in the horizontal direction (as the graph is drawn here).

The RMA line, the reduced major axis, assumes that neither axis depends on the other and lies very nearly halfway between the first two lines. It minimizes the error at right angles to the line. The ER, or error ratio, line minimizes the error in both the X and Y directions. There is not usually much difference between the RMA and ER lines. All four lines intersect at the centroid of the data.

SIMPLE LINEAR REGRESSION and BASIC STATISTICAL MEASURES
The equations used are as follows:

Slope of Best Fit Line
1: A1 = (Sum (XiYi) - Sum (Xi) * Sum (Yi) / Ns) / (Sum (Xi ^ 2) - (Sum (Xi) ^ 2) / Ns)
2: A2 = (Sum (XiYi) - Sum (Yi) * Sum (Xi) / Ns) / (Sum (Yi ^ 2) - (Sum (Yi) ^ 2) / Ns)

Intercepts on Y and X Axes
3: B1 = (Sum (Yi) - A1 * Sum (Xi)) / Ns
4: B2 = (Sum (Xi) - A2 * Sum (Yi)) / Ns

Equation of Best Fit Lines
5: Y = A1 * X + B1  (Y is dependent axis)
6: X = A2 * Y + B2 (X is dependent axis)
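Equations 1 through 6 can be checked with a short script. A minimal Python sketch (the function name and sample data are my own, for illustration only):

```python
def fit_lines(xs, ys):
    """Return (A1, B1, A2, B2) for the Y-on-X and X-on-Y best-fit lines."""
    ns = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    a1 = (sxy - sx * sy / ns) / (sxx - sx ** 2 / ns)   # equation 1
    a2 = (sxy - sy * sx / ns) / (syy - sy ** 2 / ns)   # equation 2
    b1 = (sy - a1 * sx) / ns                           # equation 3
    b2 = (sx - a2 * sy) / ns                           # equation 4
    return a1, b1, a2, b2
```

For points lying exactly on Y = 2X + 1, this returns A1 = 2, B1 = 1 for the Y-on-X line and A2 = 0.5, B2 = -0.5 for the X-on-Y line; with scattered data the two lines differ.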

The reduced major axis (RMA) regression line is usually the most useful relationship between the X and Y axes. It assumes that both axes are equally error prone. An approximation to this line lies halfway between the two independent regression lines. Solve equation 6 for Y:
7: Y = (1/A2) * X - B2 / A2

Average slope and intercept of equations 5 and 7:
8: A3 = (A1 + 1/A2) / 2
9: B3 = (B1 - B2 / A2) / 2
10: Y = A3 * X + B3 (reduced major axis)
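Equations 8 and 9 are then a one-liner once the two simple fits are known. Note that solving X = A2 * Y + B2 for Y gives an intercept of -B2 / A2, which is why both lines (and their average) pass through the centroid. A minimal sketch (function name assumed, inputs taken from the two simple regressions):

```python
def rma_line(a1, b1, a2, b2):
    """Slope and intercept of the reduced-major-axis approximation (equations 8-10)."""
    a3 = (a1 + 1 / a2) / 2    # equation 8: average of the two slopes
    b3 = (b1 - b2 / a2) / 2   # equation 9: average of the two intercepts
    return a3, b3
```

For data lying exactly on Y = 2X + 1 the two input lines agree, and the RMA line returns slope 2 and intercept 1.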

Coefficient of Determination
11: Cd = (B1 * Sum (Yi) + A1 * Sum (XiYi) - (Sum (Yi) ^ 2) / Ns) /
(Sum (Yi ^ 2) - (Sum (Yi) ^ 2) / Ns)

The coefficient of determination is a measure of "best fit" and can be calculated incrementally as data are entered and processed (e.g., as in a hand calculator). Other measures of fit require two passes through the data: a first pass to find the average X and average Y values, then a second pass to find the differences between each individual X and the average X, and between each individual Y and the average Y.

An alternate form of the above equation is:
12: Cd = ((Sum (XiYi) - Sum (Xi) * Sum (Yi) / Ns) ^ 2) / ((Sum (Xi ^ 2) - (Sum (Xi) ^ 2) / Ns) *
(Sum (Yi ^ 2) - (Sum (Yi) ^ 2) / Ns))

Both equations give the same answer.
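The single-pass property is easy to demonstrate: both forms of Cd below are computed from the same five running sums, with no second pass over the data. A minimal Python sketch (sample data invented for illustration; equation 11 is written with the sums over Y in the denominator, so that it equals R squared):

```python
def coeff_determination(xs, ys):
    """Cd two ways, using only running sums accumulated in a single pass."""
    ns = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    a1 = (sxy - sx * sy / ns) / (sxx - sx ** 2 / ns)   # slope, equation 1
    b1 = (sy - a1 * sx) / ns                           # intercept, equation 3
    # Equation 11: regression sum of squares over total sum of squares.
    cd11 = (b1 * sy + a1 * sxy - sy ** 2 / ns) / (syy - sy ** 2 / ns)
    # Equation 12: squared correlation form, same running sums.
    cd12 = (sxy - sx * sy / ns) ** 2 / ((sxx - sx ** 2 / ns) * (syy - sy ** 2 / ns))
    return cd11, cd12
```

Both values agree to rounding error, confirming that the two forms are equivalent.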

These data are used in the following statistical measures.

Arithmetic Mean
13: Xbar = Sum (Xi) / Ns
14: Ybar = Sum (Yi) / Ns

Variance
15: Vx = Sum ((Xi - Xbar) ^ 2) / (Ns - 1)
16: Vy = Sum ((Yi - Ybar) ^ 2) / (Ns - 1)

Standard Deviation
17: Sx = Vx ^ 0.5
18: Sy = Vy ^ 0.5

Correlation Coefficient
19: Rxy = A1 * Sx / Sy

T Ratio
20: Txy = Rxy * ((Ns - 2) / (1 - (Rxy ^ 2))) ^ 0.5
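Equations 13 through 20 chain together naturally: the means feed the variances, the variances feed the standard deviations, and those feed the correlation coefficient and T ratio. A sketch in Python (function name and sample data are my own):

```python
import math

def correlation_and_t(xs, ys):
    """Rxy via equation 19 (slope times Sx/Sy) and the T ratio of equation 20."""
    ns = len(xs)
    xbar, ybar = sum(xs) / ns, sum(ys) / ns                 # equations 13-14
    vx = sum((x - xbar) ** 2 for x in xs) / (ns - 1)        # equation 15
    vy = sum((y - ybar) ** 2 for y in ys) / (ns - 1)        # equation 16
    sx, sy = math.sqrt(vx), math.sqrt(vy)                   # equations 17-18
    # Slope of the Y-on-X line (equation 1, centered form).
    a1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum(
        (x - xbar) ** 2 for x in xs)
    rxy = a1 * sx / sy                                      # equation 19
    txy = rxy * math.sqrt((ns - 2) / (1 - rxy ** 2))        # equation 20
    return rxy, txy
```

A large T ratio relative to the Student's t critical value for Ns - 2 degrees of freedom indicates the correlation is statistically significant.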

Skew
21: Ux = (Sum ((Xi - Xbar) ^ 3) / Ns) / ((Sum ((Xi - Xbar) ^ 2) / Ns) ^ 1.5)
22: Uy = (Sum ((Yi - Ybar) ^ 3) / Ns) / ((Sum ((Yi - Ybar) ^ 2) / Ns) ^ 1.5)

Kurtosis
23: Kx = (Sum ((Xi - Xbar) ^ 4) / Ns) / ((Sum ((Xi - Xbar) ^ 2) / Ns) ^ 2)
24: Ky = (Sum ((Yi - Ybar) ^ 4) / Ns) / ((Sum ((Yi - Ybar) ^ 2) / Ns) ^ 2)

Geometric Mean
25: Gx = (PROD (Xi)) ^ (1 / Ns)
26: Gy = (PROD (Yi)) ^ (1 / Ns)

Harmonic Mean
27: Hx = Ns / (Sum (1 / Xi))
28: Hy = Ns / (Sum (1 / Yi))
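A sketch of equations 21, 23, 25, and 27 for the X values (the Y versions are identical; math.prod requires Python 3.8+, and the geometric and harmonic means assume strictly positive data):

```python
import math

def shape_and_means(xs):
    """Skew (eq 21), kurtosis (eq 23), geometric mean (eq 25), harmonic mean (eq 27)."""
    ns = len(xs)
    xbar = sum(xs) / ns
    m2 = sum((x - xbar) ** 2 for x in xs) / ns   # second central moment
    m3 = sum((x - xbar) ** 3 for x in xs) / ns   # third central moment
    m4 = sum((x - xbar) ** 4 for x in xs) / ns   # fourth central moment
    ux = m3 / m2 ** 1.5                          # skew: 0 for symmetric data
    kx = m4 / m2 ** 2                            # kurtosis: 3.0 for a normal distribution
    gx = math.prod(xs) ** (1 / ns)               # geometric mean
    hx = ns / sum(1 / x for x in xs)             # harmonic mean
    return ux, kx, gx, hx
```

For any positive data set, harmonic mean <= geometric mean <= arithmetic mean, which is a useful sanity check on the arithmetic.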

WHERE:
A1 = slope of best fit line (Y is dependent axis)
A2 = slope of best fit line (X is dependent axis)
A3 = slope of best fit line (reduced major axis)
B1 = intercept of best fit line (Y is dependent axis)
B2 = intercept of best fit line (X is dependent axis)
B3 = intercept of best fit line (reduced major axis)
Cd = coefficient of determination
Gx = geometric mean of X values
Gy = geometric mean of Y values
Hx = harmonic mean of X values
Hy = harmonic mean of Y values
Kx = kurtosis of X values
Ky = kurtosis of Y values
Ns = number of X - Y pairs or number of samples
Rxy = correlation coefficient
Sx = standard deviation of X values
Sy = standard deviation of Y values
Txy = T ratio
Ux = skew of X values
Uy = skew of Y values
Vx = variance of X values
Vy = variance of Y values
Xi = individual X data values
Xbar = arithmetic mean of X values
XiYi = product of individual X - Y pairs
Yi = individual Y data values
Ybar = arithmetic mean of Y values

MULTIPLE LINEAR REGRESSION

The model for a multiple regression takes the form:
30: Y = b0 + b1 * X1 + b2 * X2 + b3 * X3 + ...

The b's are termed the "regression coefficients". Instead of fitting a line to the data, we are now fitting a plane (for 2 independent variables) or a hyperplane (for 3 or more independent variables).

The estimation can still be done according to the principles of linear least squares. The algebraic formulae for the solution (i.e., finding all the b's) are UGLY. However, the matrix solution is elegant.

The matrix model is:
31:  [Y] = [X] * [B]

The solution is:
32: [B] = ([X'] * [X]) ^ -1 * [X'] * [Y]
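In practice, equation 32 is usually implemented by solving the normal equations [X'X] * [B] = [X'Y] rather than forming the inverse explicitly, which is better conditioned numerically. A pure-Python sketch (function name and test data are invented; production work would use a linear-algebra library):

```python
def multiple_regression(x_rows, ys):
    """Fit Y = b0 + b1*X1 + b2*X2 + ... by solving the normal equations."""
    # Prepend a column of ones so the intercept b0 is estimated too.
    design = [[1.0] + list(row) for row in x_rows]
    p = len(design[0])
    # Form X'X and X'Y (equation 32 without the explicit inverse).
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(xtx[r][col]))
        xtx[col], xtx[piv] = xtx[piv], xtx[col]
        xty[col], xty[piv] = xty[piv], xty[col]
        for r in range(col + 1, p):
            f = xtx[r][col] / xtx[col][col]
            xtx[r] = [a - f * b for a, b in zip(xtx[r], xtx[col])]
            xty[r] -= f * xty[col]
    # Back substitution recovers [B] = [b0, b1, b2, ...].
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (xty[i] - sum(xtx[i][j] * b[j] for j in range(i + 1, p))) / xtx[i][i]
    return b
```

For data generated exactly by Y = 1 + 2 * X1 + 3 * X2, the solver recovers b0 = 1, b1 = 2, b2 = 3.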