Data Analysis and Database Services for Industry & Education Brighton Webs Ltd.

Linear Regression

Linear regression is one of the basic tools of modelling.  It fits a linear relationship between an independent variable (x) and a dependent variable (y) of the form:

$$ y = a + b x $$

where a is the intercept and b is the gradient.  The assumption is that the error (e), the difference between an observed value of y and the value given by the line:

$$ e = y - (a + b x) $$

is normally distributed.  These two formulae are illustrated in the graphic below:

[Figure: Linear regression - relationship between X and Y]
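To make the error term concrete, the following minimal Python sketch computes e = y - (a + bx) for each data pair, given candidate values of a and b.  The function name `residuals` and the sample values are illustrative assumptions, not part of the original page.

```python
def residuals(xs, ys, a, b):
    """Return the error e = y - (a + b*x) for each (x, y) pair."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Illustrative values only: the first two points lie exactly on y = 1 + 2x,
# so their error is zero; the third point sits 1.0 above the line.
print(residuals([1.0, 2.0, 3.0], [3.0, 5.0, 8.0], a=1.0, b=2.0))
# prints [0.0, 0.0, 1.0]
```

Regression chooses a and b so that these errors are, taken together, as small as possible.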

Related Topics

Linear Correlation

A.K.A. (Also Known As)

The method is also referred to as "least squares", "line of best fit" and "curve fitting".
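The name "least squares" comes from the criterion that is minimised.  As a brief sketch, writing the n data pairs as (x_i, y_i) (a notation assumed here for illustration), the intercept a and gradient b are chosen to make the sum of squared errors Q as small as possible.  Setting the two partial derivatives of Q to zero gives a pair of simultaneous linear equations in a and b:

```latex
\begin{aligned}
Q(a, b) &= \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \bigl( y_i - a - b x_i \bigr)^2 \\
\frac{\partial Q}{\partial a} &= -2 \sum_{i=1}^{n} \bigl( y_i - a - b x_i \bigr) = 0
  \;\Longrightarrow\; n a + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i \\
\frac{\partial Q}{\partial b} &= -2 \sum_{i=1}^{n} x_i \bigl( y_i - a - b x_i \bigr) = 0
  \;\Longrightarrow\; a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i
\end{aligned}
```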

Calculation of A and B

The formulae for calculating A (the intercept) and B (the gradient) are the solution of what are often referred to as the "Normal Equations".  They can be set out in many ways; the sum of squares notation used below can be adapted for both computer programs and manual computation:

Step 1 - Calculate Means

$$ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \qquad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i $$

Step 2 - Calculation of Sum of Squares

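$$ S_{XX} = \sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad S_{YY} = \sum_{i=1}^{n} (y_i - \bar{y})^2 \qquad S_{XY} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) $$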

Step 3 - Calculation of A and B

$$ b = \frac{S_{XY}}{S_{XX}} \qquad a = \bar{y} - b \bar{x} $$
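The three steps translate directly into a short function.  The following is a minimal Python sketch, not a definitive implementation; the function name `fit_line` and its input checks are assumptions added for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b), the intercept and gradient, following Steps 1 to 3."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    # Step 1: means
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Step 2: sums of squares about the means (SYY is not needed for a and b)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the gradient is undefined")

    # Step 3: gradient first, then intercept
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Points lying exactly on y = 1 + 2x recover a = 1 and b = 2.
print(fit_line([0.0, 1.0, 2.0], [1.0, 3.0, 5.0]))  # prints (1.0, 2.0)
```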

Example

The price of independently branded, non-organic flour in a supermarket is related to its protein content.  The page on correlation established that the relationship between price and protein content is significant; this example attempts to quantify that relationship.  The data is shown in tabular form below:

x - Protein Content    y - Price
(g/100 g)              (GBP/kg)
10.4                   0.71
10.3                   0.73
11.5                   0.83
12.8                   0.81
13.2                   0.86
13.9                   0.97
mean=12.0              mean=0.82

The sum of squares values, calculated from the unrounded means (12.017 and 0.818), are:

SXX 11.388
SYY 0.044
SXY 0.649

The final step is the calculation of the gradient (b) and the intercept (a):

b 0.649/11.388=0.057
a 0.818-0.057*12.017=0.133
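The arithmetic can be checked by recomputing every quantity from the six data pairs in the table.  The sketch below is a plain Python transcription of Steps 1 to 3 using unrounded means; the variable names are illustrative.

```python
protein = [10.4, 10.3, 11.5, 12.8, 13.2, 13.9]  # x, g/100 g
price = [0.71, 0.73, 0.83, 0.81, 0.86, 0.97]    # y, GBP/kg

n = len(protein)
x_bar = sum(protein) / n
y_bar = sum(price) / n

# Sums of squares about the means
sxx = sum((x - x_bar) ** 2 for x in protein)
syy = sum((y - y_bar) ** 2 for y in price)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(protein, price))

b = sxy / sxx          # gradient
a = y_bar - b * x_bar  # intercept

print(f"SXX={sxx:.3f} SYY={syy:.3f} SXY={sxy:.3f}")
# prints SXX=11.388 SYY=0.044 SXY=0.649
print(f"b={b:.3f} a={a:.3f}")
# prints b=0.057 a=0.133
```

On these figures, each additional gram of protein per 100 g is associated with an increase of roughly 0.057 GBP/kg in price.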

The graph shows the data points in relation to the fitted regression line:

Pseudo Code

The pseudo code is in the style of MS Visual Basic.  It is assumed that N pairs of X and Y values are stored in arrays (base index zero) and that all numeric variables start at zero, as they do in Visual Basic.

'***************************************************************
'*  N                   Number of X and Y pairs
'*  X()                 Array of X values
'*  Y()                 Array of Y values
'*
'*  X_Bar and Y_Bar     Means of X and Y
'*  SXX, SYY and SXY    Sums of squares about the means
'*  I                   Loop counter
'*
'*  A and B             Intercept and slope
'*
'***************************************************************

'Calculation of Mean

For I=0 to N-1
  X_Bar=X_Bar+X(I)
  Y_Bar=Y_Bar+Y(I)
Next I

X_Bar=X_Bar/N
Y_Bar=Y_Bar/N

'Sum of Squares (deviations about the means)

For I=0 to N-1
  SXX=SXX+(X(I)-X_Bar)^2
  SYY=SYY+(Y(I)-Y_Bar)^2
  SXY=SXY+(X(I)-X_Bar)*(Y(I)-Y_Bar)
Next I

'Slope and Intercept

B=SXY/SXX
A=Y_Bar-B*X_Bar
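When the data arrives as a stream, the same result can be obtained in a single pass by accumulating raw sums and correcting for the means afterwards, using the identities SXX = sum(x^2) - N*X_Bar^2 and SXY = sum(x*y) - N*X_Bar*Y_Bar.  The following Python sketch illustrates the idea; the function name `fit_line_one_pass` is an assumption made for this example.

```python
def fit_line_one_pass(pairs):
    """Return (a, b) from an iterable of (x, y) pairs using raw sums."""
    n = 0
    sum_x = sum_y = sum_xx = sum_xy = 0.0
    for x, y in pairs:
        n += 1
        sum_x += x
        sum_y += y
        sum_xx += x * x
        sum_xy += x * y
    if n < 2:
        raise ValueError("need at least two (x, y) pairs")

    x_bar = sum_x / n
    y_bar = sum_y / n

    # Convert the raw sums into sums of squares about the means
    sxx = sum_xx - n * x_bar ** 2
    sxy = sum_xy - n * x_bar * y_bar
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


print(fit_line_one_pass([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]))  # prints (1.0, 2.0)
```

The raw-sum form subtracts two large, nearly equal numbers when the mean is large relative to the spread of the data, so it can lose precision in floating point; the two-pass form above is the safer default.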

Page modified: 04-Mar-2008

 

For more information: info@brighton-webs.co.uk