*Create a regression object: new()*

`
``
use Statistics::OLS;
my $ls = Statistics::OLS->new;
`

*Register a dataset: setData()*

`
``
$ls->setData (\@xydata);
$ls->setData (\@xdata, \@ydata);
`

The *setData()* method registers a two-dimensional dataset with the
regression object. You can pass the dataset either as a reference to
one flat array containing the paired x,y data or as references to two
arrays, one each for the x and y data. [In either case, the data
arrays in your script are not cached (copied into the object). If you
alter your data, you may optionally call *setData()* again (if you want
error checking—see below) but you should at least call the *regress()*
method (see below) to recompute statistics for the new data. Or more
simply, do not alter your data.]

As a single array, in your script, construct a flat array of the form
(x0, y0, ..., xn, yn) containing n+1 x,y data points. Then pass a
reference to the data array to the *setData()* method. (If you do not
know what a reference is, just put a backslash (\) in front of the
name of your data array when you pass it as an argument to *setData()*.)
Like this:

`
`

`
my @xydata = qw( -3 9 -2 4 -1 1 0 0 1 1 2 4 3 9);
$ls->setData (\@xydata);
`

Or, you may find it more convenient to construct two equal length
arrays, one for the horizontal and one for the corresponding vertical
data. Then pass references to both arrays (horizontal first) to
*setData()*:

`
`

`
my @xdata = qw( -3 -2 -1 0 1 2 3 );
my @ydata = qw( 9 4 1 0 1 4 9 );
$ls->setData (\@xdata, \@ydata);
`

<B>Error checking:B> The *setData()* method returns a postive integer on
success and 0 on failure. If *setData()* fails, you can recover an error
message about the failure with the *error()* method. The
error string returned will either be one of

`
`

`
The data set does not contain an equal number of x and y values.
The data element ... is non-numeric.
The data set must contain at least three points.
`

In your script, you could test for errors like:

`
`

`
$ls->setData (\@data) or die( $ls->error() );
`

In the current version, only numerals, decimal points (apologies to
Europeans), scientific notation (1.7E10) and minus signs are
permitted. Currencies ($), time (11:23am) or dates (23/05/98) are not
yet supported and will generate errors. I may figure these out
someday.

`
``
$ls->regress() or die ( $ls->error() );
`

This performs most of the calculations. Call this method after setting
the data, but before asking for any regressions results. If you change
your data, previous calculations will generallly be inaccurate, so you
should call this method again. The *regress()* method returns 1 on
success, The only error message is

`
`

`
No datset has been registered.
`

although a number of undef results (due to divide by zero errors) may
be returned in specific statistics below.

`
``
my ($intercept, $slope) = $ls->coefficients();
my $R_squared = $ls->rsq();
my ($tstat_intercept, $tstat_slope) = $ls->tstats();
my $sigma = $ls->sigma();
my $durbin_watson = $ls->dw();
my $sample_size = $ls->size();
my ($avX, $avY) = $ls->av();
my ($varX, $varY, $covXY) = $ls->var();
my ($xmin, $xmax, $ymin, $ymax) = $ls->minMax();
`

Call these methods only after you have called *regress()*. Most of
these should be familiar from any econometrics text. If the slope is
infinite (variance of X is zero) it is set to undef. R-squared is 1.0
if the sample variances of either X or Y are zero (or the data are
colinear). If the variance of X is zero, both T statistics are set to
undef. sigma is an estimate of the homoscedastic standard deviation of
the error term, also known as the standard error of the estimate. The
variances use n-1. Durbin-Watson returns undef if the data are
colinear.

`
``
my @predictedYs = $ls->predicted();
my @residuals = $ls->residuals();
`

Call these methods only after you have called *regress()*. Both methods
return data arrays, in the same format you used in *setData()*. If the
data was passed to *setData()* as a reference to an `@xydata` array of the
form (x0, y0, ..., xn, yn), then the results of these methods will be
of this same form, except that the y values will either be the
predicted y based on the coefficient estimates, or the residual error
of that predicted y from the observed value of y.

If the data was passed as references to two arrays, `@xdata` = (x0
... xn) and `@ydata` = (y0 ... yn), then the results of these two
methods will be a single array of y type data, either the predicted y or
residual error. The original x data array will still correspond to
these result arrays.