Project 2
Regression Analysis
P 11_17
DataPro is a firm that provides data-processing services to a variety of industries. For the past year, they have collected information on the number of contracts sold, average contract price, advertising expenditures and personal selling expenditures. By using scatter plots, correlation analysis, and regression analysis we can determine the relationships between the dependent variable and the independent variables.
The table below contains the data that will be used for the remainder of the project
Month |
Contracts Sold |
Average Contract Price |
Advertising Expenditures |
Personal Selling Expenditures |
1 |
2,000 |
$2,600 |
$19,800 |
$34,000 |
2 |
2,000 |
$2,700 |
$20,300 |
$32,000 |
3 |
3,000 |
$2,700 |
$29,800 |
$45,000 |
4 |
1,250 |
$2,900 |
$12,300 |
$26,000 |
5 |
1,750 |
$3,000 |
$17,400 |
$33,000 |
6 |
2,750 |
$3,200 |
$28,200 |
$40,000 |
7 |
2,500 |
$3,200 |
$18,400 |
$41,000 |
8 |
1,000 |
$3,200 |
$10,400 |
$26,000 |
9 |
1,500 |
$3,500 |
$15,300 |
$34,000 |
10 |
1,750 |
$3,600 |
$17,400 |
$35,000 |
11 |
2,250 |
$3,700 |
$23,500 |
$39,000 |
12 |
2,500 |
$3,800 |
$26,800 |
$43,000 |
Now that we have all of the data, we can make a scatterplot to show the relationship between the dependent variable and the dependent variables.
Both of the scatterplots above show a strong relationship between the dependent variable and the causal (independent) variable. In most cases, as the causal variable increases so does the dependent variable. When there is a strong relationship between variables, the data points form a line (upward sloping in this case because they are positively related).
In the scatterplot below the points don’t form a line like the scatterplots above. They don’t seem to show any sort of relationship between the variables.
Therefore, just by looking at the scatterplots, we can see which causal variables are more likely to have an effect on the dependent variable.
The scatterplots provide a great visual tool for seeing the relationships among variables, but they don’t give an exact measure of the relationship. Using the correlation we can get an exact measure of the relationship between the variables.
Using the data analysis feature of Excel can make the correlation table below.
|
Avg.Contract
Price |
Adv
Expenditures |
Pers.
Selling Expenditures |
Contracts
Sold |
Avg.Contract
Price |
1 |
|
|
|
Adv
Expenditures |
0.046122361 |
1 |
|
|
Pers.
Selling Expenditures |
0.250233614 |
0.880802646 |
1 |
|
Contracts
Sold |
-0.016001334 |
0.94120431 |
0.936825657 |
1 |
· The dependent variable (Contracts Sold) has a low correlation (-.016) with Average Contract Price. Therefore this independent variable has no effect on the number of contracts sold. In most cases this would not be included in the regression analysis model, but I will still use in my example.
The other two independent variables (Advertising Expenditures and Personal Selling Expenditures) have high correlations (.941 and .937 respectively) with the dependent variable.
· Next the correlation between independent variables needs to be examined. Unlike the relationship between dependent and independent variables, the relationship between independent variables should be low.
· Average Contract Price has a low correlation with the other independent variables.
· The correlation between Advertising Expenditures and Personal Selling Expenditures is high (.88). Usually only one of these variables would be used in the model, since they have a high correlation. Independents with high correlation usually are not used because of their redundancy, but I am going to use all of the independents because I would only be left with one variable, and would not be able to calculate multiple regression analysis.
Using Excel’s data analysis tool, regression analysis can be calculated
SUMMARY
OUTPUT |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Regression
Statistics |
|
|
|
|
|
|
|
|
Multiple
R |
0.9847 |
|
|
|
|
|
|
|
R Square |
0.9697 |
|
|
|
|
|
|
|
Adjusted
R Square |
0.9584 |
|
|
|
|
|
|
|
Standard
Error |
123.9224 |
|
|
|
|
|
|
|
Observations |
12 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
ANOVA |
|
|
|
|
|
|
|
|
|
df |
SS |
MS |
F |
Significance
F |
|
|
|
Regression |
3 |
3934437.548 |
1311479.183 |
85.40 |
2.04E-06 |
|
|
|
Residual |
8 |
122854.1191 |
15356.76489 |
|
|
|
|
|
Total |
11 |
4057291.667 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Coefficients |
Standard
Error |
t Stat |
P-value |
Lower 95% |
Upper 95% |
Lower
95.0% |
Upper
95.0% |
Intercept |
-117.5135 |
333.5259 |
-0.3523 |
0.7337 |
-886.6260 |
651.5991 |
-886.6260 |
651.5991 |
Average_ContractPrice |
-0.2965 |
0.1020 |
-2.9081 |
0.0196 |
-0.5316 |
-0.0614 |
-0.5316 |
-0.0614 |
Advertising
Expenditures |
0.0360 |
0.0139 |
2.5790 |
0.0327 |
0.0038 |
0.0681 |
0.0038 |
0.0681 |
Pers. Selling
Expenditures |
0.0662 |
0.0144 |
4.6065 |
0.0017 |
0.0331 |
0.0993 |
0.0331 |
0.0993 |
· R Squared shows that almost 97% of the variance can be explained by the variables.
· The equation is Y=-117.51 -.2965Average Contract Price+.036Advertising Expenditures+.0662Personal Selling Expenditures