BUSINESS STATISTICS NOTES
B.COM 2ND AND 3RD SEM NEW SYLLABUS (CBCS PATTERN)
Correlation analysis
Correlation Analysis Meaning
Correlation is the degree of relationship between two or more variables under consideration. If two or more quantities vary in such a way that movements in one are accompanied by movements in the other, these quantities are said to be correlated. For example, there exists some relationship between the price of a product and the quantity demanded, and between rainfall and crops. Correlation analysis measures the degree of relationship between the variables under consideration.
In the words of Simpson & Kafka, “Correlation analysis deals with the association between two or more variables.”
Correlation Coefficients Meaning
A correlation coefficient is a numerical measure of the degree of relationship between two variables. The correlation coefficient of two variables ranges from -1 to +1.
Table of Contents
1. Meaning of Correlation Analysis and Correlation Coefficients
2. Significance and Limitations of Correlation Analysis
3. Various Types of Correlation
4. Different Degrees of Correlation
5. Different Methods of Correlation Analysis (along with merits and demerits)
a) Scatter Diagram Method
b) Graphic Method
c) Karl Pearson’s Coefficient of Correlation
d) Spearman’s Rank Correlation
6. Correlation and Causation
7. Probable Error
Significance and Limitations of Correlation Analysis
Following are the main advantages of correlation:
1. It gives a precise quantitative value indicating the degree of relationship existing between the two variables.
2. It measures the direction as well as the degree of relationship between the two variables.
3. In regression analysis, it is used for estimating the value of the dependent variable from the known value of the independent variable.
4. The effect of correlation is to reduce the range of uncertainty in predictions. A prediction based on correlation analysis is likely to be more reliable and nearer to reality.
5. Correlation analysis also helps in studying the causes of economic disturbances and suggests measures through which stabilizing forces may become effective.
Following are the main limitations of correlation:
1. Extreme items affect the value of the coefficient of correlation.
2. Its computational method is difficult as compared to other methods.
3. It assumes a linear relationship between the two variables, whether or not such a relationship exists.
4. Correlation helps in determining the degree of relationship between two variables, but it does not tell us anything about a cause-and-effect relationship.
Various Types of Correlation
Kinds of correlation may be studied on the basis of:
A. On the basis of change in proportion: There are two important types of correlation on the basis of change in proportion. They are:
(a) Linear correlation: Correlation is said to be linear when one variable moves with the other variable in a fixed proportion.
(b) Non-linear correlation: Correlation is said to be non-linear when one variable moves with the other variable in a changing proportion.
B. On the basis of number of variables: On the basis of number of variables, correlation may be:
(a) Simple correlation: When only two variables are studied, it is a simple correlation.
(b) Partial correlation: When more than two variables are involved but the relationship between two of them is studied while keeping the other variables constant, it is called partial correlation.
(c) Multiple correlation: When at least three variables are studied and their relationships are worked out simultaneously, it is a case of multiple correlation.
C. On the basis of change in direction: On the basis of change in direction, correlation may be:
(a) Positive correlation: Correlation is said to be positive when two variables move in the same direction.
(b) Negative correlation: Correlation is said to be negative when two variables move in opposite directions.
Different degrees of Correlation
The different degrees of correlation are:
i) Perfect Correlation: If two variables vary in the same proportion, the correlation is said to be perfect.
ii) Positive Correlation: If an increase (or decrease) in one variable corresponds to an increase (or decrease) in the other, the correlation is said to be positive.
iii) Negative Correlation: If an increase (or decrease) in one variable corresponds to a decrease (or increase) in the other, the correlation is said to be negative.
iv) Zero or No Correlation: If a change in one variable is not accompanied by any corresponding change in the other, then there is zero or no correlation.
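As a rough illustration, the degrees listed above can be read off the value of a correlation coefficient r. The Python sketch below is only one possible mapping: the function name `describe_degree` and the `tolerance` parameter (used to absorb rounding error) are assumptions made for this example and do not come from the notes.

```python
def describe_degree(r, tolerance=1e-9):
    """Map a correlation coefficient r to the degree labels used in these notes."""
    if not (-1 - tolerance <= r <= 1 + tolerance):
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    # r equal to +1 or -1 (within rounding error): the variables vary in the same proportion
    if abs(r) >= 1 - tolerance:
        return "perfect positive correlation" if r > 0 else "perfect negative correlation"
    # r equal to 0 (within rounding error): no linear relationship
    if abs(r) <= tolerance:
        return "zero or no correlation"
    # Anything in between is an imperfect positive or negative correlation
    return "positive correlation" if r > 0 else "negative correlation"


for value in (1.0, 0.8, 0.0, -0.35, -1.0):
    print(value, "->", describe_degree(value))
```

The labels only describe the sign and the two extreme cases; how strong a "positive" or "negative" correlation is still has to be judged from the size of r.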
Different methods of studying correlation
The different methods of studying the relationship between two variables are:
i) Scatter Diagram Method: It is a graphical method of finding the relationship between two variables. The independent variable is taken on the x-axis and the dependent variable on the y-axis, and the paired values of x and y are plotted on the graph. If the plotted points trend upwards from left to right, there is positive correlation; if they trend downwards, there is negative correlation.
Merits:
i) This method is easy and simple to use and understand.
ii) The relation between two variables can be studied in a non-mathematical way.
iii) It is not influenced by extreme items.
Demerits:
i) It is a non-mathematical method, so the results are neither exact nor accurate.
ii) It gives only an approximate idea of the relationship.
ii) Graphic Method: This is an extension of linear graphs. In this case, two or more variables are plotted on graph paper. If the curves move in the same direction, the correlation is positive; if they move in opposite directions, the correlation is negative. If there is no definite direction, there is an absence of correlation. Although it is a simple method, it gives only a rough estimate of the nature of the relationship.
Merits:
i) It is easy and simple to use and understand.
ii) The relation between two variables can be studied in a non-mathematical way.
Demerits:
i) It is a non-mathematical method, so the results are neither exact nor accurate.
ii) It gives only an approximate idea of the relationship.
iii) Karl Pearson’s Coefficient of Correlation: This is a mathematical method and the most popular way of calculating correlation. The arithmetic mean and standard deviation are the basis for its calculation. The correlation coefficient (r), also called the linear correlation coefficient, measures the strength and direction of a linear relationship between two variables. The value of r lies between -1 and +1.
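Since the arithmetic mean and standard deviation are the basis of the calculation, r can be written as the covariance of the two variables divided by the product of their standard deviations. The sketch below shows this in Python using the standard `statistics` module. The function name `pearson_r` and the advertising and sales figures are hypothetical and are used only for illustration.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Karl Pearson's r = covariance(x, y) / (s.d. of x * s.d. of y)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired series with at least two items")
    mean_x, mean_y = mean(x), mean(y)
    # Covariance: average product of deviations from the two arithmetic means
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / len(x)
    # pstdev is the standard deviation with divisor n, matching the covariance above.
    # A constant series has zero standard deviation, so r is undefined for it.
    return covariance / (pstdev(x) * pstdev(y))


# Hypothetical data: advertising spend (x) and sales (y) for five periods
advertising = [1, 2, 3, 4, 5]
sales = [2, 1, 4, 3, 5]
r = pearson_r(advertising, sales)
print(round(r, 2))  # 0.8
```

For this made-up data the covariance is 1.6 and each standard deviation is √2, so r = 1.6 / 2 = 0.8, a fairly high positive correlation.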
Properties of r:
i) The coefficient of correlation lies between -1 and +1.
ii) The coefficient of correlation is independent of the units of measurement of the variables.
iii) The coefficient of correlation is independent of change of origin and scale.
iv) If two variables are independent of each other, then the value of r is zero.
v) The coefficient of correlation is the geometric mean of the two regression coefficients.
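Properties iii) and v) can be checked numerically. The Python sketch below is a minimal illustration on made-up data: the helper names `deviation_sums` and `pearson_r`, the data, and the chosen shifts and divisors are all assumptions for this example. It uses the usual expressions b_yx = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b_xy = Σ(x - x̄)(y - ȳ) / Σ(y - ȳ)² for the two regression coefficients.

```python
from math import sqrt


def deviation_sums(x, y):
    """Return (Sxy, Sxx, Syy): sums of products and squares of deviations from the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy, sxx, syy


def pearson_r(x, y):
    sxy, sxx, syy = deviation_sums(x, y)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)

# Property iii: shift the origin and rescale (with positive divisors); r is unchanged
u = [(a - 3) / 2 for a in x]
v = [(b - 10) / 5 for b in y]
r_after_change = pearson_r(u, v)

# Property v: r is the geometric mean of the two regression coefficients
sxy, sxx, syy = deviation_sums(x, y)
b_yx = sxy / sxx  # regression coefficient of y on x
b_xy = sxy / syy  # regression coefficient of x on y
# The square root is always positive; r takes the common sign of b_yx and b_xy
geometric_mean = sqrt(b_yx * b_xy)

print(round(r, 4), round(r_after_change, 4), round(geometric_mean, 4))
```

All three printed values agree (0.8 for this data). If one variable were rescaled by a negative factor, the size of r would stay the same but its sign would flip.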
Merits:
i) The coefficient of correlation measures the degree of relationship between two variables.
ii) It also measures the direction of the relationship.
iii) It may be used to determine the regression coefficients, provided the standard deviations of the two variables are known.
Demerits:
i) It always assumes a linear relationship between the variables, even if this assumption is not correct.
ii) It is affected by extreme values.
iii) It takes a lot of time to compute.
iv) Great care must be exercised in interpreting the value of Karl Pearson’s coefficient of correlation, as the coefficient is very often misinterpreted.
iv) Spearman’s Rank Coefficient of Correlation: This method measures correlation for qualitative data. Qualities such as beauty, honesty and ability cannot be measured in quantitative terms, so ranks are used to determine the correlation coefficient.
Features of Spearman’s rank correlation:
i) The sum of the differences of ranks between the two variables is always zero.
ii) Spearman’s correlation coefficient is distribution-free, or non-parametric.
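The rank method can be sketched with the standard formula R = 1 - 6ΣD² / (N(N² - 1)), where D is the difference between the two ranks of an item and N is the number of items. The notes do not state this formula, so it is brought in here as an assumption. It holds in this simple form only when there are no tied ranks. The function name and the judges' rankings below are hypothetical.

```python
def spearman_rank_correlation(rank_x, rank_y):
    """Spearman's R = 1 - 6*sum(D^2) / (N*(N^2 - 1)); assumes no tied ranks."""
    if len(rank_x) != len(rank_y) or len(rank_x) < 2:
        raise ValueError("both rankings must cover the same items (at least two)")
    n = len(rank_x)
    # D = difference between the two ranks given to the same item
    differences = [a - b for a, b in zip(rank_x, rank_y)]
    sum_d_squared = sum(d * d for d in differences)
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


# Hypothetical ranks given by two judges to five contestants
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]

differences = [a - b for a, b in zip(judge_a, judge_b)]
print(sum(differences))                             # 0, as feature i) states
print(spearman_rank_correlation(judge_a, judge_b))  # 0.8
```

Here ΣD² = 4 and N = 5, so R = 1 - 24/120 = 0.8. The zero sum of differences is a useful arithmetic check before squaring.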
Merits:
i) It is easy and simple to calculate and understand.
ii) This method is most suitable if the data are qualitative.
iii) This is the only method that can be used where ranks are given and not the actual data.
iv) If actual values are given, the rank method can still be applied for ascertaining correlation.
Demerits:
i) This method cannot be used in the case of a grouped frequency distribution.
ii) Where the number of items exceeds 30, the calculations become quite tedious and require a lot of time.
Correlation and Causation
Correlation helps in determining the degree of relationship between two variables, but it does not tell us whether a cause-and-effect relationship exists between them. Correlation shows only the direction in which the variables move, which may be the same or opposite, or that there is no related movement at all. Even if the degree of correlation between two variables is very high, it does not mean that a change in one variable causes a change in the other. It is also possible that the correlated variables are both influenced by one or more other variables.
For example, suppose the correlation between teachers’ salaries and the consumption of liquor over a period of years comes out to be 0.8. This does not prove that every teacher drinks; nor does it prove that liquor sales increase teachers’ salaries. Instead, both variables move together because both are influenced by a third variable: long-run growth in national income and population. Such correlation between variables that cannot be causally related is called spurious or nonsense correlation.
Correlation gives only a mathematical result. While studying the correlation between two variables, we should reach a conclusion based on logical reasoning and intelligent investigation of significantly related matters. This can be done only with the help of causation. Causation indicates that one event is the result of the occurrence of another event, i.e., there is a causal relationship between the two variables. Correlation should be used together with causation to derive effective results. In the above-mentioned example, we find that there is correlation between teachers’ salaries and the sale of liquor, but no causation exists between the two variables. So, we can say that correlation does not imply causation. Similarly, causation between two variables does not necessarily imply correlation between them.
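The last point can be seen in a small constructed case. In the Python sketch below, y is completely determined by x (y = x²), yet Karl Pearson's r is zero because the relationship is not linear. The data and the helper name `pearson_r` are assumptions made purely for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Karl Pearson's r from sums of deviations about the arithmetic means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: y depends entirely on x, but not in a straight line
x = [-2, -1, 0, 1, 2]
y = [value ** 2 for value in x]  # [4, 1, 0, 1, 4]

r = pearson_r(x, y)
print(r)
```

The positive and negative products of deviations cancel exactly, so r comes out as zero even though x fully causes y. A zero coefficient therefore rules out only a linear relationship.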
Probable error
Correlation coefficients are calculated from sample data, so they are subject to sampling error. The probable error is used to interpret the value of the correlation coefficient. With its help, it is possible to determine the reliability of the value of the coefficient in so far as it depends on the conditions of random sampling. The probable error of the coefficient of correlation is obtained as follows:
P.E.(r) = 0.6745 × (1 − r²) / √n
Where r is the coefficient of correlation and n is the number of pairs of observations. If r < P.E.(r), there is no evidence of correlation. On the other hand, if r > 6 P.E.(r), the coefficient of correlation is practically certain, i.e., significant. By adding the probable error to, and subtracting it from, the coefficient of correlation, we get respectively the upper and lower limits within which the coefficient of correlation of the population can be expected to lie. Symbolically, ρ = r ± P.E.(r).
But the measure of probable error can be properly used only when the following three conditions exist:
1. The data must approximate a normal frequency curve.
2. The statistical measure for which the P.E. is computed must have been calculated from a sample.
3. The sample must have been selected in an unbiased manner and the individual items must be independent.
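The probable error calculation and the two decision rules can be sketched in Python as follows. The function names are hypothetical. Two choices in the sketch are assumptions not stated in the notes: the size of r is compared using its absolute value so that negative coefficients are handled, and the label "inconclusive" is used for values falling between the two rules.

```python
from math import sqrt


def probable_error(r, n):
    """P.E.(r) = 0.6745 * (1 - r^2) / sqrt(n), where n is the number of pairs."""
    if n <= 0:
        raise ValueError("n must be a positive number of paired observations")
    return 0.6745 * (1 - r ** 2) / sqrt(n)


def interpret(r, n):
    """Return the probable error, the limits r - P.E. and r + P.E., and a verdict."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        verdict = "no evidence of correlation"
    elif abs(r) > 6 * pe:
        verdict = "correlation is practically certain"
    else:
        verdict = "inconclusive"  # assumed label; the notes give no rule for this range
    return pe, (r - pe, r + pe), verdict


# Hypothetical example: r = 0.8 computed from 25 pairs of observations
pe, limits, verdict = interpret(0.8, 25)
print(round(pe, 4), [round(limit, 4) for limit in limits], verdict)
```

For r = 0.8 and n = 25, P.E. = 0.6745 × 0.36 / 5 ≈ 0.0486. Since 0.8 is more than six times this value, the correlation is treated as practically certain, and the limits are roughly 0.75 and 0.85.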