## ECO-07: ELEMENTS OF STATISTICS

Indira Gandhi National Open University (IGNOU)

Maidan Garhi, New Delhi -110068

Elective Course in Commerce

ECO – 07: ELEMENTS OF STATISTICS

ASSIGNMENT- 2019-20

Dear Students,

As explained in the Programme Guide, you have to do one Tutor Marked Assignment in this Course.

Assignment is given 30% weightage in the final assessment. To be eligible to appear in the Term-end examination, it is compulsory for you to submit the assignment as per the schedule. Before attempting the assignments, you should carefully read the instructions given in the Programme Guide.

This assignment is valid for two admission cycles (July 2019 and January 2020). The validity is given below:

For those enrolled in July 2019, it is valid up to June 2020.

For those enrolled in January 2020, it is valid up to December 2020.

You have to submit the assignments for all the courses to the Coordinator of your Study Centre. To appear in the June Term-End Examination, you must submit the assignment to the Coordinator of your Study Centre by 15th March at the latest. Similarly, to appear in the December Term-End Examination, you must submit the assignment to the Coordinator of your Study Centre by 15th September at the latest.

## TUTOR MARKED ASSIGNMENT

Course Code: ECO - 07

Course Title: ELEMENTS OF STATISTICS

Assignment Code: ECO - 07 - 11/TMA/2019-20

Coverage: All Blocks

Maximum Marks: 100

Attempt all the questions

1. What is statistical survey? Discuss the steps to be followed while conducting statistical survey. (20)

Ans: Statistical Survey/Investigation: Statistical investigation is the search of information with the help of statistical devices. The general procedure in a statistical investigation is collection of data, classification and tabulation of data, presentation and comparison of data and interpretation of results. A statistical investigation or survey passes through several stages. These stages can be summarised under two broad heads:

Planning the investigation.

Executing the investigation.

(A) Planning The Investigation: Proper planning of a survey is of great importance because the quality of a survey depends mostly on planning. The matters which require careful consideration at the planning stage are:

1. Purpose of the survey: The purpose or objective of the survey must be very specific. We must be very clear about the type of information needed and the use to which the information obtained will be put.

2. Scope of the survey: The scope of the survey relates to the coverage with respect to the type of information, subject matter and geographical area. Three factors that exert great influence on scope are – the object of enquiry, availability of time and availability of resources. The investigation should be carried out within a reasonable period of time, otherwise the information collected may become out of date and hence of little use.

3. Unit of data collection: To carry out a statistical investigation the statistical unit or units must be clearly defined. The unit in terms of which the investigator counts or measures the variables or attributes selected for enumeration, analysis and interpretation is known as the statistical unit. For example, in a population census the statistical unit is a person. To be an ideal statistical unit it must possess the following qualities:

It should be clear and unambiguous.

It should be specific.

It should be stable.

It should be uniform.

It should be appropriate to the enquiry.

Statistical units can be broadly classified under two heads:

Units of collection and

Units of analysis and interpretation.

(a) Units of collection: Data for the enquiry are collected in terms of units of collection. The units of collection may be simple, compound or composite. A simple unit represents a single condition without qualifications. Examples of such units are – house, worker, hours etc. A simple unit with some qualifying words is called a compound unit. Examples of such units are – skilled worker, man-hours etc.

(b) Units of analysis and interpretation: Statistical data are ultimately analyzed and interpreted with the units of analysis and interpretation. These units facilitate comparisons between different sets of data with respect to time and place. Generally the units of analysis are – rates, ratios, percentage and coefficients.

4. Source of data: For any statistical investigation the source of data may be either primary or secondary. Data are termed primary when they are collected for the first time by the investigator, and secondary when the information is taken from available records, published material or data already available. The census department collects data on the population of the country at regular intervals. These constitute primary data to the census department. But these data will be secondary to someone who is using them for some other investigation. The choice of the source of data depends largely on the purpose and scope of the survey.

5. Technique of data collection: Mainly there are two techniques of data collection, namely (1) Census technique and (2) Sample technique. In census technique, information is obtained from each and every unit of the population which forms the subject matter of the study. Whereas, in the sample technique, information is obtained only from a representative part of the population and based on that, inference is drawn for the entire population. The census method is costlier and more time consuming. The choice of technique of data collection depends upon factors like – (a) the availability of resources, (b) the time factor, (c) degree of accuracy desired and (d) nature and scope of investigation.

6. Choice of frame: The term frame or population frame refers to all the units in the population under study. The whole structure of enquiry is to a considerable extent determined by the frame. Detailed planning of a survey cannot be undertaken unless we know the nature and accuracy of the available frame. If there does not exist any frame, then the construction of a frame suitable for the purpose of the enquiry will constitute a major part of planning.

7. Degree of accuracy desired: The investigator must determine the degree of accuracy he wants to attain. In any statistical work, it is very difficult to attain cent percent or 100% accuracy. The object of enquiry primarily determines the degree of accuracy. A very high degree of accuracy may mean a lot of time and cost. The necessary degree of accuracy in counting or measuring depends upon practical value of accuracy in relation to its cost.

8. Miscellaneous considerations: A decision has to be taken as to whether the enquiry has to be (1) regular or adhoc, (2) direct or indirect, (3) official, semi-official or non-official and (4) confidential or non-confidential.
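The difference between the census technique and the sample technique (point 5 above) can be sketched in a few lines of Python. This is only an illustration: the population of household incomes is hypothetical and generated at random, and the 10% sample size is an assumption, not a recommendation.

```python
import random
from statistics import mean

# Hypothetical population: monthly incomes (in Rs.) of 1,000 households.
rng = random.Random(7)  # fixed seed so that the run is repeatable
population = [rng.randint(5000, 50000) for _ in range(1000)]

# Census technique: every unit of the population is enumerated.
census_mean = mean(population)

# Sample technique: only a part of the population (here 10%) is enumerated
# and the result is used as an estimate for the whole population.
sample = rng.sample(population, 100)
sample_mean = mean(sample)

print(f"Census mean: {census_mean:.2f}")
print(f"Sample mean: {sample_mean:.2f}")
print(f"Sampling error: {sample_mean - census_mean:.2f}")
```

The sample covers one-tenth of the units, which is why it is cheaper and quicker, but its mean differs from the census mean by a sampling error.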

(B) Executing The Investigation: After planning the survey the next step is to execute the plan. The various phases of work at execution stage are as follows:

Setting up an administrative organization.

Designing of forms.

Selection, training and supervision of field investigators.

Control on the quality of field work.

Follow up of non-response.

Processing of data.

Preparation of report.

1. Setting up an administrative organization: An administrative organization is needed for an investigation. The size of the organization depends upon the nature and scope of the enquiry. If the scope is wide and covers a large geographical area, regional offices may be set up, apart from a central office. The complete control and administration of the enquiry fall on the administrative team.

2. Designing of forms: The various forms, especially the questionnaires, that are used in the course of the enquiry should be designed with utmost care and caution. The questionnaire is the medium of communication between the investigator and the respondents and hence must be designed and drafted by skilled and experienced persons.

3. Selection, training and supervision of field investigators: The success of a survey depends upon the work of the enumerators. Therefore, these enumerators should be properly selected and thoroughly trained for the field work. Constant supervision is essential to achieve high quality work.

4. Control on the quality of field work: Steps must be taken to ensure that the survey is under statistical control, i.e., the errors if any, in the survey are due to the presence of random variation and no assignable causes of variation are present. A system of field checks by the supervisors should also be introduced to maintain the standard. The field checks should preferably be carried out on a random sub-sample of units, and should be conducted without any prior notice.

5. Follow up of non-response: In spite of best efforts in the data collection, there may be respondents, who do not supply the desired information. To deal with the problem of non-response a list of non-respondents is made and a sub-sample of them is taken. Then with the help of supervisory staff efforts can be made for securing response from the members of the sub-sample.

6. Processing of data: After collection of data, the processing of data takes place. The data are generally coded, transferred to punched cards or tapes with the help of punching machines or computers. Classification and tabulation are important processes in any statistical investigation. Through these processes collected data are summarised and arranged in a systematic order.

7. Preparation of report: After the data have been collected and analyzed, the results of the survey are drafted in the form of a report. The preparation of the report is the final step in the execution of a survey. Two kinds of reports may be presented – a general report giving a description of the survey or a technical report giving details of the sample design, computational procedures, accuracy and allied aspects.

2. Central tendency, dispersion and skewness are three different measures to analyse numerical data. Comment. (20)

Ans: Central Tendency: One of the most important objectives of statistical analysis is to get one single value that describes the characteristic of the entire mass of unwieldy data. Such a value is called the central value or an ‘average’ or the expected value of the variable. The word average is very commonly used in day-to-day conversation. For example, we often talk of an average boy in a class, average height or life of an Indian, average income, etc. When we say ‘he is an average student’ what it means is that he is neither very good nor very bad, just a mediocre type of student. However, in statistics the term average has a different meaning. The word ‘average’ has been defined differently by various authors. Some important definitions are given below:

“Average is an attempt to find one single figure to describe whole of figures.” – Clark.

“An average is a single value selected from a group of values to represent them in some way – a value which is supposed to stand for whole group, of which it is a part, as typical of all the values in the groups.” – A. E. Waugh.

It is clear from the above definitions that an average is a single value that represents a group of values. Such a value is of great significance because it depicts the characteristic of the whole group. Since an average represents the entire data, its value lies somewhere in between the two extremes, i.e., the largest and the smallest items. For this reason an average is frequently referred to as a measure of central tendency.

There are two main objectives of the study of averages:

To get a single value that describes the characteristic of the entire group. Measures of central value, by condensing the mass of data in one single value, enable us to get a bird’s-eye view of the entire data. Thus one value can represent thousands, lakhs and even millions of values.

To facilitate comparison. Measures of central value, by reducing the mass of data to one single figure, enable comparison to be made. Comparison can be made either at a point of time or over a period of time.
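As a small illustration of how a single value can stand for a whole group, the three common averages can be computed with Python's standard `statistics` module. The marks below are hypothetical and chosen only for the example.

```python
from statistics import mean, median, mode

# Hypothetical marks of nine students (illustrative data only).
marks = [42, 55, 55, 61, 48, 55, 70, 66, 52]

average = mean(marks)      # arithmetic mean
middle = median(marks)     # middle value of the ordered data
most_common = mode(marks)  # most frequently occurring value

print("Mean:", average)
print("Median:", middle)
print("Mode:", most_common)

# An average always lies between the smallest and the largest item.
print(min(marks) <= average <= max(marks))
```

Each of the three values lies between the two extremes of the data, which is why such measures are called measures of central tendency.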

Measures of Dispersion: The various measures of central value discussed in the previous chapter give us one single figure that represents the entire data. But the averages alone cannot adequately describe a set of observations, unless all the observations are the same. It is necessary to describe the variability or dispersion of the observations. In two or more distributions the central value may be the same but still there can be wide disparities in formation of the distribution. Measures of dispersion help us in studying this important characteristic of a distribution. Some important definitions of dispersion are given below:

“Dispersion is the measure of the variation of the items.” – A. L. Bowley.

“The degree to which numerical data tend to spread about an average value is called the variation or dispersion of the data.” – Spiegel.

It is clear from above that dispersion (also known as scatter, spread or variation) measures the extent to which the items vary from some central value. Since measures of dispersion give an average of the differences of various items from an average, they are also called average of the second order.

An average is more meaningful when it is examined in the light of dispersion. For example, if the average wage of the workers of factory A is Rs. 1885 and that of factory B Rs. 1900, we cannot necessarily conclude that the workers of factory B are better off because in factory B there may be much greater dispersion in the distribution of wages.

The study of dispersion is of great significance in practice, as can be well appreciated from the following example:

Since the arithmetic mean is the same in all three series, one is likely to conclude that these series are alike in nature. But a close examination reveals that the distributions differ widely from one another. In series A, each and every item is perfectly represented by the arithmetic mean or, in other words, none of the items of series A deviates from the arithmetic mean and hence there is no dispersion. In series B, only one item is perfectly represented by the arithmetic mean, and the other items vary, but the variation is very small as compared to series C. In series C, not a single item is represented by the arithmetic mean and the items vary widely from one another. In series C dispersion is much greater compared to series B. Similarly, we may have two groups of labourers with the same mean salary and yet their distributions may differ widely. The mean salary may not be so important a characteristic as the variation of the items from the mean. To the student of social affairs, the mean income is not so vitally important as knowing how this income is distributed. Are a large number receiving the mean income or are there a few with enormous incomes and millions with incomes far below the mean? The three figures below represent frequency distributions with some of the characteristics we wish to emphasize here.

The two curves in diagram (a) represent two distributions with the same mean but with different dispersions. The two curves in (b) represent two distributions with the same dispersion but with unequal means. Finally (c) represents two distributions with unequal dispersion. The measures of central tendency are, therefore, insufficient. They must be supported and supplemented with other measures. In this chapter, we shall be especially concerned with the measures of variability, or spread or dispersion. A measure of variation or dispersion is one that measures the extent to which there are differences between individual observations and some central or average value. In measuring variation we shall be interested in the amount of the variation or its degree but not in the direction. For example, a measure of 6 inches below the mean has just as much dispersion as a measure of 6 inches above the mean. Measures of variation are needed for four basic purposes:

To determine the reliability of an average.

To serve as a basis for the control of the variability.

To compare two or more series with regard to their variability.

To facilitate the use of other statistical measures.
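The comparison of three series with a common mean, described above, can be sketched as follows. The figures are hypothetical: they are chosen only so that the series match the description (A has no variation, B has one item equal to the mean and small variation, C has wide variation) and are not taken from the original table.

```python
from statistics import mean, pstdev

# Illustrative series (hypothetical figures), each with a total of 500.
series = {
    "A": [100, 100, 100, 100, 100],
    "B": [100, 105, 102, 90, 103],
    "C": [1, 489, 2, 3, 5],
}

for name, values in series.items():
    avg = mean(values)
    value_range = max(values) - min(values)
    sd = pstdev(values)  # population standard deviation
    print(f"Series {name}: mean={avg}, range={value_range}, sd={sd:.2f}")
```

All three series have a mean of 100, yet the range and the standard deviation grow sharply from A to C, which is exactly what the mean alone fails to show.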

SKEWNESS: There are two other comparable characteristics called skewness and kurtosis that help us to understand a distribution. Two distributions may have the same mean and standard deviation but may differ widely in their overall appearance as can be seen from the following:

In both these distributions the value of mean and standard deviation is the same (X̄ = σ = 5). But it does not imply that the distributions are alike in nature. The distribution on the left-hand side is a symmetrical one whereas the distribution on the right-hand side is asymmetrical or skewed. Measures of skewness help us to distinguish between different types of distributions. Some definitions of skewness are as follows:

“When a series is not symmetrical it is said to be asymmetrical or skewed.” – Croxton & Cowden.

“Skewness refers to the asymmetry or lack of symmetry in the shape of a frequency distribution.” – Morris Hamburg.

The analysis of the above definitions shows that the term ‘SKEWNESS’ refers to lack of symmetry, i.e., when a distribution is not symmetrical (or is asymmetrical) it is called a skewed distribution. Any measure of skewness indicates the difference between the manner in which items are distributed in a particular distribution compared with a symmetrical (or normal) distribution. If, for example, skewness is positive, the frequencies in the distribution are spread out over a greater range of values on the high-value end of the curve (the right-hand side) than they are on the low-value end. If the curve is normal, the spread will be the same on both sides of the centre point and the mean, median and mode will all have the same value. The concept of skewness gains importance from the fact that statistical theory is often based upon the assumption of the normal distribution. A measure of skewness is, therefore, necessary in order to guard against the consequences of this assumption.
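One common way of putting a number on skewness is Karl Pearson's coefficient, 3 (Mean – Median) / Standard deviation. The answer above does not name a particular measure, so the choice of this formula is an assumption, and both data sets below are hypothetical.

```python
from statistics import mean, median, pstdev

def pearson_skewness(values):
    """Karl Pearson's coefficient of skewness: 3 * (mean - median) / sd."""
    return 3 * (mean(values) - median(values)) / pstdev(values)

# Hypothetical data sets (illustrative only).
symmetrical = [2, 4, 6, 8, 10]
right_tailed = [2, 3, 3, 4, 4, 5, 20]

print(pearson_skewness(symmetrical))   # zero, since mean equals median
print(pearson_skewness(right_tailed))  # positive, long tail on the right
```

A value of zero indicates symmetry, a positive value indicates that the items are spread out further on the high-value side, and a negative value indicates the reverse.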

3. Calculate a) Mean, b) Variance and c) Standard deviation from the following frequency distribution: (20)

Ans: Calculation of Mean Variance & S.D.:
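The frequency distribution for this question is not reproduced here, so the method is sketched below with hypothetical class mid-points and frequencies. The formulas used are Mean = Σfx / N, Variance = Σf(x – Mean)² / N and S.D. = √Variance; the variable names are illustrative.

```python
from math import sqrt

# Hypothetical class mid-points (x) and frequencies (f); these are NOT the
# figures from the original question.
midpoints = [5, 15, 25, 35, 45]
frequencies = [4, 6, 10, 6, 4]

n = sum(frequencies)  # N = sum of f
sum_fx = sum(f * x for x, f in zip(midpoints, frequencies))
mean = sum_fx / n     # Mean = sum of fx divided by N

# Variance = sum of f * (x - mean)^2 divided by N.
variance = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies)) / n
std_dev = sqrt(variance)  # S.D. = square root of the variance

print(f"N = {n}, Mean = {mean:.2f}")
print(f"Variance = {variance:.2f}, S.D. = {std_dev:.2f}")
```

The same steps apply to the actual data once the mid-points and frequencies of the given classes are substituted.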

4. (a) What is the relationship between Mean, Median and Mode:

Symmetrical curve

A negatively skewed curve

A positively skewed curve

Ans: Relationship among Mean, Median and Mode: A distribution in which the values of mean, median and mode coincide (i.e., mean = median = mode) is known as a symmetrical distribution. Conversely stated, when the values of mean, median and mode are not equal the distribution is known as asymmetrical or skewed. In a moderately skewed or asymmetrical distribution a very important relationship exists among mean, median and mode. In such distributions the distance between the mean and the median is about one-third of the distance between the mean and the mode, as will be clear from the diagram given:

Karl Pearson has expressed this relationship as follows:

Mode = Mean – 3 [Mean – Median]

Mode = 3 Median – 2 Mean

and Median = Mode + 2/3 [Mean – Mode]

If we know any of the two values out of the three, we can compute the third from these relationships. The following example will illustrate this point.

Example 1: In a moderately asymmetrical distribution, the mode and mean are 32.1 and 35.4 respectively. Find out the value of Median.

Example 2: Given median = 20.6, mode = 26, find mean
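Both examples can be worked out directly from Karl Pearson's relationship given above. The sketch below simply rearranges the formulas; the function names are illustrative.

```python
def median_from(mean, mode):
    """Median = Mode + 2/3 (Mean - Mode), for a moderately skewed distribution."""
    return mode + 2 * (mean - mode) / 3

def mean_from(median, mode):
    """Rearranged from Mode = 3 Median - 2 Mean: Mean = (3 Median - Mode) / 2."""
    return (3 * median - mode) / 2

# Example 1: mode = 32.1, mean = 35.4
print(round(median_from(mean=35.4, mode=32.1), 2))  # 34.3

# Example 2: median = 20.6, mode = 26
print(round(mean_from(median=20.6, mode=26), 2))    # 17.9
```

So the median in Example 1 is 32.1 + 2/3 × 3.3 = 34.3, and the mean in Example 2 is (61.8 – 26) / 2 = 17.9.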

(b) For a frequency distribution: Q3-Q2 = 40 & Q2-Q1=60 Find coefficient of skewness. (10+10)

Ans: Coefficient of Skewness
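Since only differences between quartiles are given, a reasonable assumption is that Bowley's quartile coefficient of skewness is intended: Sk = (Q3 + Q1 – 2Q2) / (Q3 – Q1). The numerator can be written as (Q3 – Q2) – (Q2 – Q1) and the denominator as (Q3 – Q2) + (Q2 – Q1), so the given values can be substituted directly. The function and parameter names below are illustrative.

```python
def bowley_skewness(upper_gap, lower_gap):
    """Bowley's coefficient from upper_gap = Q3 - Q2 and lower_gap = Q2 - Q1.

    Sk = (Q3 + Q1 - 2*Q2) / (Q3 - Q1)
       = ((Q3 - Q2) - (Q2 - Q1)) / ((Q3 - Q2) + (Q2 - Q1))
    """
    return (upper_gap - lower_gap) / (upper_gap + lower_gap)

coefficient = bowley_skewness(upper_gap=40, lower_gap=60)
print(coefficient)  # -0.2
```

Under this assumption the coefficient is (40 – 60) / (40 + 60) = –0.2, which indicates a negatively skewed distribution.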

5. Write short notes on the following - (5×4)

a) Principle of preparing graphs.

Ans: A large variety of graphs are used in practice. However, here we shall discuss only some important types of graphs which are more popular. Broadly, the various graphs can be divided under the following two heads:

Graphs of time series.

Graphs of frequency distributions.

Constructing charts and graphs is an art which can be acquired through practice. There are a number of simple rules, adoption of which leads to the effectiveness of the graphs. However, before discussing these rules the elementary procedure of constructing a graph is considered.

Principles of Constructing graph

For constructing graphs, we make use of graph paper. Two simple lines are first drawn which intersect each other at right angles. The lines are known as coordinate axes. The point of intersection is known as the point of origin or the ‘zero’ point. The horizontal line is called the axis of X or ‘abscissa’ and the vertical line the axis of Y or ‘ordinate’. The alternative appellations are X-axis and Y-axis respectively. The following are the two lines:

In the above figure, O is the point of origin, XOX’ is the axis of X or the ‘abscissa’ and YOY’ the axis of Y or the ‘ordinate’. Both positive as well as negative values can be shown on the graph. Distances measured towards the right or upwards from the origin are positive and those measured towards the left or downwards are negative.

The whole plotting area is divided into four quadrants as shown above. In quadrant I, both the values of X and Y are positive. In quadrant II, Y is positive and X is negative; in quadrant III both X as well as Y are negative; and in quadrant IV, X is positive whereas Y is negative. Since most business data are positive, quadrant I is most frequently used.

It is conventional to take the independent variable on the horizontal scale and the dependent variable on the vertical scale. In case of time series, time is represented on the horizontal scale and the variable on the vertical scale. For each axis a convenient scale is chosen which represents the unit of a variable. The choice is made in such a manner that the entire data are accommodated in the space available. The scale on X-axis and Y-axis need not be identical.
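The sign rules for the four quadrants can be expressed as a short function. This is only an illustrative sketch; the function name and the treatment of points lying on an axis are assumptions.

```python
def quadrant(x, y):
    """Return the quadrant (I to IV) of a point, or a note if it lies on an axis."""
    if x == 0 or y == 0:
        return "on an axis"
    if x > 0 and y > 0:
        return "I"    # X positive, Y positive
    if x < 0 and y > 0:
        return "II"   # X negative, Y positive
    if x < 0 and y < 0:
        return "III"  # X negative, Y negative
    return "IV"       # X positive, Y negative

for point in [(3, 5), (-3, 5), (-3, -5), (3, -5), (0, 0)]:
    print(point, quadrant(*point))
```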

b) Statistical Table

Ans: One of the simplest and most revealing devices for summarizing data and presenting them in a meaningful fashion is the statistical table. A table is a systematic arrangement of statistical data in columns and rows. Rows are horizontal arrangements whereas columns are vertical ones. The purpose of a table is to simplify the presentation and to facilitate comparisons. The simplification results from the clear-cut and systematic arrangement, which enables the reader to quickly locate desired information. Comparison is facilitated by bringing related items of information close together.

Role of statistical table

Tables make it possible for the analyst to present a huge mass of data in a detailed, orderly manner within a minimum of space. Because of this, tabular presentation is the cornerstone of statistical reporting. The significance of tabulation will be clear from the following points.

It simplifies complex data. When data are tabulated all unnecessary details and repetitions are avoided. Data are presented systematically in columns and rows. Hence, the reader gets a very clear idea of what the table presents. There is thus a considerable saving in time taken in understanding what is represented by the data and all confusion is avoided. Also a large amount of space is saved because of non-duplicating of headings and designations; the description at the top of a column serves for all the terms beneath it.

It facilitates comparison. Since a table is divided into various parts and for each part there are totals and sub-totals, the relationship between different parts of data can be studied much more easily with the help of a table than without it.

It gives identity to the data. When the data are arranged in a table with a title and number they can be distinctly identified and can be used as a source reference in the interpretation of a problem.

It reveals patterns. Tabulation reveals patterns within the figures which cannot be seen in the narrative form. It also facilitates the summation of the figures if the reader desires to check the totals.

c) Data Array

Ans: Data are numbers and measurements collected from observations and an array is a systematic arrangement of objects, usually in rows and columns. So, a data array can be defined as a systematic arrangement of data in the form of rows and columns. Data may be of four types: Primary Data, Secondary Data, Qualitative Data and Quantitative Data. Data stored in arrays are of similar size. Arrays are mainly used in statistics to organise data so that a related set of values can be easily sorted and searched.
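A minimal sketch of turning raw observations into an array is given below. The sales figures are hypothetical, and the choice of ascending order and four columns per row is only an assumption for the illustration.

```python
# Hypothetical raw data: daily sales (in units) in the order recorded.
raw_data = [23, 17, 31, 12, 28, 17, 25, 20, 14, 31, 19, 22]

# Arrange the values in ascending order.
ordered = sorted(raw_data)

# Lay the ordered values out in rows of four columns each.
columns = 4
array = [ordered[i:i + columns] for i in range(0, len(ordered), columns)]

for row in array:
    print(row)
```

Once arranged in this way, the smallest and largest values, and any repeated values, can be located at a glance.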

d) Properties of Normal Curve.

Ans: Properties of the Normal Distribution/Curve: The following are the important properties of the normal curve and the normal distribution:

- The normal curve is “bell-shaped” and symmetrical in its appearance. If the curve were folded along its vertical axis, the two halves would coincide. The number of cases below the mean in a normal distribution is equal to the number of cases above the mean, which makes the mean and median coincide. The height of the curve for a positive deviation of 3 units is the same as the height of the curve for a negative deviation of 3 units.
- The height of the normal curve is at its maximum at the mean. Hence the mean and mode of the normal distribution coincide. Thus for a normal distribution mean, median and mode are all equal.
- There is one maximum point of the normal curve which occurs at the mean. The height of the curve declines as we go in either direction from the mean. The curve approaches nearer and nearer to the base but it never touches it i.e., the curve is asymptotic to the base on either side. Hence its range is unlimited or infinite in both directions.
- Since there is only one maximum point, the normal curve is unimodal, i.e., it has only one mode.
- The points of inflexion, i.e., the points where the change in curvature occurs, are at X̄ ± σ.
- As distinguished from the Binomial and Poisson distributions, where the variable is discrete, the variable distributed according to the normal curve is a continuous one.
- The first and third quartiles are equidistant from the median.
- The mean deviation is 4/5th, or more precisely 0.7979, of the standard deviation.
- The area under the normal curve is distributed as follows:

- Mean ± 1σ covers 68.27% area; 34.135% area will lie on either side of the mean.
- Mean ± 2σ covers 95.45% area.
- Mean ± 3σ covers 99.73% area.
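The area figures and the mean deviation ratio quoted above can be checked numerically. The sketch below relies on two standard results for the normal distribution: the area within k standard deviations of the mean is erf(k / √2), and the mean deviation equals σ√(2/π). The function name is illustrative.

```python
from math import erf, pi, sqrt

def area_within(k):
    """Proportion of the area under a normal curve within k sigma of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"Mean +/- {k} sigma covers {area_within(k) * 100:.2f}% of the area")

# Ratio of the mean deviation to the standard deviation.
md_ratio = sqrt(2 / pi)
print(f"Mean deviation / S.D. = {md_ratio:.4f}")
```

The printed percentages agree with the 68.27%, 95.45% and 99.73% stated above, and the ratio agrees with 0.7979.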