# Introduction to Statistics

The word “statistics” seems to have been derived from the Latin word “status” or the Italian word “statista”. Both words mean a political state. In early years, “statistics” meant a collection of facts about the people in a state, gathered for administrative or political purposes.

Webster defined statistics as “the classified facts representing the conditions of the people in a state, especially those facts which can be stated in numbers or in tables of numbers or in any tabular or classified arrangement.”

A comprehensive definition was given by Prof. Horace Secrist, which is as follows:
“By Statistics we mean aggregates of facts affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standards of accuracy, collected in a systematic manner for a predetermined purpose and placed in relation to each other.”

The above definitions clearly point out certain characteristics which numerical data must possess in order to be called statistics. These are as follows:

(i) Statistics are aggregates of facts: Single and isolated figures are not statistics because they cannot be compared and no meaningful conclusion can be drawn from them. Only aggregates of facts capable of offering some meaningful conclusion constitute statistics.
(All statistics are expressed in numbers, but not all numbers are statistics.)

(ii) Statistics must be numerically expressed: Statistical methods are applicable only to those data which can be numerically expressed. Qualitative expressions like honesty, intelligence and sincerity are not statistics unless they can be numerically expressed.

(iii) Statistics should be capable of being related to each other: Statistical data should be capable of being compared and connected with each other. If there is no apparent relationship between the data, they cannot be called statistics.

(iv) Statistics should be collected in a systematic manner: For collecting statistical data a suitable plan should be prepared and work should be done accordingly.

(v) Statistics should be collected for a definite purpose: The purpose of collecting data must be decided in advance. The purpose should be specific and well defined.

(vi) Statistics are affected to a marked extent by a large number of causes: Facts and figures are affected to a marked extent by the combined influence of a number of forces.

(vii) A reasonable standard of accuracy should be maintained in the collection of statistics: Statistics deals with large volumes of data. Instead of counting each and every item, statisticians take a sample and apply the result obtained from the sample to the whole group. The degree of accuracy required of the sample largely depends upon the nature and object of the enquiry. If a reasonable standard of accuracy is not maintained, the numbers may give misleading results.

Various stages in statistical investigation:
There are five stages in a statistical investigation which are given below:
(i) Collection of Data: Utmost care must be exercised in collecting data as they are the foundation of statistical analysis. If the data are faulty, the conclusions drawn can never be reliable.

(ii) Organisation of Data: Data collected from published sources are generally in organised form, but data collected from a survey frequently need organisation. For meaningful analysis, it is necessary to organise the collected data properly. Organising data involves three steps, which are:
(a)    Editing of data
(b)   Classification of data according to some common characteristics
(c)    Tabulation.

(iii) Presentation of Data: Organised data can be further presented in the form of Diagrams and Graphs.

(iv) Analysis: After collection, organisation and presentation, data are analysed by adopting various statistical methods such as measures of central tendency, measures of variation, correlation, regression etc. to extract information useful for decision-making.

(v) Interpretation: The last stage is interpretation, which is a difficult task and requires a high degree of skill, care and experience. If the data have been analysed but not properly interpreted, the whole object of the investigation may be defeated and wrong conclusions may be drawn.
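The analysis stage described in (iv) can be illustrated with a minimal sketch using only Python's standard library. The paired figures for hours studied and marks obtained are hypothetical and are not taken from any real enquiry; the helper `pearson_correlation` is written out for illustration.

```python
import statistics

# Hypothetical paired observations: hours studied and marks obtained.
hours = [2, 4, 6, 8, 10]
marks = [35, 50, 55, 70, 90]

def pearson_correlation(x, y):
    """Pearson correlation coefficient computed from its definition."""
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / (spread_x * spread_y) ** 0.5

mean_marks = statistics.mean(marks)      # measure of central tendency
median_marks = statistics.median(marks)  # another measure of central tendency
stdev_marks = statistics.stdev(marks)    # measure of variation (sample standard deviation)
r = pearson_correlation(hours, marks)    # strength of the linear relationship

print(mean_marks, median_marks, round(stdev_marks, 2), round(r, 2))
```

For these figures the mean is 60, the median is 55 and the correlation is about 0.98. Deciding what such values imply for the enquiry is the task of the interpretation stage.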

Functions and Limitations of Statistics:
The functions of statistics are as follows:
(i) It presents facts in a definite form. Numerical expressions are convincing and, therefore, one of the most important functions of statistics is to present statements in a precise and definite form.

(ii) It simplifies a mass of figures. Data presented in the form of a table, graph or diagram, or as averages or coefficients, are simple to understand.

(iii) It facilitates comparison. Once the data are simplified they can be compared with other similar data. Without such comparison the figures would have been useless.

(iv) It helps in prediction. Plans and policies of organisations are invariably formulated well in advance of their implementation. Knowledge of future trends is very useful in framing suitable policies and plans.

(v) It helps in formulating and testing hypotheses. Statistical methods like the z-test, t-test and χ²-test are extremely helpful in formulating and testing hypotheses and in developing new theories.

(vi) It helps in the formulation of suitable policies. Statistics provide the basic material for framing suitable policies. It helps in estimating export, import or production programmes in the light of changes that may occur.

(vii) Statistics indicates trend behavior. Statistical techniques such as Correlation, Regression, Time series analysis etc. are useful in forecasting future events.
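As an illustration of function (v), the following is a minimal sketch of a one-sample z-test using Python's standard library. The package weights, the claimed mean of 500 g and the assumed known population standard deviation of 4 g are all hypothetical values chosen for the example.

```python
import math
from statistics import NormalDist, mean

def one_sample_z_test(sample, hypothesised_mean, population_sd):
    """Return (z statistic, two-sided p-value) for a one-sample z-test."""
    n = len(sample)
    standard_error = population_sd / math.sqrt(n)
    z = (mean(sample) - hypothesised_mean) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical sample of package weights in grams.
weights = [498, 502, 497, 495, 499, 501, 496, 500, 494]
z, p = one_sample_z_test(weights, hypothesised_mean=500, population_sd=4)
print(round(z, 2), round(p, 3))
```

Here the sample mean is 498 g, so z is -1.5 and the two-sided p-value is about 0.13. At the usual 5% level of significance the hypothesis of a 500 g mean would not be rejected.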

Limitations of statistics are as follows:
(i) Statistics deals only with quantitative characteristics. Statistics are numerical statements of facts. Data which cannot be expressed in numbers are incapable of statistical analysis. Qualitative characteristics like honesty, efficiency, intelligence etc. cannot be studied directly.

(ii) Statistics deals with aggregates not with individuals. Since statistics deals with aggregates of facts, the study of individual measurements lies outside the scope of statistics.

(iii) Statistical laws are not perfectly accurate. Statistics deals with characteristics which are affected by a multiplicity of causes, and it is not possible to study the effect of each of these factors separately. Due to this limitation, the results obtained are not perfectly accurate but only an approximation.

(iv) Statistical results are only an average. Statistical results reveal only the average behaviour. The conclusions obtained statistically are not universally true; they are true only under certain conditions.

(v) Statistics is only one of the methods of studying a problem. Statistical tools do not provide the best solution under all circumstances.

(vi) Statistics can be misused. The greatest limitation of statistics is that they are liable to be misused. Data placed in the hands of an inexperienced person may yield wrong results. Only persons having a fundamental knowledge of statistical methods can handle the data properly.

Types of statistical data:
Statistical data are of two types:
(a)    Primary data
(b)   Secondary data.

Primary Data: Data which are collected for the first time for a specific purpose are known as Primary data.
For example: Population census, National income collected by government, Textile Bulletin (Monthly), Reserve Bank of India Bulletin (Monthly) etc.

Secondary Data: Data which have already been collected by someone else and are then used in an investigation are known as Secondary data. Data are primary to the collector, but secondary to the user.

For example: Statistical abstract of the Indian Union, Monthly abstract of statistics, Monthly statistical digest, International Labour Bulletin (Monthly).

Merits and Demerits of Primary Data:
Merits:
(a)    They are reliable and accurate.
(b)   If the data are found to be wrong during collection, they can be checked again by cross examination.
(c)     It is more suitable if the field of enquiry is small.

Demerits:
(a)    If the field of enquiry is too wide, it is not suitable.
(b)   Collection of primary data is costly and time consuming.
(c)    Personal bias, prejudice and whims may affect the data.

Merits and Demerits of Secondary Data:
Merits:
(a)    While using secondary data, time and labour are saved.
(b)   They may also be obtained from unpublished sources.
(c)    If secondary data are available, they are much quicker to obtain than primary data.

Demerits:
(a)    Degree of accuracy may not be acceptable.
(b)   Secondary data may or may not fit the need of the project.
(c)    Data may be influenced by the personal bias of the investigator.

Difference between Primary Data and Secondary Data:
(a)    Primary data are those which are collected for the first time and are thus original in character, while Secondary data are those which have already been collected by someone else.
(b)   Primary data are in the form of raw-material, whereas Secondary data are in the form of finished products.
(c)    Primary data are collected directly from the people related to enquiry while Secondary data are collected from published materials.
(d)   Data are primary in the hands of the institutions collecting them, while they are secondary for all others.

Sources of Secondary Data
(a)    Official publications of the central and state governments and district boards.
(b)   Publication by research institutions, Universities etc.
(c)    Economic Journals.
(d)   Commercial Journals.
(e)   Reports of committees and commissions.
(f)     Publications of trade associations, Chamber of Commerce etc.

Precautions in the use of Secondary Data: The following aspects should be considered before use of secondary data:
(i) Suitability: Before using secondary data, the investigator must check whether they are suitable for the present purpose.

(ii) Adequacy: After being satisfied about the suitability of the data, the investigator has to determine whether they are adequate for the present purpose of the investigation.

(iii) Dependability: Dependability of secondary data is determined by the following factors:
(a)    The authority which collected the data.
(b)   Procedure of Sampling followed.
(c)    Status of Investigator.

(iv) Units in which data are available.

Qualities of Secondary Data:
(a)    Data should be reliable.
(b)   Data should be suitable for the purpose of the investigator.
(c)    Data should be collected by a trained investigator.

Methods of collecting primary Data
(a)  Direct Personal Observation: - Under this method, the investigator collects the data personally from the persons concerned. The information obtained under this method is original in nature. This method is suitable when the field of enquiry is small.

(b) Indirect Oral Investigation: - Under this method, the investigator collects the data from third parties capable of supplying the necessary information. This method is suitable where the information to be obtained is of a complex nature and informants cannot be approached directly.

(c) Schedule and questionnaire: - A list of questions regarding the enquiry is prepared and printed. Data are collected in either of the following ways:
(i) By sending the questionnaire to the persons concerned with a request to answer the questions and return the questionnaire.
(ii) By sending the questionnaire through enumerators for helping the informants.

(d) Local reports: - This method gives only approximate results at a low cost.

Questionnaire
A Questionnaire is simply a list of questions on a printed sheet relating to the survey, which the investigator asks the informants; the answers of the informants are noted down against the respective questions on the sheet. The choice of questions is a very important part of the enquiry, whatever its nature.

Characteristics of an ideal Questionnaire:
(i)      The schedule of questions must not be lengthy.
(ii)    It should be clear and simple.
(iii)   Questions should be arranged in a logical sequence.
(iv)  Each question should be brief and must aim at some particular information necessary for the investigation.
(v)    Questions on personal matters like income or property should be avoided.
(vi)  The units of information should be clearly shown on the sheet.

Tabulation
Tabulation refers to the systematic arrangement of information in rows and columns. Rows are the horizontal arrangement and columns are the vertical arrangement. In simple words, tabulation is a layout of figures in rectangular form with appropriate headings to explain the different rows and columns. The main purpose of a table is to simplify the presentation and to facilitate comparisons.
According to Neiswanger, "A statistical table is a systematic organisation of data in columns and rows."
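The idea of arranging raw data in rows and columns with headings can be sketched in Python as follows. The survey responses, and the choice of department for rows and gender for columns, are hypothetical and serve only to show the layout.

```python
from collections import Counter

# Hypothetical raw survey responses: (department, gender) for each employee.
responses = [
    ("Sales", "Female"), ("Sales", "Male"), ("Accounts", "Female"),
    ("Sales", "Female"), ("Accounts", "Male"), ("Accounts", "Female"),
    ("Production", "Male"), ("Production", "Male"), ("Sales", "Male"),
]

def tabulate(records):
    """Build a two-way table: one row per department, one column per gender."""
    counts = Counter(records)
    rows = sorted({dept for dept, _ in records})
    columns = sorted({gender for _, gender in records})
    table = {
        row: {col: counts[(row, col)] for col in columns}
        for row in rows
    }
    return rows, columns, table

def format_table(rows, columns, table):
    """Return the table as text with column headings and row totals."""
    header = f"{'Department':<12}" + "".join(f"{c:>8}" for c in columns) + f"{'Total':>8}"
    lines = [header]
    for row in rows:
        cells = "".join(f"{table[row][c]:>8}" for c in columns)
        lines.append(f"{row:<12}{cells}{sum(table[row].values()):>8}")
    return "\n".join(lines)

rows, columns, table = tabulate(responses)
print(format_table(rows, columns, table))
```

Nine scattered responses are condensed into a three-row, two-column table, which makes it easy to compare departments at a glance.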

The principal objectives of tabulation are stated below:
(i) To make complex data simple: When data are arranged systematically in a table, such data become more meaningful and can be easily understood.

(ii) To facilitate comparison: When different data sets are presented in tables it becomes possible to compare them.

(iii) To economize space: A statistical table furnishes maximum information relating to the study in minimum space.

(iv) To make data fit for analysis and interpretation: Tabulation serves as a link between the collection of data on the one hand and analysis of such data on the other. In other words, after tabulating the data, it becomes possible to find out their averages, dispersion and correlation. Such statistical measures are necessary for their interpretation.

(v) To provide reference: A statistical table can be used as a source of reference for other studies of similar nature.

Importance of Tabulation:
a)       Tabulation makes the data brief. Therefore, it can be easily presented in the form of graphs.
b)       Tabulation presents the numerical figures in an attractive form.
c)       Tabulation makes complex data simple and as a result of this, it becomes easy to understand the data.
d)       This form of the presentation of data is helpful in finding mistakes.
e)       Tabulation is useful in condensing the collected data.
f)        Tabulation makes it easy to analyze the data from tables.
g)       Tabulation is a very cheap mode to present the data. It saves time as well as space.
h)       Tabulation is a device to summarise large scattered data. So, the maximum information may be obtained from these tables.

Limitations of Tabulation
Tabulation suffers from the following limitations:
a)       Tables contain only numerical data. They do not contain details.
b)       Qualitative expression is not possible through tables.
c)       Tables can be used by experts only to draw conclusions. Common men do not understand them properly.

Classification of Data
The process of arranging the data in groups or classes according to their common characteristics is technically known as classification. Classification is the grouping of related facts into classes. It is the first step in tabulation.
In the words of Secrist, "Classification is the process of arranging data into sequences and groups according to their common characteristics or separating them into different but related parts."

Essentials of classification
a)       The classification must be exhaustive so that every unit of the distribution may find place in one group or another.
b)       Classification must conform to the objects of investigation.
c)       All the items constituting a group must be homogeneous.
d)       Classification should be elastic so that new facts and figures may easily be adjusted.
e)       Classification should be stable. If it is not so and is changed for every enquiry, then the data would not be fit for an enquiry.
f)        The classes must not overlap. Each item of the data must be found in one and only one class.
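The requirements that a classification be exhaustive and non-overlapping can be shown with a short Python sketch. It assumes integer marks and class intervals of equal width that include the lower limit and exclude the upper limit; the marks, the width of 20 and the function name `classify` are hypothetical.

```python
def classify(values, lower, width, n_classes):
    """Assign each value to exactly one class interval [start, end)."""
    classes = {
        (lower + i * width, lower + (i + 1) * width): []
        for i in range(n_classes)
    }
    for value in values:
        index = int((value - lower) // width)
        if not 0 <= index < n_classes:
            # A value outside every class means the scheme is not exhaustive.
            raise ValueError(f"{value} falls outside every class")
        start = lower + index * width
        classes[(start, start + width)].append(value)
    return classes

# Hypothetical marks of 10 students, grouped into classes of width 20.
marks = [12, 35, 47, 58, 60, 61, 79, 80, 95, 99]
classes = classify(marks, lower=0, width=20, n_classes=5)
frequencies = {interval: len(items) for interval, items in classes.items()}
print(frequencies)
```

Because each value is placed by its position relative to the lower limit, a mark of 60 falls only in the class 60 to 80 and never in 40 to 60, so the class frequencies add up to the total number of items.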

Population and Sample
Population: Statistics relate to large bodies of data. Single and unconnected figures are not statistics. In the field of a statistical enquiry there may be persons, items or any other similar units. The aggregate of all such units under consideration is called the “Universe” or “Population”.

Sample: If a part is selected out of the universe, then the selected part or portion is known as a sample. A sample is only a part of the universe.

Sample survey: It is a survey under which only a part taken out of the universe is investigated. It is not essential to investigate every individual item of the Universe.

Census survey or complete enumeration: Under a census survey, detailed information regarding every individual person or item of a given universe is collected.

Difference between Census and Sample survey: The following are the differences between Census and Sample method of investigation:
(a) Under Census method, each and every individual item is investigated whereas under sample survey only a part of universe is investigated.

(b) There is no chance of sampling error in census survey whereas sampling error cannot be avoided under sample survey.

(c) A large number of enumerators is required in a census, whereas fewer enumerators are required in a sample survey.

(d) Census survey is more time consuming and costly as compared to sample survey.

(e) Census survey is an old method and it is less systematic than the sample survey.
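The contrast between the two methods, and the sampling error mentioned in (b), can be sketched in Python. The universe of 1,000 household incomes is randomly generated and purely hypothetical, and the sample size of 50 is an arbitrary choice for illustration.

```python
import random
import statistics

# Hypothetical universe: monthly incomes of 1,000 households.
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.randint(10_000, 50_000) for _ in range(1_000)]

# Census: every unit of the universe is investigated.
census_mean = statistics.mean(population)

# Sample survey: only a part of the universe is investigated.
sample = rng.sample(population, k=50)
sample_mean = statistics.mean(sample)

# Sampling error: the gap between the sample estimate and the census value.
sampling_error = sample_mean - census_mean

print(f"census mean:    {census_mean:.2f}")
print(f"sample mean:    {sample_mean:.2f}")
print(f"sampling error: {sampling_error:.2f}")
```

The census value requires all 1,000 units, while the sample estimate uses only 50 of them. The price of that saving is a sampling error which is generally not zero.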

Merits and Demerits of Census:
Merits:
(a)    Since all the individuals of the universe are investigated, the highest degree of accuracy is obtained.
(b)   Since every item of the universe is investigated, this method is free from sampling error.
(c)     It is more suitable if the field of enquiry is small.
(d)   Since all the items of the universe are taken into consideration, all the characteristics of the universe can be studied.

Demerits:
(a)    If the field of enquiry is too wide, it is not suitable.
(b)   A census survey is costly and time consuming.
(c)    Personal bias, prejudice and whims may affect the data.

Merits and Demerits of sample survey:
Merits:
(a)    Time and labour are saved because only a part of the universe is investigated.
(b)   Fewer enumerators are required than in a census survey.
(c)    Results are much quicker to obtain than from a census survey.

Demerits:
(a)    Sampling error cannot be avoided, so the degree of accuracy may not be acceptable.
(b)   The sample may or may not represent the universe.
(c)    The selection of the sample may be influenced by the personal bias of the investigator.