ICEF, 2012/2013 STATISTICS 1 year LECTURES
LECTURE 1
September 6, 2012
INTRODUCTION
Some particular questions that we will answer:
1) A poll of 1500 people before the presidential election gives the following result: 795 for the Democratic candidate and 765 for the Republican candidate. What conclusions (if any) can be drawn?
2) An agency published the results of a poll: 56% for the Democratic candidate, with a margin of error of ±3%. What does this mean?
3) The governor of a region claims that the average salary in the region is at least 8000 Rub per month. The mean salary of 100 randomly selected adult inhabitants is 7560 Rub. Is this sufficient reason to doubt the governor's statement?
4) You need to compare the effectiveness of two drugs, an old one and a new one. How should an appropriate study be designed?
5) You want to know the opinion of the inhabitants of a district on the construction of a new kindergarten in that district. To find out, phone calls were made between 10:00 and 17:00. What should you expect?
6) You investigate the relationship between income and food expenses. Namely, what is the mean change in food expenses if income increases by 1000 Rub?
The main mathematical tool (besides calculus) is probability theory. Typical probability questions:
1) Which is more likely: getting exactly 500 tails in 1000 flips of a fair coin, or getting exactly 1000 tails in 2000 flips of a fair coin?
2) A Russian child plays with ten cards bearing the Russian letters М, М, Т, Т, А, А, А, К, И, Е and lays out the word «МАТЕМАТИКА». Is this child literate (грамотный)?
3) The unemployment rate in a city is 12%. What is the chance of finding more than 15 unemployed people among 100 randomly selected adults in this city?
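The three probability questions above can already be explored numerically. The following is a minimal sketch, not the method of the course: it assumes independent coin flips, cards laid out in a uniformly random order, and independent selection of adults (a binomial model). The helper name `binom_pmf` and all variable names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def binom_pmf(n, k, p):
    """P(X = k) for X ~ Binomial(n, p), computed exactly with rationals."""
    p = Fraction(p)
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Question 1: exactly half tails in 1000 flips versus in 2000 flips.
half = Fraction(1, 2)
p_500_of_1000 = float(binom_pmf(1000, 500, half))
p_1000_of_2000 = float(binom_pmf(2000, 1000, half))

# Question 2: number of distinct orderings of the ten cards
# (М and Т appear twice, А three times); exactly one ordering
# spells the word, assuming a uniformly random layout.
arrangements = factorial(10) // (factorial(2) * factorial(2) * factorial(3))
p_word = Fraction(1, arrangements)

# Question 3: P(X > 15) for X ~ Binomial(100, 0.12), assuming the
# 100 adults are selected independently.
p_more_than_15 = float(
    sum(binom_pmf(100, k, Fraction(12, 100)) for k in range(16, 101))
)

print(p_500_of_1000, p_1000_of_2000)
print(arrangements, float(p_word))
print(p_more_than_15)
```

Under these assumptions the first event in question 1 comes out more likely than the second, and a random layout of the cards produces the word with probability 1/151200, which is why chance is an unconvincing explanation in question 2.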
CHAPTER 1
GRAPHIC REPRESENTATION OF INFORMATION AND DESCRIPTIVE STATISTICS
GRAPHIC REPRESENTATION OF INFORMATION
Definition. A set (collection) of data x_1, x_2, ..., x_n is called a distribution, a set of observations, or a sample.
The number n is called the number of observations, the size of the distribution, or the size of the sample.
THE TYPES OF DATA
Numerical (quantitative) data
  Discrete:
    1) number of points on the face of a die;
    2) number of car accidents per day;
    3) number of people voting for the Democrats in a sample of 1000 people …
  Continuous:
    1) a man's weight;
    2) price of a flat;
    3) monthly salary;
    4) a family's monthly food expenses …
Non-numerical (qualitative) data
  Ordered:
    1) level of income (low, middle…)
    2) rating of a bank…
  Non-ordered (categorical):
    1) brand of a purchased car;
    2) preferred sport (football, tennis, basketball)…
1. Dot plots (may be used for all types of data)
Example 1 (discrete data). A die was tossed 20 times, with the following results:
Face:    1   2   3   4   5   6
Times:   4   6   2   3   2   3
Important. You should understand that this table corresponds, for example, to the distribution
x_1 = x_2 = x_3 = x_4 = 1, x_5 = x_6 = ... = x_10 = 2, x_11 = x_12 = 3, ..., x_20 = 6.
Dot plot:

      *
      *
  *   *
  *   *       *       *
  *   *   *   *   *   *
  *   *   *   *   *   *
  1   2   3   4   5   6

Fig. 1
Dot plots for non-numerical data are constructed in the same way.
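As an illustration, a text dot plot like the one in Example 1 can be produced from raw observations. This is only a sketch; the function name `dot_plot` and its `categories` parameter are hypothetical, and the list `tosses` is one ordering of the 20 results consistent with the table in Example 1.

```python
from collections import Counter


def dot_plot(observations, categories=None):
    """Return a text dot plot: one '*' per observation, stacked by value."""
    counts = Counter(observations)
    if categories is None:
        categories = sorted(counts)
    labels = [str(c) for c in categories]
    width = max(len(s) for s in labels) + 2
    height = max(counts[c] for c in categories)
    lines = []
    # Build the plot from the top level down to level 1.
    for level in range(height, 0, -1):
        row = "".join(
            ("*" if counts[c] >= level else " ").center(width)
            for c in categories
        )
        lines.append(row.rstrip())
    # The last line holds the category labels.
    lines.append("".join(s.center(width) for s in labels).rstrip())
    return "\n".join(lines)


# Data of Example 1: 4 ones, 6 twos, 2 threes, 3 fours, 2 fives, 3 sixes.
tosses = [1] * 4 + [2] * 6 + [3] * 2 + [4] * 3 + [5] * 2 + [6] * 3
print(dot_plot(tosses))
```

The same function accepts non-numerical observations, for example a list of preferred sports; passing `categories` explicitly fixes the order of the columns for ordered data.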
Example 2 (continuous data). The monthly salaries (in thousands of Rub) of 50 people are given in the table below.
17.17   22.13   24.68   27.56   29.43
18.14   22.26   25.16   27.87   29.64
19.44   22.34   25.47   27.99   29.64
19.50   22.61   25.68   28.41   29.66
20.20   23.11   26.12   28.51   29.78
20.29   23.23   26.33   28.72   30.09
20.86   23.43   26.45   28.75   30.18
21.24   23.44   26.60   28.99   30.61
21.44   23.85   27.17   29.08   31.48
21.56   23.86   27.36   29.17   32.26
Let us divide the range of salaries into the intervals
[15, 20), [20, 25), [25, 30), [30, 35)
and count the number of observations in each interval. Then
[15, 20): 4,   [20, 25): 17,   [25, 30): 24,   [30, 35): 5.
Finally, draw the dot plot:

                      ***
                      ***
            **        ***
            ***       ***
            ***       ***
            ***       ***
  *         ***       ***       **
  ***       ***       ***       ***
[15, 20)  [20, 25)  [25, 30)  [30, 35)

Fig. 2 (each * is one observation; the dots are stacked three per row)
2. Stem and leaf plots (may be used for numerical data)
Example 3. Income of 40 randomly selected people (Rub), sorted in ascending order for convenience:
10600   13700   14500   16800
10800   13900   14600   17200
11300   14000   14800   17400
11600   14000   14800   17600
11800   14200   15300   17700
12400   14200   15500   18300
12800   14300   16100   18500
13000   14300   16400   18800
13500   14300   16500   18900
13600   14400   16700   19800
Stem and leaf plot
(stem: thousands, leaf: hundreds)
10 | 6 8
11 | 3 6 8
12 | 4 8
13 | 0 5 6 7 9
14 | 0 0 2 2 3 3 3 4 5 6 8 8
15 | 3 5
16 | 1 4 5 7 8
17 | 2 4 6 7
18 | 3 5 8 9
19 | 8
Fig. 3
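The construction of Fig. 3 can be sketched in code as follows. The function name `stem_and_leaf` and the parameter `leaf_unit` are hypothetical; the sketch assumes non-negative integer data and takes the leaf to be the digit in the `leaf_unit` position, dropping any lower digits (all incomes in Example 3 are multiples of 100, so nothing is lost here).

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=100):
    """Map each stem to the sorted list of its leaves.

    The stem is the value divided by 10 * leaf_unit (here: thousands),
    the leaf is the next digit (here: hundreds).
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stem, rest = divmod(v, 10 * leaf_unit)
        stems[stem].append(rest // leaf_unit)
    return dict(stems)


# Incomes from Example 3 (Rub).
incomes = [
    10600, 10800, 11300, 11600, 11800, 12400, 12800, 13000, 13500, 13600,
    13700, 13900, 14000, 14000, 14200, 14200, 14300, 14300, 14300, 14400,
    14500, 14600, 14800, 14800, 15300, 15500, 16100, 16400, 16500, 16700,
    16800, 17200, 17400, 17600, 17700, 18300, 18500, 18800, 18900, 19800,
]
plot = stem_and_leaf(incomes)

for stem in sorted(plot):
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}")
```

Because the values are sorted before being split, the leaves on each stem appear in ascending order, as in Fig. 3.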