
ICEF, 2012/2013 STATISTICS 1 year LECTURES

LECTURE 3

18.09.2012

Definition. Let $x_1, x_2, \ldots, x_n$ be a distribution (a set of observations) with sample mean $\bar{x}$ and standard deviation $s$. The number $z_i = \dfrac{x_i - \bar{x}}{s}$ is called the z-score of the observation $x_i$.

Note that a z-score is dimensionless. It is a convenient tool for comparing observations from different distributions.

Example. The sample mean and standard deviation of the heights of 25 randomly selected Dutch adult men are 175 cm and 10 cm, respectively, while the same statistics for 30 Japanese adult men are 166 cm and 8 cm. Jan (from the first distribution) is 171 cm tall, and Akihara (from the second distribution) is 163 cm tall. Who is (relatively) taller? The z-score for Jan is

$\dfrac{171 - 175}{10} = -0.4$, and the z-score for Akihara is

$\dfrac{163 - 166}{8} = -0.375 > -0.4$. So we may conclude that Akihara is relatively taller than Jan.
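The comparison in the example can be reproduced with a few lines of Python. This is only a minimal sketch: the helper name `z_score` is an illustrative choice, not something defined in the lecture, and the numbers are the ones given in the example.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Return the z-score (x - mean) / sd of an observation x."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# Values taken from the example (heights in cm).
z_jan = z_score(171, mean=175, sd=10)     # Dutch sample
z_akihara = z_score(163, mean=166, sd=8)  # Japanese sample

print(z_jan, z_akihara)   # -0.4 -0.375
print(z_akihara > z_jan)  # True: Akihara is relatively taller
```

Because both z-scores are dimensionless, they can be compared directly even though the two samples have different means and spreads.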

 

BALANCE PROPERTY OF THE SAMPLE MEAN. Assume that the histogram of a distribution is cut from a uniform metal sheet. Then it can easily be proved that the sample mean of the distribution is the balance point of the sheet: if we rest the histogram sheet on a thin rod at the point of the sample mean, the sheet stays in equilibrium.
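The proof can be sketched under a simplifying assumption that is not stated in the lecture: each observation $x_i$ is treated as a unit mass placed at the point $x_i$ on a weightless rod, and the rod is supported at a point $c$. The total moment about $c$ is then

```latex
\sum_{i=1}^{n} (x_i - c) \;=\; \sum_{i=1}^{n} x_i - nc \;=\; n\bar{x} - nc \;=\; n(\bar{x} - c),
```

which is zero exactly when $c = \bar{x}$. Under this assumption the rod balances only at the sample mean.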

SHAPE OF A DISTRIBUTION

The observations close to the Min form the left tail of a distribution, and the observations close to the Max form the right tail.

Definition. A distribution is called skewed to the right if the right tail is far from the central part of a distribution while the left tail is not.

A distribution skewed to the left is defined similarly.

[Figures: a histogram skewed to the right and a histogram skewed to the left]

Exercise. Show that if a distribution is skewed to the right, then the sample mean is greater than the median.

A distribution that is not skewed is called symmetric. For a symmetric distribution the sample mean is equal to the median, and the histogram is (approximately) symmetric with respect to the vertical line drawn through the mean.
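The relation between the mean and the median can be illustrated numerically. The two samples below are hypothetical and chosen only for illustration; this is a demonstration, not a proof of the exercise.

```python
from statistics import mean, median

# Hypothetical samples, not taken from the lecture.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]  # one observation far out in the right tail
symmetric = [1, 2, 3, 4, 5]               # mirror-symmetric about 3

skewed_mean, skewed_median = mean(right_skewed), median(right_skewed)
sym_mean, sym_median = mean(symmetric), median(symmetric)

# The long right tail pulls the mean (4.75) above the median (3).
print(skewed_mean > skewed_median)  # True
# For the symmetric sample both statistics equal 3.
print(sym_mean == sym_median)       # True
```

The single large observation moves the mean noticeably but leaves the median unchanged, which is the intuition behind the exercise.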

OUTLIERS. A distribution may contain outliers. To find them, calculate the so-called lower and upper inner fences:

$\mathrm{IF}_l = \mathrm{LQ} - 1.5\,\mathrm{IQR}, \quad \mathrm{IF}_u = \mathrm{UQ} + 1.5\,\mathrm{IQR}.$

Definition. An observation $x_i$ is called an outlier if it lies outside the inner fences, i.e. $x_i \notin [\mathrm{IF}_l, \mathrm{IF}_u]$. An outlier $x_i$ is called a low (upper) outlier if $x_i < \mathrm{IF}_l$ ($x_i > \mathrm{IF}_u$).
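The definition translates directly into code. The sketch below is one possible implementation, with several assumptions: the function names and the sample are hypothetical, and the quartiles are computed with `statistics.quantiles(..., method="inclusive")`, which is only one of several quartile conventions and may differ slightly from the one used in the lecture.

```python
from statistics import quantiles


def inner_fences(lq: float, uq: float) -> tuple[float, float]:
    """Return (lower, upper) inner fences: LQ - 1.5*IQR and UQ + 1.5*IQR."""
    iqr = uq - lq
    return lq - 1.5 * iqr, uq + 1.5 * iqr


def find_outliers(data):
    """Split the outliers of data into low and upper outliers."""
    # Quartile convention is an assumption; other tools may give other values.
    lq, _, uq = quantiles(data, n=4, method="inclusive")
    lower, upper = inner_fences(lq, uq)
    low = [x for x in data if x < lower]
    up = [x for x in data if x > upper]
    return low, up


# Hypothetical sample: LQ = 3.75, UQ = 7.25, so the fences are -1.5 and 12.5.
sample = [2, 3, 4, 5, 6, 7, 8, 30]
low_outliers, upper_outliers = find_outliers(sample)
print(low_outliers, upper_outliers)  # [] [30]
```

Keeping `inner_fences` separate from the quartile computation makes it easy to plug in quartiles obtained elsewhere, for example from a spreadsheet.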

Example 3. The table below contains the incomes of 40 randomly selected people (thousands of $, in ascending order, reading down each column).

2.004    4.926    5.96     6.83
3.454    5.059    6.044    7.009
3.571    5.419    6.132    7.445
3.794    5.441    6.207    7.546
3.973    5.488    6.404    7.727
4.057    5.508    6.457    7.764
4.346    5.564    6.566    7.945
4.486    5.728    6.622    8.373
4.68     5.795    6.729    8.825
4.741    5.819    6.782    9.061

Using Excel we get:

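$\mathrm{LQ} = 4.88$, $\mathrm{UQ} = 6.80$, $\mathrm{IQR} = 1.92$, $\mathrm{IF}_l = 2.00$, $\mathrm{IF}_u = 9.68$.

Since $\mathrm{Min} = 2.004 > \mathrm{IF}_l$ and $\mathrm{Max} = 9.061 < \mathrm{IF}_u$, there are no outliers in this distribution.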
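The check in Example 3 can be repeated in Python. This sketch takes the quartiles exactly as reported in the lecture (4.88 and 6.80) instead of recomputing them, which is a deliberate assumption: the minimum, 2.004, lies very close to the lower fence, so the conclusion is sensitive to the quartile convention and to rounding. The variable names are illustrative.

```python
# Incomes from Example 3 (thousands of $), in ascending order.
incomes = [
    2.004, 3.454, 3.571, 3.794, 3.973, 4.057, 4.346, 4.486, 4.68, 4.741,
    4.926, 5.059, 5.419, 5.441, 5.488, 5.508, 5.564, 5.728, 5.795, 5.819,
    5.96, 6.044, 6.132, 6.207, 6.404, 6.457, 6.566, 6.622, 6.729, 6.782,
    6.83, 7.009, 7.445, 7.546, 7.727, 7.764, 7.945, 8.373, 8.825, 9.061,
]

# Quartiles as reported in the lecture (rounded to two decimals).
LQ, UQ = 4.88, 6.80
iqr = UQ - LQ
lower_fence = LQ - 1.5 * iqr
upper_fence = UQ + 1.5 * iqr

outliers = [x for x in incomes if x < lower_fence or x > upper_fence]

print(round(lower_fence, 2), round(upper_fence, 2))  # 2.0 9.68
print(outliers)                                      # []
```

With the rounded quartiles the margin between the minimum and the lower fence is only 0.004, so it is worth stating which quartile method and rounding were used whenever such a borderline case is reported.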

GRAPHIC REPRESENTATION AND DESCRIPTIVE STATISTICS (continued)

BOX-PLOTS.

This is the box-plot for the distribution in Example 3, in vertical orientation.

[Figure: vertical box-plot of INCOME, axis from 1 to 10]

The lower end of the whiskers is the Min, the upper end of the whiskers is the Max, the lower horizontal side of the box is the LQ, the upper horizontal side is the UQ, and the horizontal line inside the box is the median (med).
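The five values that define such a box-plot can be collected with a short helper. This is a sketch under assumptions: the function name `box_plot_elements` and the sample are hypothetical, and the quartiles use the `"inclusive"` method of `statistics.quantiles`, which may not match the convention of every tool.

```python
from statistics import quantiles


def box_plot_elements(data):
    """Return the five values drawn in a box-plot without outliers."""
    # Quartile convention is an assumption; see the lead-in.
    lq, med, uq = quantiles(data, n=4, method="inclusive")
    return {"Min": min(data), "LQ": lq, "med": med, "UQ": uq, "Max": max(data)}


# Hypothetical sample, not from the lecture.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
elements = box_plot_elements(sample)
print(elements)
```

For this sample the whiskers run from 1 to 9, the box spans 3 to 7, and the line inside the box is at 5.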
