Instances: 14
Attributes: 5
outlook
temperature
humidity
windy
play
Test mode: split 35% train, remainder test
=== Clustering model (full training set) ===
Number of merges: 1
Number of splits: 0
Number of clusters: 21
node 0 [14]
| node 1 [5]
| | leaf 2 [1]
| node 1 [5]
| | leaf 3 [1]
| node 1 [5]
| | node 4 [2]
| | | leaf 5 [1]
| | node 4 [2]
| | | leaf 6 [1]
| node 1 [5]
| | leaf 7 [1]
node 0 [14]
| node 8 [6]
| | node 9 [2]
| | | leaf 10 [1]
| | node 9 [2]
| | | leaf 11 [1]
| node 8 [6]
| | leaf 12 [1]
| node 8 [6]
| | node 13 [3]
| | | leaf 14 [1]
| | node 13 [3]
| | | leaf 15 [1]
| | node 13 [3]
| | | leaf 16 [1]
node 0 [14]
| node 17 [3]
| | leaf 18 [1]
| node 17 [3]
| | leaf 19 [1]
| node 17 [3]
| | leaf 20 [1]
=== Model and evaluation on test split ===
Number of merges: 0
Number of splits: 0
Number of clusters: 7
node 0 [4]
| node 1 [2]
| | leaf 2 [1]
| node 1 [2]
| | leaf 3 [1]
node 0 [4]
| node 4 [2]
| | leaf 5 [1]
| node 4 [2]
| | leaf 6 [1]
Clustered Instances
0 2 ( 20%)
1 6 ( 60%)
2 1 ( 10%)
4 1 ( 10%)
Thus, the algorithm returned a result of 85% "against".
EM
Set the split to 30%.
Scheme: weka.clusterers.EM -I 100 -N -1 -S 100 -M 1.0E-6
Relation: weather
Instances: 14
Attributes: 5
outlook
temperature
humidity
windy
play
Test mode: split 30% train, remainder test
=== Clustering model (full training set) ===
EM
==
Number of clusters selected by cross validation: 1
Cluster: 0 Prior probability: 1
Attribute: outlook
Discrete Estimator. Counts = 6 5 6 (Total = 17)
Attribute: temperature
Normal Distribution. Mean = 73.5714 StdDev = 6.3326
Attribute: humidity
Normal Distribution. Mean = 81.6429 StdDev = 9.9111
Attribute: windy
Discrete Estimator. Counts = 7 9 (Total = 16)
Attribute: play
Discrete Estimator. Counts = 10 6 (Total = 16)
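The Discrete Estimator totals above (17, 16, 16) exceed the 14 instances because each nominal value starts with a count of 1. A minimal Python sketch of that smoothing follows. It assumes the standard weather data with 5 sunny, 4 overcast and 5 rainy instances, which this report does not list, and the function name `smoothed_counts` is hypothetical.

```python
from collections import Counter


def smoothed_counts(values, categories):
    """Count each category, starting every count at 1 (Laplace smoothing)."""
    raw = Counter(values)
    return [raw[c] + 1 for c in categories]


# Assumed outlook column of the 14-instance weather data (not shown in the report).
outlook = ["sunny"] * 5 + ["overcast"] * 4 + ["rainy"] * 5

counts = smoothed_counts(outlook, ["sunny", "overcast", "rainy"])
total = sum(counts)
# Estimated probability of each value within the single cluster.
probabilities = [c / total for c in counts]

print(counts, total)  # [6, 5, 6] 17
```

Under that assumption the result matches the line `Counts = 6 5 6 (Total = 17)` in the full-training-set model.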
=== Model and evaluation on test split ===
EM
==
Number of clusters selected by cross validation: 1
Cluster: 0 Prior probability: 1
Attribute: outlook
Discrete Estimator. Counts = 1 3 3 (Total = 7)
Attribute: temperature
Normal Distribution. Mean = 70.25 StdDev = 6.7593
Attribute: humidity
Normal Distribution. Mean = 75.25 StdDev = 9.7564
Attribute: windy
Discrete Estimator. Counts = 4 2 (Total = 6)
Attribute: play
Discrete Estimator. Counts = 3 3 (Total = 6)
Clustered Instances
0 10 (100%)
Log likelihood: -10.4135
KMEANS:
Set the split to 20%.
Scheme: weka.clusterers.SimpleKMeans -N 2 -S 10
Relation: weather
Instances: 14
Attributes: 5
outlook
temperature
humidity
windy
play
Test mode: split 20% train, remainder test
=== Clustering model (full training set) ===
kMeans
======
Number of iterations: 3
Within cluster sum of squared errors: 16.23745631138724
Cluster centroids:
Cluster 0
Mean/Mode: sunny 75.8889 84.1111 FALSE yes
Std Devs: N/A 6.4893 8.767 N/A N/A
Cluster 1
Mean/Mode: overcast 69.4 77.2 TRUE yes
Std Devs: N/A 4.7223 12.3167 N/A N/A
=== Model and evaluation on test split ===
kMeans
======
Number of iterations: 2
Within cluster sum of squared errors: 0.0
Cluster centroids:
Cluster 0
Mean/Mode: rainy 71 91 TRUE no
Std Devs: N/A 0 0 N/A N/A
Cluster 1
Mean/Mode: rainy 65 70 TRUE no
Std Devs: N/A 0 0 N/A N/A
Clustered Instances
0 9 ( 75%)
1 3 ( 25%)
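The "Within cluster sum of squared errors" value adds up, over all training instances, the squared distance from each instance to the centroid of its cluster. Below is a minimal Python sketch for numeric attributes only. It leaves out nominal attributes and any attribute normalization the clusterer may apply, the first set of points is made up, and the function names are hypothetical. The second case reuses the temperature and humidity values of the two test-split centroids. An error of 0.0 is consistent with each cluster holding a single training instance.

```python
def centroid_of(points):
    """Mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))


def within_cluster_sse(points, assignments, centroids):
    """Sum of squared Euclidean distances from each point to its centroid."""
    total = 0.0
    for point, cluster in zip(points, assignments):
        centroid = centroids[cluster]
        total += sum((p - c) ** 2 for p, c in zip(point, centroid))
    return total


# Made-up example: two points in cluster 0, one point in cluster 1.
points = [(1.0, 2.0), (3.0, 4.0), (10.0, 10.0)]
assignments = [0, 0, 1]
centroids = [centroid_of(points[:2]), centroid_of(points[2:])]
sse = within_cluster_sse(points, assignments, centroids)

# One instance per cluster: every point is its own centroid.
single_points = [(71.0, 91.0), (65.0, 70.0)]
single_sse = within_cluster_sse(single_points, [0, 1], single_points)

print(sse, single_sse)  # 4.0 0.0
```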
Conclusions:
The more accurate results were obtained with the association algorithm NaiveBayes and the clustering algorithm EM. The first identifies regularities between classifiers; the second divides the instances into taxa (clusters) according to their attribute values.