
13. Classification of tests.

As to their purpose, tests fall into four main categories: proficiency, achievement, progress and diagnostic. Proficiency tests measure people's ability in a language regardless of any training they may have had in it; they should have detailed specifications of what successful candidates will have demonstrated. Achievement tests can be further subdivided into two classes: final and progress achievement tests. They may be administered by ministries of education, official examining boards or members of teaching institutions. To design them one can take two pathways. The first is syllabus-directed: the final achievement test is based directly on a detailed course syllabus or on the coursebooks. The alternative approach is to base the test content directly on the objectives of the course, which works against the perpetuation of poor teaching practice.

As to progress tests, they are a reliable tool only if a series of well-defined short-term objectives is established first. If the syllabus is at fault, it is the tester's responsibility to make it clear that a change is needed.

Diagnostic tests are used to identify students' strengths and weaknesses. They are intended primarily to ascertain what further teaching is necessary. Placement tests are used to assign students to classes of different levels and are instruments of differentiation in language training. They are tailor-made, that is, produced by the school itself.

As to the content of the procedure, tests may be direct or indirect. Testing is said to be direct when it requires the candidate to perform precisely the skill we wish to measure (tasks and texts should be as authentic as possible). Indirect testing attempts to measure the abilities which underlie the skills in which we are interested, e.g. testing pronunciation ability with a paper-and-pencil test in which the candidate has to identify pairs of words which rhyme with each other. Indirect testing is superior to direct testing in that its results are more generalizable. There is a further division into discrete-point versus integrative testing. Discrete-point testing checks one element at a time, item by item (e.g. grammar tests); taking down a lecture or doing a cloze may be considered illustrations of integrative tests.

Testing may be norm-referenced or criterion-referenced. A norm-referenced test relates the performance of one candidate to that of the other candidates. It is resorted to when evaluation is subjective (essays, reports, presentations) and one requires anchor papers (a collection of previous works which are typical of a given performance level) to assess more adequately. The purpose of criterion-referenced tests is to classify people according to whether or not they can perform some task to a set standard, such as reaching a certain number of points.
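To make the contrast concrete, here is a minimal Python sketch (illustrative only; the names, scores and cut-off are invented) interpreting the same raw scores both ways.

```python
scores = {"Ann": 78, "Boris": 64, "Clara": 91, "Dmitri": 55}

# Criterion-referenced: each candidate is judged against a fixed cut-off,
# independently of how the others performed.
CUT_OFF = 70
criterion = {n: ("pass" if s >= CUT_OFF else "fail") for n, s in scores.items()}

# Norm-referenced: each candidate is judged by rank within the group.
ranked = sorted(scores, key=scores.get, reverse=True)
norm = {n: f"rank {i + 1} of {len(ranked)}" for i, n in enumerate(ranked)}

print(criterion)  # Ann and Clara pass; Boris and Dmitri fail
print(norm)       # Clara is 1st, Ann 2nd, Boris 3rd, Dmitri 4th
```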

Tests may be objective or subjective. Objective tests require no judgement on the part of the scorer; they can be checked with keys, even by a non-expert. Subjective tests require some impressionistic judgement and usually entail some creative work. Correspondingly, scoring may be holistic or analytic. Holistic scoring is scoring on the basis of overall impression; analytic scoring disposes of the problem of uneven development of subskills in individuals. Its drawback is the time it takes; concentration on separate aspects may also divert attention from the overall effect of the piece.

1. Key notions for testing:

The key notions in this sphere of methodology are test validity, test reliability, consistency, practicality and feedback. Test validity is the sum of a number of aspects. Content validity is a specification of the skills and structures the test is meant to cover; the content should be dictated by what is important to test, not by what is easy to test. Criterion-related validity, especially predictive validity, concerns the degree to which a test can predict candidates' future performance. If the test is valid, the scoring groups neither devalue students' achievement nor make 'mastery' impossible.

Reliability is in evidence when the same candidate performs in a similar way regardless of the day or the interval between the test and the re-check. In other words, it is the permanence of the measurement results produced by a test. The reliability coefficient is 1 when a candidate obtains precisely the same result twice. Reliability can be checked either by the test-retest method, whose drawback is the low motivation of participants, or by the split-half method, in which they perform the other half of the test tasks or another variant. Consistency is agreement between the parts of the test: all the tasks in a consistent test have the same degree of difficulty for the learners. Practicality has to do with the time and effort invested in constructing the test and with the amount of lesson time the test itself takes. For progress tests, for instance, a double period is too much – such a test would be impractical.
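As an illustration, here is a minimal Python sketch of how the two checks could be computed; the score data are invented, and the Spearman-Brown adjustment of the half-test correlation is a standard addition not named in these notes.

```python
from statistics import mean

def pearson(xs, ys):
    # Pearson correlation: 1.0 means the two score sets agree perfectly.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Test-retest: correlate the same candidates' scores on two occasions
# (hypothetical data).
first_take = [72, 58, 90, 64, 81]
second_take = [70, 61, 88, 66, 79]
test_retest_r = pearson(first_take, second_take)

# Split-half: correlate the two halves of one sitting, then adjust with
# the Spearman-Brown formula because each half is only half as long.
odd_items = [35, 28, 44, 31, 40]
even_items = [37, 30, 46, 33, 41]
half_r = pearson(odd_items, even_items)
split_half_r = 2 * half_r / (1 + half_r)

print(f"test-retest r = {test_retest_r:.2f}, split-half r = {split_half_r:.2f}")
```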

Feedback can be either beneficial or harmful: it is beneficial when participants self-assess and gain clearer learning objectives; it is harmful if it produces a sense of frustration and failure, especially a widespread one.

2. Test techniques and testing overall ability.

a) Multiple choice (choosing between a/b/c variants): it is objective and easy to implement and score, but the technique tests only the recognition level; on average about a third of the score is guessing (see the correction-for-guessing sketch after this list); it restricts whatever is being tested; and distractors are either unavailable or harmful, so feedback may be harmful as well. Distractors are the wrong variants, and they always present a problem: the candidate may resort to content clues; redundancy in the structure may slow him down; the tester may get carried away by philosophy and present items which cannot be regarded as either right or wrong; some distractors may mix items, e.g. lexis instead of grammar; and, finally, most distractors are entirely ungrammatical and may lead to the fossilization of mistakes. What is checked, more often than not, is general test-wiseness – experience in doing similar tasks and using all the possible clues, i.e. deduction rather than actual command of the language. An alternative to multiple choice is an open-ended test, which has a unique correct one-word response that the candidates produce themselves.

b) Cloze is the omission of every 7th or 9th word in a text for students to reconstruct it (a generator sketch follows this list). It is an economical procedure, but not an entirely objective one, since students may provide a number of contextually acceptable variants. Educated native speakers also vary in performance – some actually do worse at prediction than non-native speakers. A conversational cloze may be considered a variation – here students have to supply the missing utterances in a dialogue.

c) C-test is the omission of the second half of every second word in a sentence (sketched after this list). It is quick but limited in its application, since it mostly checks anticipation and orthography.

d) Dictation testing was for a time considered totally misguided (what does it test? vocabulary? grammar?). It turned out, however, that dictation test results were similar to those obtained through other techniques: it is a good instrument for testing ability or strengths, but rather problematic for pinpointing a particular student's problems or weaknesses. Partial dictation has the merit of being less time-consuming (there is a printed handout with some parts missing for students to fill in).

e) Gap-filling;

f) Matching: while fairly effective and quickly implemented, this technique entails a lot of guessing; the last pair will be correct by default.

g) Sequencing;

h) Information transfer;

i) Editing (identifying mistakes): the danger to avoid here is encouraging memorization of the wrong variant.
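On the guessing problem in a): a minimal sketch, assuming the classic correction-for-guessing formula (which these notes do not name), of why roughly a third of a three-option multiple-choice score can be pure chance.

```python
def corrected_score(right: int, wrong: int, options: int = 3) -> float:
    # A blind guesser gets about one item right for every (options - 1)
    # items wrong, so that share is subtracted from the raw score.
    return right - wrong / (options - 1)

# A candidate guessing blindly on all 30 three-option items gets ~10 right:
print(corrected_score(right=10, wrong=20))  # 0.0 – the raw score was chance
```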
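For b), a minimal sketch of a fixed-ratio cloze generator; the passage and the gap marker are arbitrary choices.

```python
def make_cloze(text: str, n: int = 7) -> tuple[str, list[str]]:
    # Replace every n-th word with a gap and keep the deleted words as a key.
    words, key = text.split(), []
    for i in range(n - 1, len(words), n):
        key.append(words[i])
        words[i] = "_____"
    return " ".join(words), key

passage = ("Testing is said to be direct when it requires the candidate "
           "to perform precisely the skill we wish to measure in class.")
gapped, key = make_cloze(passage, n=7)
print(gapped)  # the text with every 7th word blanked out
print(key)     # the deleted words, for the scorer
```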
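For c), a minimal sketch of a C-test generator, under the usual assumption that the first half of each affected word is kept as a clue.

```python
def make_c_test(sentence: str) -> str:
    # Delete the second half of every second word, rounding the kept
    # half up so that at least half of each word remains visible.
    out = []
    for i, word in enumerate(sentence.split()):
        if i % 2 == 1 and len(word) > 1:
            keep = (len(word) + 1) // 2
            word = word[:keep] + "_" * (len(word) - keep)
        out.append(word)
    return " ".join(out)

print(make_c_test("Indirect testing attempts to measure underlying abilities"))
# -> Indirect test___ attempts t_ measure under_____ abilities
```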
