Book Summary

 

Planning and Implementing Assessment


 

Freeman, Richard, & Lewis, Roger (1998).  Planning and implementing assessment.  London:  Kogan-Page.

 

WHAT IS ASSESSMENT?

The word “assessment” comes from the Latin “ad sedere,” meaning “to sit down beside.”  The “sit beside” language arose less from the friendly image of mentoring someone and more from the sense of a legal representative sitting beside a person in court: 500 years or so ago, an assessor was a person who advised judges on technical points (mostly having to do with fines and taxes).

Other Meanings:

• fix the amount of a fine or tax
• impose a fine or tax
• estimate value (e.g., of a home)
• estimate the worth of, judge, or evaluate

Educational Purposes:

1) select

2) certify

3) describe

4) aid learning

5) improve teaching

These could be separated into two main dimensions:  development and judgment.

 

Distinction

In the UK, assessment is considered to be separate from evaluation.  Assessment focuses on student learning, whereas evaluation focuses on how the various components of a course (e.g., syllabus, teacher) perform.  Assessment results can be USED for evaluation, but do not themselves constitute evaluation.

 

Two Rules of Thumb:

1) assess behaviors representative of required performance

2) use a sufficient sample of behavior

 

THREE TYPES OF ASSESSMENT

I. Norm-Referenced: establishes a rank order of students (Ss) in terms of achievement; that is, each S is assessed relative to others in a given group (e.g., year of school).  Most properly used for selection.  Because a ranking alone guarantees no absolute level, take the precaution of checking that selected Ss are above a minimum standard of competency.

Problematic in that it doesn’t measure against a common standard but rather against a cohort.  Therefore, for example, a person who falls below the cutoff at School A (and is thus not selected) might fall above the cutoff for selection at another school. 

 

II. Criterion-Referenced: measures students’ performance in relation to an explicit, previously determined standard (for example, a driving exam).  Good criterion-referenced assessments (CRAs) first choose reasonable standards, make those standards publicly available, and then test according to the standards.  They are problematic to the extent that any of these three things is not done.

 

III. Ipsative (Self-Referenced): students’ performance is compared to their own previous performance rather than to objective standards or the performance of others.  Students may also set their own learning objectives.  Problematic if a student advances relative to his/her own past performance but still falls short of competency.

Note:  these types are not mutually exclusive; you can use them in combination.
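
To make the contrast among the three types concrete, here is a minimal Python sketch.  It is an illustration only and does not come from the book: the student names, scores, number of places, and the pass mark of 70 are all invented, and the function names are hypothetical.

```python
# Illustrative sketch only: names, scores, and cutoffs below are made up.

def norm_referenced(scores, places):
    """Select the top `places` students by rank within this cohort."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:places])

def criterion_referenced(scores, standard):
    """Pass every student who meets a fixed, previously published standard."""
    return {name for name, score in scores.items() if score >= standard}

def ipsative(previous, current):
    """Report each student's change relative to his/her own earlier score."""
    return {name: current[name] - previous[name] for name in current}

# The same score of 74 (Ben) appears in two hypothetical cohorts.
school_a = {"Ann": 82, "Ben": 74, "Cat": 91, "Dev": 68}
school_b = {"Eli": 61, "Fay": 55, "Gus": 70, "Ben": 74}

# Norm-referenced: Ben misses the two places at School A but wins one at School B.
print(sorted(norm_referenced(school_a, places=2)))
print(sorted(norm_referenced(school_b, places=2)))

# Criterion-referenced: anyone at or above the assumed standard of 70 passes.
print(sorted(criterion_referenced(school_a, standard=70)))

# Ipsative: Dev improves by 18 points yet still falls short of the standard.
print(ipsative({"Dev": 50}, {"Dev": 68}))
```

Running all three on the same scores shows why the types can be combined: a rank order for selection, a fixed standard as the competency check, and self-comparison for feedback to the student.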

 

RELIABILITY

Two Types:  consistency within one instructor’s ratings and consistency among different instructors’ ratings.

Ways to Increase Reliability: 

• publish specific performance criteria, ensure they’re understood by everyone involved, and adhere to them
• get a bigger sample of the behavior (e.g., more questions on exams)
• get samples of a bigger variety of behavior (e.g., assessment portfolios)
• adjust grades (e.g., curve by removing poor questions and/or by comparing among assessors)
• redundancy: have assignments scored by more than one grader
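
One way to check whether steps like these are working is to measure how often two sets of grades agree.  The Python sketch below is an illustration under stated assumptions, not a method from the book: it computes simple percent agreement and Cohen's kappa (a commonly used chance-corrected agreement statistic, introduced here only as an example), and the grades are invented.  The same functions apply to both types of reliability: pass two graders' marks for the same assignments, or one instructor's marks from two separate occasions.

```python
from collections import Counter

def percent_agreement(grades_1, grades_2):
    """Fraction of assignments on which the two sets of grades match exactly."""
    if len(grades_1) != len(grades_2) or not grades_1:
        raise ValueError("need two non-empty grade lists of equal length")
    matches = sum(a == b for a, b in zip(grades_1, grades_2))
    return matches / len(grades_1)

def cohens_kappa(grades_1, grades_2):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(grades_1)
    observed = percent_agreement(grades_1, grades_2)
    counts_1, counts_2 = Counter(grades_1), Counter(grades_2)
    # Chance agreement: probability both sets assign the same grade at random,
    # given how often each grade was used in each set.
    expected = sum(counts_1[g] * counts_2[g] for g in counts_1) / (n * n)
    if expected == 1:
        return 1.0  # both sets used a single identical grade throughout
    return (observed - expected) / (1 - expected)

# Hypothetical letter grades from two graders on the same six assignments.
grader_1 = ["A", "B", "B", "C", "A", "C"]
grader_2 = ["A", "B", "C", "C", "A", "B"]

print(round(percent_agreement(grader_1, grader_2), 2))
print(round(cohens_kappa(grader_1, grader_2), 2))
```

Kappa is lower than raw agreement because some matches would occur even if the graders assigned marks at random; a value near 1 suggests the published criteria are being applied consistently.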

 

VALIDITY

 

Quote from H.G. Wells: “The only results we produced were examination results which merely looked like the real thing.  In the true spirit of an age of individualistic cooperation, we were selling wooden nutmegs or umbrellas that wouldn’t open, or brass sovereigns or a patent food without any nourishment in it.”

 

Improving validity:

• explain why you do what you do with regard to assessment
• assess important rather than trivial outcomes (even if they’re harder to measure)
• use appropriate methods of assessment for a given behavior (even if you have to devise them!)
• make your assessment activities interesting to motivate students
• assess what you actually cover in your classes

 

OTHER CRITERIA

In addition to reliability and validity, consider:

• Authenticity:  was it actually produced by the student?
• Currency:  is the evidence from a recent performance?  Often, we assess once and merely assume the assessment is valid for all time (as opposed to periodic re-certification).
• Utility:  is the assessment affordable, convenient, and flexible?  We always compromise:  e.g., driving tests would be better if we held them both during the day and at night, in cars and in trucks, etc.

 

MODES OF ASSESSMENT

FORMAL VS. INFORMAL

• Formal: structured events (e.g., exams, presentations)
• Informal: casual and unplanned, or preplanned but not counted for credit

 

FORMATIVE VS. SUMMATIVE

• Formative: provides feedback for improving a process
• Summative: counts towards a final grade or certification

 

FINAL VS. CONTINUOUS

• Final: taking place only at the end of a course
• Continuous: taking place throughout a course

 

PRODUCT VS. PROCESS

• Product: focuses on end results
• Process: focuses on the manner in which end results are achieved

 

SOURCES OF ASSESSMENT DATA

• Students
• Students’ Peers
• Tutors and Graders
• Instructor

 

 


© 1999-2001, CogSim