Kia Karavas
FAIRNESS AND ETHICAL LANGUAGE TESTING: THE
CASE OF THE KPG
Abstract
This paper focuses on the
issues of fairness and ethics in language
testing – issues that have recently become
priorities in the language testing agenda –
and how the KPG exams deal with issues of
fairness and ethics in the process of their
development and implementation. After
discussing the origins of ethical and
critical language testing and their
implications for the design and development
of language tests, a rationale for the
development of codes of ethics and codes of
practice is provided. Two examples of
internationally known codes of ethics and
practice (ILTA and ALTE) are presented. It
is argued that there are inherent
limitations with the applicability and local
relevance of internationally developed codes
of ethics or practice. These limitations
mainly relate to the nature of what is
considered ethical, and how absolute
‘ethicality’ can be, and to the
enforceability of codes. Given the nature of
these limitations, the need for local
examination systems to develop their own
code of ethics based on universalist testing
principles relating to test quality,
validity and reliability, public
accountability and responsible
professionalism is highlighted. Finally, a
detailed presentation of the code of ethics
adopted by the KPG exam system and how the
KPG exams adhere to principles of
accountability, professionalism and
transparency is provided. [1]
Keywords:
ethical language
testing, critical language testing, Codes of
Ethics, Codes of Practice, universalist
testing principles.
1. Ethical
issues in language testing: When did it all
begin?
Concern with fairness and
ethics in testing is not a recent
development in the sphere of educational
assessment. In North America, concern with
the development of testing standards
began in the 1960s, because of litigation
cases showing the differential treatment of
African-American children as a result of
their test scores. In fact, Fulcher and
Bamford (1996) discuss examples of court
cases arguing test bias against various
ethnic groups. In the field of language
testing, the issue of ethics began to slowly
surface in the early 1980s with the works of
Spolsky (1981) and Stevenson (1981), who
highlighted the political purposes and
consequences of tests, and with Messick’s
(1989) influential expanded theory of
validity, which explicitly linked ethics and
validity. Messick’s framework of validity,
which has become a reference for
discussions, research, and practice in
educational measurement, incorporates the
issue of ethics into the concept of
construct validity. Messick distinguished
between ‘the consequential basis for test
interpretation’ (i.e. what is commonly
understood as the washback effect) and ‘the
consequential basis for test use,’ which he
described as the impact of the use (and
misuse) of tests that had significant
consequences for test takers and society.
According to his expanded framework of
validity, the construct validity of score
interpretations forms the basis upon which
considerations of values, uses and
consequences of tests are based (see
McNamara, 2001; Bachman, 2000 for further
discussion).
However, it was only in the
1990s that the issues of fairness, ethical
language testing, and accountability became
priorities in the agendas of professional
testers and testing bodies. The 1990s
represented a ‘watershed’ in language
testing (Douglas, 1995) since it was during
this decade that the language testing field
witnessed both: a) a rapid growth of
discussion and writings amongst language
testers and b) increased sensitivity and
awareness by testing bodies about the role,
impact and consequences of tests on
candidates, on education, and on society
(Hamp-Lyons, 2000). The surge of interest in
fairness and ethics in assessment has also
been spurred by our increased awareness of
language use as a socially purposeful
activity, and of testing as an institutional
practice which serves various policy agendas
and can act as a lever for social and
educational reforms.
Interest in the social
character of educational assessment and its
consequences has also been accompanied by an
expanded concern for professionalism in the
field. This concern reflects the need to
define professional conduct in language
testing and, according to McNamara and
Roever (2006: 137) ‘ … is also a sign of
language testing coming into its own as a
profession rather than being a subordinate
component of language teaching or general
psychometrics.’
Professionalism is closely
linked to ethics since, in essence, being
professional entails abiding by the
principles of ethical conduct as laid down
by your profession. As House claims (1990:
91 cited in Boyd and Davies 2002: 302),
‘ethics are the rules or standards of right
conduct or practice, especially the
standards of a profession.’ In many
professions (such as law, medicine and
psychology), codifications of ethical and
professional conduct are published as ‘codes
of ethics,’ ‘codes of practice’ or
‘standards;’ these codes (phrased as
statements or guidelines) tend to elaborate
the moral responsibilities of members of the
profession. Admission to the profession and
practice depends on adherence to an ethical
code which, if violated, results in
sanctions for the members of the profession
(see McNamara and Roever, 2006).
Shohamy (2001b) extends the
concept of professionalism in language
testing and argues for the need for responsible professionalism, which entails both: a) sharing authority among,
collaborating with, and involving different
stakeholders (including test takers) in the
test development process, and b) striving to
meet the various criteria for validity. One
way of sharing authority, according to Boyd
and Davies (2002), is to agree on and publish a code of ethics, a practice which many language testing organizations have increasingly adopted in recent years.
But why is a code of ethics necessary in
language testing?
2. The power of tests and the need for codes of
ethics and practices
The power that tests have is
discussed eloquently by Shohamy (2001a:
374). In her words, ‘Tests play a major role
as their results have far-reaching
implications for individuals and educational
systems; tests can create winners and
losers, successes and failures, the rejected
and the accepted. Yet, it is often
performance on a single test, on a single
occasion, at one point in time that leads to
the irreversible and far-reaching high
stakes decisions.’
Bachman (2000) likewise underlines the power of tests, agreeing that they are not value-free, culturally neutral tools developed in psychometric 'test tube' environments. They are, in fact, powerful
tools with far-reaching and sometimes
detrimental consequences for individuals and
society at large. Test scores, for one
thing, can be used for admission to
universities or to a profession, as well as
for employment decisions and grants of
citizenship. Bourdieu (1991 cited in
Shohamy, 2001a) highlights the symbolic
power of tests, arguing that they can be
used as rites of passage, as a means of
socializing the public, and creating a
dependency of test takers on tests as a main
criterion of worth. Tests represent 'cultural capital,' and the knowledge they contain comes to represent, in the minds of test takers and users, what counts as legitimate, worthwhile knowledge; tests can thus provide the means for controlling knowledge.
Furthermore, as Shohamy
(2000, 2001a, b) maintains, tests can be
introduced in unethical ways for
disciplinary purposes and for carrying out
various policy agendas perpetuating the
domination of those in power and excluding
unwanted groups. In her 2000 article, she
provides various examples of the unethical
and undemocratic uses of tests for excluding
groups from various cultural and linguistic
backgrounds. Shohamy, in fact, is one of the most ardent proponents and instigators of
critical language testing. In her 1997
plenary address to the American Association
of Applied Linguistics, she essentially
defined the critical perspective within the
field of language testing, while in her 1999
article, she identified 15 democratic
principles that should underpin language
testing. What Shohamy had defined as
critical language testing has actually grown
into a powerful movement in the USA that
critiques tests and testing practices (see
www.fairtest.org).
Critical language testing has
certainly made an important contribution in
terms of both: a) alerting test developers
and administrators to the impact and
consequences of the decisions they make on
the basis of test scores, and b) raising the
awareness of test developers about the power
they hold (Hamp-Lyons, 2000). Critical
language testing has given rise to constant
questioning of the normative assumptions of
language testing and applied linguistics in
general, and has called attention to the
political uses and potential abuses of
language tests (Lynch, 2001; Bachman, 2000).
It has clearly articulated and highlighted
the need for language testers to become
aware of the principles of and strictures on
the use of the power that they hold, as well
as the need for testers and candidates to
critically analyse the ways in which tests
are used, and to ensure that these uses are
fair and ethical (McNamara, 1999 in
Hamp-Lyons, 2000).
However, it should be pointed
out that proponents of critical language
testing engage in constant questioning in an
attempt to address issues of political
oppression and social domination as regards
the use of high stakes exams which may
result in excluding groups of individuals
from society on the basis of their ethnic or
cultural background. Much of the critique of
critical language testers relating to the
unethical and unintended uses of tests has
also been made with reference to
international language testing organizations
which attempt to develop tests of the ‘one
fits all’ type. That is, tests which purport
to be relevant, valid and applicable to all
contexts and for all test users regardless
of their educational, socioeconomic and
cultural background. The very act of
developing one-size-fits-all tests and of adopting
international exams in situations where the
stakes are high, or doing the exact
opposite, is based on political decisions
and is, therefore, a political act (cf.
Shohamy 1993, cited in Bachman, 2000).
Educational assessment and
language testing in particular is certainly
a ‘socially embedded activity’ (Stobart,
2005) which should be analysed and critiqued
with reference to the cultural and political
context in which it is developed and within
which it operates. It is undoubtedly the
responsibility of every test developer to
become aware of the potential misuses of
tests, their intended and unintended
consequences on the lives of the candidates,
and to ensure that the procedures followed
in the development of tests reflect ethical
practice and lead to fair competition.
Fairness, though, is not
integral to the notion of validation of test
instruments, as was once believed: a valid test is not necessarily a fair one. Fairness relates not only to the technical aspects of test development, but also to the wider social context in which test development, implementation and administration are undertaken. As Kunnan (1999) states, this type of fairness is left
untouched by concerns of validation and
relates to ‘how instruments and their score
interpretations work in society’ (ibid:
238).
Certainly, one way of
addressing and overcoming the potential
misuses of tests and ensuring that tests are
fair not only on the test level but also in
the broader context of their use is through
the development of Codes of Ethics. Codes or
Standards (a term mainly employed in the US
educational context) form the basis for
monitoring and evaluating testing practices.
According to Alderson, Clapham and Wall
(1995: 236), codes are in essence ‘an agreed
set of guidelines which should be consulted
and, as far as possible, heeded to in the
construction and evaluation of a test.’
Codes of Ethics are sets of
principles or guidelines that ‘draw on moral
philosophy and are intended to guide good
professional conduct’ (Boyd and Davies,
2002: 304). They are based on concepts of
what constitutes moral behaviour, and
represent a one-sided contract between the
profession and all other interested
stakeholders, i.e. a statement of the
profession’s own moral position. This
statement is intended to assure the public
that the profession subscribes to high
ethical standards, and that the tests it
develops adhere to clearly articulated
principles rendering them valid and fair.
Codes also aspire to guide
professionals in their decision-making, and
‘serve a self-policing function [since]
members who violate them are sanctioned’
(McNamara and Roever, 2006:138). Codes of
ethics, as Boyd and Davies (2002) remind us,
are regulative ideals, not prescriptions or
instructions on how to act in every
circumstance. They are neither a statute nor
a regulation, but offer ‘a benchmark of
satisfactory ethical behaviour for all those
in the profession’ (ibid: 306).
In language testing, as in
other professions, a common practice is to
develop two codes: a code of ethics, and a
code of practice. The former focuses on the
morals and ideals of the profession, while
the latter instantiates the principles of
the code of ethics and provides more
specific guidelines and requirements for
practice which clarify the behaviours that
lead to professional misconduct or
unprofessional conduct. Some organizations
combine general ethical principles and
guidelines for practice in one code (e.g.
the American Psychological Association
Guidelines).
3. Examples of codes of ethics and practice in
language testing
In the area of educational
measurement, various large-scale testing
bodies and organizations have developed
codes of ethics and practice (or combined
codes) in an effort to uphold standards in
the profession and to ensure that tests are
fair, valid and ethically used. Some of the
most well-established and internationally
known codes include:
-
The Standards for Educational and Psychological Testing (1985), also known as the 'APA Standards'. These were developed in the US by the American Educational Research Association (AERA), the American Psychological Association (APA) and the National Council on Measurement in Education (NCME).
-
The Code of Fair Testing
Practices in Education (1988).
A set of standards based on the APA
standards. These are more focused on
refining many of the principles of the APA
in order to improve testing practices. This
code was developed by the Joint Committee on
Testing Practices, comprising members of the AERA, APA, NCME, the Canadian
Psychological Association, and 23 test
publishers.
-
The ETS Standards for Quality
and Fairness (1987).
These were developed by the Educational
Testing Service in the US and evaluated by
the APA Standards Evaluation Committee.
-
Standards for Educational
Testing Methods (1986).
These were adapted from the Standards for
Evaluation of Educational Programs, Projects
and Materials (1981) by Nevo and Shohamy
(1986).
-
The SEAC’s Mandatory Code of
Practice (1993). This was developed by the
Schools Examination and Assessment Council
(SEAC) in the UK in order to monitor
procedures in examination development in the
context of the National Curriculum in
England and Wales.
The nature and principles of
all the above standards and codes, developed
within the wider context of educational
testing in the US and the UK, are not of
immediate concern in this paper,[2]
which focuses on language testing in
particular. However, it is useful to briefly
present and discuss the main principles of
two of the most well- known internationally
used codes of ethics and practice for
language testing, the ILTA Code and the
ALTE Code.
As will become evident from
the descriptions below, although both codes
are addressed to language testing
organizations and individual language
testers, there are significant differences
in the nature and content of the Codes and
the level of detail in which the principles
are explained. Thus, the ILTA Code is
essentially a code of ethics, while ALTE has
opted for a code of practice. The former is
addressed to international language testing
organizations and individual language
testers, while the latter is intended for
European language testing bodies.
ILTA’s Code stipulates nine
ethical principles which draw on moral
philosophy and which reflect the ideals of
the profession. These principles focus on
the language testers’ moral obligations
towards the candidates, the profession, and
the wider society in which tests are used.
Principles are phrased as guidelines, and
each is broken down into a series of
statements which elucidate the test
developers’ moral obligations. The ALTE Code
identifies the responsibilities of test
developers and test users (i.e. bodies which
select/commission examinations and make
decisions on the basis of test results);
these responsibilities, phrased as
imperatives, define exactly the actions test
developers/users must take in order to
ensure good practice and to uphold the
standards of the profession.
3.1 The ILTA Codes of Ethics
and Practice
This Code was developed by
the International Language Testing
Association and consists of a Code of Ethics
(2000) adopted at the annual meeting of ILTA
in Vancouver, March 2000, and a draft Code
of Practice (2005). The ILTA Code is
addressed to individual language testers and
testing institutions, and offers a benchmark
of satisfactory ethical behaviour for all
language testers. According to the document,
it ‘is based on a blend of the principles of
beneficence, non-maleficence, justice, a
respect for autonomy and for civil society.’
The ILTA Code identifies nine fundamental
Principles, each elaborated and explained
through Annotations. The Annotations also
elaborate on the Code’s sanctions,
highlighting the fact that failure to uphold
the Code may result in penalties (i.e.
withdrawal from ILTA membership).
The first three principles of
the Code focus on the protection of the
candidate and the relationship between
tester and candidate i.e., a) testers must
not discriminate against candidates or abuse
the power they hold over them, b) testers
must respect the confidentiality of
candidates’ personal information, and c)
testers must abide by ethical principles
when conducting test-related research that
involves candidates.
The following six principles
relate to the language tester’s
responsibilities towards the profession and
society at large. Test designers: a) must
maintain and update their skills, b) must
train others in principles of good practice,
c) must uphold the integrity of the language
testing profession, d) must improve the
quality of the language testing profession,
e) must make their professional knowledge widely available to others,
and f) may withhold their professional
services on the grounds of conscience when
tests have negative effects or repercussions
on all stakeholders.
In order to illustrate the nature of
ethical principles contained in the ILTA
Code of Ethics, an extract from the Code is
included in Appendix 1.
The ILTA Code of Practice
deals mainly with test design,
administration and score interpretation. The
code also lists ten rights of test takers,
which include their right to be informed
about the test and its consequences and the
right to be treated fairly, as well as ten
responsibilities of test takers, which
mainly relate to their responsibility to
actively seek information about the test and
to be cooperative during test
administration.[3]
3.2 The ALTE Code of Practice
The ALTE Code of Practice has
been developed by the Association of
Language Testers in Europe, an Association
formed in 1990 by representatives of eight
institutions. ALTE now comprises 31
institutions representing 26 European
languages.
This Code is closely modelled
on the JCTP Code mentioned above, and is
divided into two parts: Part 1 lists the
responsibilities of test developers, who are
ALTE members, while Part 2 lists the
responsibilities of test users or, more
precisely, score users (i.e. those who make
decisions which affect the educational
possibilities and careers of others on the
basis of examination results). The
responsibilities of ALTE members relate to
the development of examinations, the
interpretation of exam results, the need to
strive for fairness, and the need to inform
candidates (see Appendix 2 for the list of
responsibilities of test developers
identified in the ALTE Code of Practice,
Part 1). Responsibilities of score users
include guidelines for the selection of
appropriate examinations, the correct
interpretation of scores, striving for
fairness, and informing candidates. In a
short document of only three printed pages,
the responsibilities are expressed as brief
statements about what each group is expected
to do in relation to the categories
mentioned. This Code has since been revised
and is now part of a larger quality
management process.
ALTE has also published a
brief list of Minimum Standards for
establishing quality profiles in ALTE
examinations, which comprises 17 minimum
standards grouped under: test construction,
administration and logistics, marking and
grading, test analysis, and communication
with stakeholders (see Appendix 3 for the
Minimum Standards)[4].
4. The dangers and
limitations of codes
There seem to be two main
limitations with the development and
implementation of codes of ethics and
practice. One limitation relates to the
question of what ethical is and the
second relates to the enforceability of
codes of ethics.
Certainly, codes of ethics
reflect the profession’s concern with the
greater good and its desire to be moral and
just. However, what is considered ‘ethical’
or ‘moral’ in one context is not necessarily
considered so in another. Different
cultures clearly have different conceptions
of morality and acceptable behaviour.
Morality can never be absolute, and ethical
principles cannot be universally applicable.
Members of the ILTA
organization faced great problems in
reaching agreement on a common set of
ethical principles and practices that are
applicable across a range of different
political and cultural contexts (Boyd and
Davies, 2002). As a result, the principles
are phrased in broad and general terms and
are susceptible to various interpretations,
while the annotations to these offer no room
for negotiation.
In essence, codes of ethics
reflect a top-down approach to ‘quality
assurance’ which cannot by nature take into
account and cater for all contexts, local
differences and exigencies, and which cannot
accommodate all possible circumstances and
situations worldwide. An example will make
this limitation of codes of ethics clear.
The fourth bullet of the
Annotation to Principle 7 of the ILTA Code
of Ethics states: ‘Language testers shall be
prepared to work with advisory, statutory,
voluntary and commercial bodies that have a
role in the provision of language testing
services.’ Taking the KPG test developers as
an example: The director of the test
development team, who is a public servant
(professorial staff of a state university),
is appointed by the Ministry of Education;
thus, s/he must abide by the rules and
regulations concerning the professional
conduct of public servants in the Greek
context. By law, members of the test
development team are not allowed to
cooperate and work with any commercial body
involved in education or language testing
for commercial or symbolic profit. In this
case, the local code of ethics which
determines the team’s professional status in
Greece clashes with a principle of the code
of ethics for language testers worldwide.
A second example: The third
bullet of the Annotation to Principle 2 (see
Appendix 1) states: ‘Similarly, in
appropriate cases, the language tester’s
professional colleagues also have the right
to access data of candidates other than
their own in order to improve the service
the profession offers. In such cases, those
given access to data should agree to
maintain confidentiality.’ Taking the KPG
test development team (and the ethical
framework within which they are obliged to
operate in Greece) as an example once again,
it should be pointed out that the
project team is not allowed to use, access
or share data regarding test takers since
this data is strictly confidential and is
protected by the Ministry of Education.
Thus, if another testing body requests
access to KPG test taker data, in abiding by
the local ethical code, it will not be
possible to make such data available.
Naturally, this does not mean that the KPG
project team is ‘unethical’ or that its
members do not wish to contribute and ‘to
improve the service the profession offers.’
Indeed, codes of ethics,
developed almost in their entirety in
Western contexts, can be accused of
promoting and imposing hegemonic ethics.
Some authors (Boyd and Davies, ibid; McNamara and Roever, 2006; Bachman, 2000;
Bachman and Palmer, 1996) have highlighted
the need for the development of local codes
of ethics (i.e. codes developed by local
testing professionals that adopt
universalist testing principles with a local
gloss) in order to overcome the problem of
cultural relativism. This solution, though,
is fraught with other difficulties. In many
local contexts, there is no critical mass of
professional language testers that could form an examination body able to issue such codes. Therefore, a number
of important questions such as the following
are raised: If local codes are developed,
who will oversee and evaluate the validity
and usefulness of these codes? How will
enforceability of these codes be guaranteed?
Who will issue sanctions in cases where
local codes are not followed?
Another problem with codes of
ethics relates to their implementation and
enforceability. Boyd and Davies (2002: 305)
make an eloquent point about the potential
inherent hypocrisy of codes by stating
‘Codes can be viewed in two opposing ways:
positively they represent a noble attempt by
a profession or institution to declare
publicly what it stands for; negatively,
they can be seen as a sophisticated
pretence, a vacuous statement of pretend
ideals behind which the profession feels
even safer than before, now that it can lay
claim to a set of unimpeachable principles.
And if accountability remains in house, then
there is no check on adherence to the
principles.’
Unlike other professions
(e.g. medicine and law), in ‘weak’
professions like language testing, there is
no external organization that grants
permission and the right to practice, and
there are no serious sanctions against
individuals or institutions that violate or
do not follow codes of ethics (McNamara and
Roever, 2006). Indeed, there are many
well-known and respected testing bodies and
organizations which do not follow codes of
ethics and practice (see examples of such
cases with known UK examination boards in
Fulcher and Bamford, 1996). Alderson,
Clapham and Wall (1995) conducted a survey of 12 examination boards in the UK in order to illustrate current practice in language testing, and found that boards vary considerably in areas such as:
-
item pretesting (the researchers found a widespread absence of pretesting in some boards but not in others)
-
test validation (some boards lacked any evidence whatsoever of the empirical validation of their tests, while others did not)
-
procedures ensuring the equivalence of different versions of exams
-
training and monitoring of administrators
-
double marking of all scripts (only in rare cases did the researchers find that this took place)
-
availability of data on the reliability of the exams and of marking (reliability amongst most boards was asserted rather than measured).
The results of the survey led
the researchers to conclude that information
about the exams is not readily and publicly
available. They reported that it took a
great deal of effort and time on their part
to get as far as they did. ‘This should not
be necessary,’ they said. ‘If evidence is
available to support claims of test quality,
that evidence should be made publicly
available,’ and they continue ‘ … it appears
that the different boards involved in EFL
testing do different things, with differing
degrees of rigour, to monitor the quality of
their examinations’ (Alderson, Clapham and
Wall ibid: 258).
Given the limitations of
codes of ethics discussed above, the way
forward, I believe, is for professional
language testers and language testing
bodies, especially those working in local
contexts, to adopt an approach which centres
on the following three principles,
principles which are common to all codes of
ethics:
a) adherence to universalist principles
relating to test quality, validity and
reliability
b)
public accountability which entails
transparency and openness about design,
constructs, procedures, and scoring to all
stakeholders
c)
responsible professionalism, as defined by
Shohamy (2001b), which requires shared
authority, collaboration and involvement of
stakeholders (including candidates).
In addition, I believe that
any large-scale test is susceptible to
practical, financial and political factors
and local contingencies; compromises between
these factors and the test design process
are often inevitable but, as Alderson, Clapham and Wall (1995) argue, these
compromises need to be as principled as
possible. True ‘fairness’ in language
testing can never be achieved since fairness
is a relative, qualitative concept
reflecting a value judgment. What language
testers should strive for is ‘equitability,’
and I fully agree with Spaan (2000: 35), who
argues that ‘this equitability implies joint
responsibility of the test developer, test
user and the examinee in a sort of social
contract in which the developers promise to
maximize validity, reliability and
practicality of application. It is the
developer’s responsibility to educate the
users by providing readable and
understandable interpretations and
guidelines for use of their products,
including the provision of norms and scoring
criteria. The developer must be able to
justify any claims about the test and
furthermore the developer must also solicit
feedback from users and examinees.’
5. Accountability, professionalism and
equitability in the KPG exams
The KPG examination system
does not follow a code of ethics or code of
practice developed by an external
international organization, but it has seriously considered international codes of ethics and practice in order to develop its own 'glocal' code of ethics and practice. Since 2003, when the exams in four languages were first administered, the language project teams
have had to follow the principles of
accountability, professionalism and
equitability developed by the Central
Examination Board. These principles are
realized through the exam specifications and
the Ministry-stipulated regulations, which
are publicly available and common to all
languages, concerning the test development
process and exam administration. This code
of ethics offers a local gloss on global
language testing principles (hence ‘glocal’
in nature), relating to quality and ethical
language testing. It places emphasis on
establishing and ensuring validity and
reliability of the exams, assessing the
quality of the exams, and involving
stakeholders in the development of the exam
(principles not elaborated upon in the ALTE
Code).
How the KPG exams in English
follow and realize these principles is
described in Section 5.1 below. Where
appropriate, reference to KPG’s adherence to
guidelines for good practice as presented in
the Minimum standards for establishing
quality profiles in ALTE examinations
(Appendix 3) will be made. As will become
evident in the presentation below, the KPG
exams meet all the minimum standards, which
are denoted by a number after each item
(e.g. MS:1, MS:5, etc.; see Appendix 3) and fulfil many more not
included in the ALTE profile. As will be
demonstrated, through the established and
commonly agreed upon procedures for the
development of the KPG exams, the principles
of accountability, professionalism and
equitability have become cornerstones of the
KPG examination system.
5.1 The KPG test development
process
5.1.1 Test purpose and
specifications
The National Foreign Language
Exam System (Kratiko Pistopiitiko
Glossomathias, KPG) aims at measuring levels
of competence or proficiency in English,
French, German, Italian, Spanish and Turkish.
The overall aim of the KPG
exams at all proficiency levels is to test
the candidates’ ability to make purposeful
use of the target language at home and
abroad. More specifically, depending on the level of proficiency, the exams test different aspects of candidates' communicative ability.
Exam specifications have
been developed by the Central Examination
Board, in collaboration with testing
professionals at Greek universities, and are
common for all languages assessed through
the KPG exams.
The Central Examination Board
(henceforth CEB) is responsible for
approving the test paper content before it
is disseminated to the state-selected exam
centres and it is also responsible for the
specifications regarding the exam format,
the structure and the scoring regulations.
Additionally,
the CEB functions in an
expert consulting capacity, advising the
Ministry on matters regarding the
development and growth of the system, exam
policies, law amendments, and new and revised regulations.
The specifications
developed
by the CEB for each
exam
level include information on:
-
the aims and nature of the
exams (i.e. the
theory of language and
principles permeating the tasks, selection
of materials, criteria for scoring of all
test papers of the KPG exam).
(MS:1)
-
candidates’ profile
(i.e., characteristics of
candidates to whom
the different level exams are
addressed). (MS:2)
-
the structure of the
exam and the uses of language
that each test paper (module) aims to
assess.
-
descriptors of communicative
performance expressed as can-do statements
based on the CEFR. (MS:5)
-
degree of difficulty
of the level
of the exam.
-
distribution of items
per test
paper.
-
duration of each test
paper and length of
texts therein.
-
weighting of marks
for each test
paper.
-
text types for
comprehension and
production.
-
a typology of
activity types
per test paper.
Exam specifications for each level are
accompanied by sample tests and answer keys.
These specifications are made public and are
easily accessible (see Section 5.3 below)
through the Ministry of Education and
Religious Affairs website (http://www.minedu.gov.gr/eksetaseis-main/kpg-main/)
and the RCeL website (http://www.rcel.enl.uoa.gr/)
(MS:16).
Moreover, rating scales and
criteria for the assessment of oral and
written language production (with detailed
descriptions) for all levels of the exams
have been developed. Rating scales are made
public to candidates and teachers (MS:16).
The detailed descriptions of assessment
criteria are made available to multipliers
(i.e. the trainers of oral examiners), oral
examiners, raters, and coordinators of the
marking centre.
5.1.2 Test design, item
writing and pre-testing
A preliminary needs analysis
survey involving employers, students and
teachers throughout Greece was carried out,
the results of which informed the overall
content of the exams. In the design of tasks
and the selection of materials (e.g. visuals
and multi-modal texts), the following
specific characteristics of candidates,
which are identified in the exam
specifications, are seriously taken into
account: linguistic background,
language-learning background, level of
literacy, age, and socio-cultural
background. These are systematically
investigated at every administration with
questionnaires distributed to candidates.
(MS:2)
Clearly laid-out guidelines
for the design of tasks and the selection of
materials for each test paper and level of
the exam have been published for item
writers. These guidelines are accompanied by
samples of appropriate tasks and materials
which have been calibrated to CEFR level
descriptions. (MS:3, MS:5). Apart
from the item writers,
there are also two Test Developers (who can
be University Professors, School Advisors or
highly qualified teachers) per language
appointed by a Ministerial decree for two
years. The foreign language departments of
the University of Athens and the University
of Thessaloniki and their specialist
language groups are responsible for the
development of test tasks in the languages
that the KPG exam battery offers. More
specifically, the University of Athens is
responsible for developing test tasks in the
following languages: English, German,
Spanish and Turkish. The University of Thessaloniki is responsible for French and Italian. Each
department has to also select an Academic
Director, who is responsible for: a) the
rating process and b) the training of the
raters and the examiners.
The item writers and Test
Developers are experienced language teaching
professionals who receive training and feedback from the specialist language groups.
Moreover, there are systematic procedures
for review, revision and editing of items
and tasks to ensure that they match test
specifications and comply with item writer
guidelines (MS:3). All items, tasks
and task instructions designed by item
writers are evaluated systematically (in
terms of their adherence to the illustrative
descriptors and can-do statements, the test
specifications, and their clarity) by
specialist language groups comprising assessment specialists. Items and tasks are
returned with feedback from the teams to the
item writers for revision, and the process
continues until all items and tasks are
judged by the specialist teams as
satisfactory. They are then included in the
electronic item bank. Before each exam
administration, the test items for all test
papers are also evaluated by the Central
Examination Board. (MS:3) More
specifically, for the development of each
test paper of the exam, the following
procedure is followed:
1. Initial version of test paper: screening by inspector(s) for approval
2. Test trial run by test development team
3. Test revised
4. Revised test paper piloting
5. Test re-revised to final version
6. Evaluation of test paper by three team experts
7. Approval of test paper by the Central Examination Board
5.1.3
Validity and reliability
Exams are assessed a priori and a posteriori for reliability and validity through test item analysis of piloted materials and of actual exam materials. The results are discussed by all project team members, the test development team and the members of the Central Examination Board. After each exam administration, content and form features are compared from level to level, test takers' scores on the different test papers are compared, the difficulty of each exam paper is investigated, and test takers' scores on each test paper are related to the final result. Alongside classical item analysis, test paper validity is also investigated using the Rasch model. (MS:4)
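For reference, the dichotomous Rasch model (the simplest member of the Rasch family; the paper does not specify which variant the KPG team fits, so this formulation is an assumption) expresses the probability that candidate n answers item i correctly in terms of the candidate's ability \theta_n and the item's difficulty b_i:

$$P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{e^{\theta_n - b_i}}{1 + e^{\theta_n - b_i}}$$

Items whose observed response patterns fit this model poorly are typically flagged for review.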
More specifically, systematic
item analysis for all test papers is carried
out by calculating item difficulty and item
discrimination. Moreover,
systematic
research involving candidates’ perceptions
of item and task difficulty on all test
papers of the exam is carried out during
each exam administration through specially
designed questionnaires completed by
candidates after the exam.[5]
(MS:13 and MS:14)
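To make these two item statistics concrete, the sketch below computes classical item difficulty (the facility value) and discrimination (the point-biserial correlation between an item and the total score) for a small invented response matrix. It is a minimal illustration of the technique, not the KPG team's actual analysis code.

```python
# Minimal sketch of classical item analysis for dichotomously scored items.
# The response matrix is invented: rows = candidates, columns = items.
from statistics import mean, pstdev

def item_difficulty(item):
    """Facility value: proportion of candidates answering the item correctly."""
    return mean(item)

def point_biserial(item, totals):
    """Uncorrected discrimination: r_pb = (M1 - M0) / sd * sqrt(p * q),
    where M1/M0 are mean total scores of those who got the item right/wrong."""
    sd = pstdev(totals)
    right = [t for r, t in zip(item, totals) if r == 1]
    wrong = [t for r, t in zip(item, totals) if r == 0]
    if sd == 0 or not right or not wrong:
        return 0.0
    p = len(right) / len(item)
    return (mean(right) - mean(wrong)) / sd * (p * (1 - p)) ** 0.5

matrix = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
]
totals = [sum(row) for row in matrix]
for j in range(len(matrix[0])):
    item = [row[j] for row in matrix]
    print(f"item {j + 1}: difficulty = {item_difficulty(item):.2f}, "
          f"discrimination = {point_biserial(item, totals):.2f}")
```

In operational analyses the total score is usually corrected by excluding the item under scrutiny; the uncorrected version is kept here for brevity.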
Moreover, reliability of
marking is monitored and assessed through
systematic inter-rater reliability checks
during and after each exam administration.
For the speaking test, inter-rater
reliability is assessed through the data collected by observers, who note the mark awarded to each candidate by each observed examiner. On the observation form, the observers (who are multipliers, i.e. experienced and trained trainers of oral examiners) also record their own mark for each candidate's performance. This data is then
collected by the RCeL, where inter-rater
reliability estimates are calculated. For
the writing tests, coordinators of the
marking centre systematically monitor
markers’ application of the assessment
criteria and the marks assigned to scripts.
All scripts are routinely double marked.
Cases of discrepancy between markers of the same scripts are noted, and markers are alerted to the discrepancies (also see
Section 5.1.4 below). After the marking
process has been completed, both marks on
all candidate scripts are collected, and
inter-rater reliability estimates are calculated.[6]
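As an illustration of what such an estimate involves, the sketch below computes a Pearson correlation between two markers' scores on the same scripts and flags pairs of marks that diverge beyond a tolerance. The scores, function names and discrepancy threshold are assumptions made for the example, not the marking centre's actual procedure or parameters.

```python
# Minimal sketch of an inter-rater reliability check on double-marked scripts.
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation between the two markers' score lists."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (sx * sy)

def flag_discrepancies(marks_a, marks_b, max_gap=2):
    """Indices of scripts whose two marks differ by more than max_gap."""
    return [i for i, (a, b) in enumerate(zip(marks_a, marks_b))
            if abs(a - b) > max_gap]

marker_1 = [14, 12, 18, 9, 15, 11]   # first marker's scores, one per script
marker_2 = [13, 12, 15, 10, 16, 17]  # second marker's scores, same scripts
print("inter-rater r =", round(pearson(marker_1, marker_2), 2))
print("scripts needing a third look:", flag_discrepancies(marker_1, marker_2))
```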
Reliability of marking and adherence to exam guidelines are also ensured
through the systematic training of
multipliers, oral examiners, marking centre
coordinators and markers.[7]
(MS:11 and MS:12)
5.1.4 Test administration and
monitoring
The KPG exams are
administered exclusively in Greece and are
governed by the Ministry of Education and
Religious Affairs, which, through its
Department of Foreign Language Knowledge
Certification, is responsible for
administering the exams and issuing the
respective certificates. The Ministry uses
the infrastructure, technical and
administrative support of the university
national entrance exams. State schools
throughout the country serve as the official
examination centres for the KPG exams. The
same procedures followed for the
administration of the national university
exams are followed for the administration of
the KPG exams.
Exam centres operate
throughout Greece in every area where there
are 10 or more candidates. On average,
around 200 exam centres
are used for each
exam administration. There are also two
specially equipped centres for candidates
with learning disabilities and special
needs, one in Athens and one in
Thessaloniki. These two centres are staffed with specially trained personnel, and trained and experienced oral examiners conduct the speaking test. (MS:6, MS:7 and MS:10)
Each exam centre has an Exam
Centre Committee. The Ministry of Education, in collaboration with the language project teams, has developed and published guides for KPG exam
centre committees, detailing the rules and
regulations for the administration of the
exams. Committees are in direct contact with
the Ministry of Education during the exam
administration, reporting on any problems or
difficulties faced. (MS:6 and
MS:8)
After candidate scripts have been rated, the results of the rating process are collected by the relevant department of the Ministry. The results are then forwarded to the Specialist Language Groups of every language, the Central Examination Board and the Department of Foreign Language Knowledge Certification.
The final
results are announced by the Ministry of
Education and Religious Affairs and they
appear in a table validated by the Minister
of Education, which includes the names of
all the successful candidates. Candidates can be informed about their final scores by the Department of Secondary Education in which they enrolled for the exams, or via the Internet (the KPG website) using their individual access code. Candidates who have passed the exams can receive their certificates from the Departments of Secondary Education in which they registered for the exams. (MS:9, MS:15)
Monitoring of the speaking test is carried
out through the KPG Observation Project. The
Observation Project was launched in November
2005, and has been ongoing ever since, in
an attempt to identify whether and to what
extent examiners adhere to exam guidelines
and the suggested oral exam procedure. The
initial overall goals of the Observation
Project were to gain information about the
efficiency of the oral exam administration,
the efficiency of oral examiner conduct,
applicability of the oral assessment
criteria and inter-rater reliability.
(MS:11 and MS:12)
The observation forms, used
by trained
observers, elicit information on
the seating arrangements in the examination
room, procedure followed, the candidates’
age and sex, the choice of questions and
tasks made by the examiner/interlocutor,
ratings by the two examiners and the
observer, duration of the oral exam, time
allocation to different activities and to
the two candidates, and overall assessment
of the examiner’s oral performance as
‘Excellent-Very good’, ‘Good’ or
‘Mediocre-Poor.’[8] (MS:12)
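To make the shape of these observation data concrete, the record sketched below mirrors the fields just listed. The type and field names are hypothetical, not the actual layout of the KPG observation form.

```python
# Hypothetical record mirroring the observation-form fields described above.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class SpeakingTestObservation:
    seating_arrangement: str         # layout of the examination room
    procedure_followed: bool         # was the suggested oral exam procedure kept to?
    candidate_age: int
    candidate_sex: str
    tasks_chosen: List[str]          # questions/tasks picked by the examiner/interlocutor
    examiner_marks: Tuple[int, int]  # marks awarded by the two examiners
    observer_mark: int               # observer's own mark for the candidate
    duration_minutes: float          # overall length of the oral exam
    candidate_time_split: Tuple[float, float]  # time allocated to each of the two candidates
    examiner_performance: str        # 'Excellent-Very good', 'Good' or 'Mediocre-Poor'
```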
Additionally, monitoring of the marking process for the writing test paper is carried out by trained coordinators at the two rating centres (one in Athens and one in Thessaloniki), both of which are chaired by the CEB. In the Rating Centres, there are coordinators of the script raters who monitor the rating process and guide the script raters. There is also an administrative committee and a secretariat comprising employees of the Ministry's Department of Certification. The Rating Centres gather the candidates' answer sheets for all the languages that they are responsible for.
The trained coordinators at each marking
centre
are responsible for:
-
monitoring script raters’
individual performance during the rating of
a certain number of scripts (i.e. at least
three scripts in every packet of 25). This
procedure is followed each time raters are
obliged to move to the next level of the
exam.
-
monitoring the script raters’
application of the assessment criteria in
each of these scripts, and keeping records
of raters’ performance by filling in two
different statistical sheets. The first one
includes general comments, while the second
is more detailed and asks for the
coordinators’ justified evaluation of the
individual script raters.
-
monitoring the performance of
the raters through randomly chosen scripts
already marked, in which the raters are
asked to justify their assigned marks. The
coordinators discuss the application of the
marking grid, and they keep records of the
whole procedure. These records will be
analyzed after the rating period has ended,
and details of the raters’ actual
performance will be recorded and analyzed
for further reference and evaluation of the
individuals and the process itself. (MS:12)
5.1.5 Post-examination review
As mentioned above, item
analysis and inter-rater reliability are
calculated after each exam administration,
and the results of each analysis are
presented in the form of a report. Moreover,
the results of the observation forms are
analysed and presented in the form of a
report, as are the results of the monitoring
procedures of the marking process.
Additionally, the Ministry of National
Education issues statistics regarding the
number of candidates who took part in the
exam for each level and each language. (MS:13 and MS:14)
5.2 Involvement of
stakeholders
Relevant stakeholders have
been involved, directly and indirectly,
in
the development of the KPG examination
system in various ways:
-
Through the initial needs
analysis survey which involved parents,
students, employers and teachers, and the
results of which have informed the design of
the KPG exams.
-
Through candidate surveys
carried out after each exam administration.
Specially designed questionnaires in Greek
are completed by candidates at the end of
the exam to elicit their feedback on
perceived difficulty of tasks and items,
their familiarity with topics and text
types, and their opinion of the test as a
whole. Different questionnaires are
designed for different level candidates.
-
Through feedback forms
completed by oral examiners and script
raters after each exam administration. These
forms aim to elicit feedback concerning the
potential problems with test items, their
usefulness, appropriateness and
practicality.
-
Through feedback forms
completed by invigilators and staff of the
rating centre committees.
-
Through examination
preparation classes offered to University of
Athens students by the RCeL staff.
-
Through a special exam
preparation programme launched by the RCeL
in 2008. This programme involves a sample of
primary and secondary school teachers
developing, with the cooperation and help of
the staff at RCeL, test preparation
materials and teaching test preparation
courses in schools around the country. These
courses are offered free of charge to
students in after-school classes, through
the Support-Teaching State School Programme.
5.3 Accountability and
transparency
Transparency of procedures and openness to stakeholders have been defining characteristics of the KPG examination system. Conscious efforts have been made throughout the years to disseminate information about the nature and requirements of the KPG exams, and to inform and educate all relevant stakeholders. KPG
test specifications are written in Greek to
ensure comprehension by prospective
candidates, teachers and parents. These have
been made publicly available and easily
accessible through the Ministry of Education
and RCeL websites. Moreover, on the Ministry
of National Education website, relevant
stakeholders and interested individuals can
find information on (http://rcel.enl.uoa.gr/kpg/):
-
issues related to test
management and administration
-
the general framework and principal features of the KPG examination system (the published version of which has been disseminated to relevant stakeholders)
-
the specifications for the exams, which are common for all languages assessed
-
sample exams and answer keys for all levels and languages
Moreover, the Ministry of
Education has published leaflets to inform
the general public and relevant stakeholders
on the main features, aims, nature and
requirements of the KPG exams. Also, the Ministry published and disseminated, free of charge, information materials about the exams. (MS:16 and MS:17)
In addition, as mentioned above, candidates
and
teachers also have access to the rating
grids for the speaking and writing tests.
The RCeL has published a comprehensive Handbook for the Speaking Test (Karavas, 2008) and a Guide for KPG raters, both addressed to KPG examiners and raters. (MS:16)
A KPG information dissemination programme was also launched in order to inform the public and
relevant stakeholders about the
characteristics and requirements of the KPG
exams. More specifically, multipliers and
school advisors, after being trained, were
given specially written materials (e.g.
PowerPoint presentations, leaflets, etc.)
for different audiences, and were
responsible for organizing ‘KPG information
seminars’ for teachers, parents, the general
public and language institute owners. These
seminars also included question and answer
sessions in order to clarify any queries or
misinterpretations the public had about the KPG exams. (MS:16)
A series of conferences have
also taken place in Athens and Thessaloniki
to inform professionals in the area of
applied linguistics and testing from Greece
and abroad about the research and developments related to the KPG exams. (MS:16)
Finally, professionalism and transparency are also reflected in the systematic attempts made by
the KPG project team members to present KPG
research at international and local
conferences, as well as to publish books,
articles and commentaries about the KPG
exams. These publications appear in Greek
for teachers and language testing
professionals in Greece, and also in English
and other languages for the international
community (see
http://rcel.enl.uoa.gr/kpg/).
The emphasis on transparency,
accountability and professionalism which
characterises the KPG exam system is also
evident in the development of the KPG school, a funded project whose purpose is to link the KPG language exams with the state compulsory education system and to ensure the sustainability of the KPG examination system.
The KPG school is an
e-school aiming at preparing, tutoring and
guiding the following groups of
“stakeholders” for the KPG exams:
-
Possible candidates of the KPG exams
-
Teachers who offer support to
students preparing to sit for the exams
-
Teachers in the role of KPG
assessors (i.e., script raters and oral
examiners)
-
Parents of candidates of the
KPG exams.
The KPG e-school
will mainly include digital material aiming
at successfully developing KPG candidates’
test-taking strategies that are necessary in
all four tests of each exam battery for each
language. More specifically, the KPG
e-school will provide candidates with tasks
focusing on the aforementioned strategies,
accompanied by teaching instructions and
guidance so that they can be used both by teachers and by students of different ages and literacy backgrounds.
As regards the administration of this
on-line educational material, an e-directory
will be developed, and e-learning tools with hyperlinks will be used to help KPG candidates
navigate easily, depending on the type of
task they want to do or on the information
they want to find.
Additionally, the KPG e-school will include teaching advice as
well as tips for dealing with the various
tasks, with guidance and instructions at every step. Access to the KPG e-school
website will be open, as its aim is to be
used by the KPG candidate outside the
regular school hours either autonomously or
with the support/guidance of the teacher.
References
Alderson, J.,
Clapham, C., & Wall,
D. (1995). Language Test
Construction and Evaluation.
Cambridge: Cambridge University
Press.
Bachman, L.F. (2000). Modern language testing at the turn of the
century: Assuring that what we count
counts. Language Testing,
17(1), 1-42.
Bachman, L.F., & Palmer, A.S.
(1996). Language Testing in
Practice. Oxford: Oxford
University Press.
Boyd, K., & Davies, A. (2002).
Doctors’ order for language testers:
the origin and purpose of ethical
codes. Language Testing,
19(3), 296-322.
Dendrinos, B. (2004). KPG – a new
suite of national language
examinations for Greece: philosophy
and goals. Lecture at the ICC AGM
Annual Conference 19/4/2004. Athens:
http://www.icc-europe.com/AGM
2004/documentation/Bessie
plenary.rtf .
Douglas, D. (1995). Developments in
Language Testing. In Grabe W. et al
(Eds.), Annual Review of Applied
Linguistics, 15, Survey of
Applied Linguistics (pp. 167-187). New York: Cambridge
University Press.
Fulcher, G., & Bamford, R. (1996). I
didn’t get the grade I need, where’s
my solicitor? System, 24(4),
437-448.
Hamp-Lyons, L. (2000). Social,
professional and individual
responsibility in language testing.
System, 28, 579-591.
Karavas, E. (Ed.). (2008). The KPG
Speaking Test in English: A
Handbook. National and Kapodistrian
University of Athens, Faculty of English
Studies:
RCeL Publications,
Series 2 (RCeL Publication Series
Editors: Bessie Dendrinos & Kia Karavas).
Karavas, E. (2009a). The KPG exams: Training the examiners. ELT News, No. 236, p. 8.
Karavas, E. (2009b). Training script raters for the KPG exams in English. ELT News, No. 239, p. 16.
Kunnan, A.J. (1999). Recent
developments in language testing. Annual Review of Applied
Linguistics, 19, 235-253.
Lynch, B.K. (2001). Rethinking
assessment from a critical
perspective. Language Testing, 18(4), 351-372.
McNamara, T. (2001). Language
assessment as social practice:
Challenges for research. Language
Testing, 18(4), 333-349.
McNamara, T., & Roever, T. (2006). Language Testing:
The Social
Dimension. Oxford: Blackwell.
Messick, S. (1989). Validity. In R.L.,
Linn (Ed.), Educational
Measurement (pp. 13-103), (3rd
ed.). New York: American Council on
Education/MacMillan.
Mitsikopoulou, B.
(Ed.). (2009). The KPG Writing
Test in English: A Handbook.
Athens: RCeL Publication Series 3,
Faculty of English Studies, National
and Kapodistrian University of
Athens (RCeL Publication Series
editors: Bessie Dendrinos & Kia Karavas).
Nevo, D., & Shohamy. E. (1986).
Evaluation standards for the
assessment of alternative testing
methods: An application. Studies
in Educational Evaluation, 12,
149-158.
Saville, N. (2002). Quality and
fairness: The ALTE code of practice
and quality management systems. Sprogforum, 8(23), 45-50.
Shohamy, E. (1993). The Exercise of
Power and Control in the Rhetorics
of Testing. In A. Huhta, K.
Sajavaara, & S. Takala. (Eds.),
Language Testing: New openings
(pp. 23-38). Jyvaskyla:
University of Jyvaskyla.
Shohamy, E. (1999). Critical
language testing: use and
consequences of tests,
responsibilities of testers and
rights of test takers. Paper
presented at the 21st
Annual Language Testing Research Colloquium, Tsukuba, Japan.
Shohamy, E. (2000). Fairness in Language Testing. In A.J. Kunnan
(Ed.), Fairness and Validation in
Language Assessment. Selected
papers from the 19th
Language Testing Research
Colloquium, Orlando Florida.
Studies in Language Testing, 9,
15-20.
Shohamy, E. (2001a). Democratic
assessment as an alternative. Language Testing
18(4),
373-391.
Shohamy, E. (2001b). The Power of
tests: A Critical Perspective on the
Use of Language Tests. Harlow:
Longman.
Spaan, M. (2000). What, if any, are the limits of our responsibility for fairness in language testing? In A.J. Kunnan (Ed.), Fairness and Validation in Language Assessment: Selected papers from the 19th Language Testing Research Colloquium, Orlando, Florida. Studies in Language Testing, 9, 35-39.
Spolsky, B. (1981). Some ethical questions about language testing. In C. Klein-Braley & D.K. Stevenson (Eds.), Practice and Problems in Language Testing. Frankfurt: Peter Lang.
Stevenson, D.K. (1985).
Authenticity, validity and a tea
party. Language Testing, 2(1),
41-47.
Stobart, G. (2005). Fairness in multicultural assessment systems. Assessment in Education, 12(3), 275-287.
Endnotes
[1]
For
the procedures followed in test
development, test administration and
test score interpretation, see Fulcher & Bamford 1996.
[2] The interested reader may refer to
Alderson, Clapham and Wall, 1995 or
McNamara and Roever, 2006, or the
respective websites of these
organizations, for further details
and discussion.
[4] For more details on the ALTE Code of
Practice see www.alte.org, Fulcher
and Bamford 1996 and Saville 2002.
[5] See Liontou (this
issue) and Section 5.2 below for an
example of such research.
[6] See Hartzoulakis,
this volume.
[7] For descriptions of the respective training programmes (aims, principal features, nature, content, and frequency), see Karavas 2008, Karavas 2009a, b, and Mitsikopoulou 2009.
For a brief description of the oral
examiner training programme, see
Delieza, this volume.
[8] For more information
on the nature and main results of
the observation project see Delieza
this volume.
Appendix 1
Extract
from the ILTA Code of Ethics
The
following extract presents all the principles included in the Code and
the Annotations for the first two
principles.
Principle 1
Language
testers shall have respect for the
humanity and dignity of each of
their test takers. They shall
provide them with the best possible
professional consideration and shall
respect all persons’ needs, values
and cultures in the provision of
their language testing service.
Annotation
-
Language testers
shall not discriminate against
nor exploit their test takers on
the grounds of age, gender,
race, ethnicity, sexual
orientation, language
background, creed, political
affiliations or religion, nor
knowingly impose their own
values (for example social,
spiritual, political and
ideological) to the extent that
they are aware of them.
-
Language testers
shall never exploit their
clients nor try to influence
them in ways that are not
related to the aims of the
service they are providing or
the investigation they are
mounting.
-
Sexual relations
between language testers and
their test takers are always
unethical.
-
Teaching and researching language testing involving the use of test takers (including students) requires their consent; it also requires respect for their dignity and privacy. Those involved should be informed that their refusal to participate will not affect the quality of the language tester’s service (in teaching, in research, in development, in administration). The use of all forms of media (paper, electronic, video, audio) involving test takers requires informed consent before being used for secondary purposes.
-
Language testers
shall communicate the
information they produce to all
relevant stakeholders in as
meaningful a way as possible.
-
Where possible,
test takers should be consulted
on all matters concerning their
interests.
Principle 2
Language
testers shall hold all information
obtained in their professional
capacity about their test takers in
confidence and they shall use
professional judgment in sharing
such information.
Annotation
-
In the face of widespread use of photocopied materials and facsimile, computerized test records and data banks, the increased demand for accountability from various sources and the personal nature of the information obtained from test takers, language testers are obliged to respect test takers’ right to confidentiality and to safeguard all information associated with the tester-test taker relationship.
-
Confidentiality
cannot be absolute, especially
where the records concern
students who may be competing
for admissions and appointments.
A careful balance must be
maintained between preserving
confidentiality as a fundamental
aspect of the language tester’s
professional duty and the wider
responsibility the tester has to
society.
-
Similarly, in
appropriate cases, the language
tester’s professional colleagues
also have the right to access
data of test takers other than
their own in order to improve
the service the profession
offers. In such cases, those
given access to data should
agree to maintain
confidentiality.
-
Test taker data
collected from sources other
than the test taker directly
(for example from teachers of
students under test) are subject
to the same principles of
confidentiality.
-
There may be
statutory requirements on
disclosure, for example where
the language tester is called as
an expert witness in a law court
or tribunal. In such
circumstances the language
tester is released from his/her
professional duty to
confidentiality.
Principle 3
Language
testers should adhere to all
relevant principles embodied in
national and international
guidelines when undertaking any
trial, experiment, treatment or
other research activity.
Principle 4
Language
testers shall not allow the misuse
of their professional knowledge or
skills, in so far as they are able.
Principle 5
Language
testers shall continue to develop
their professional knowledge,
sharing this knowledge with
colleagues and other language
professionals.
Principle 6
Language
testers shall share the
responsibility of upholding the
integrity of the language teaching
profession.
Principle 7
Language testers in their societal roles shall strive to improve the quality of language testing, assessment and teaching services, promote the just allocation of those services and contribute to the education of society regarding language learning and language proficiency.
Principle 8
Language
testers shall be mindful of their
obligations to the society within
which they work, while recognizing
that those obligations may on
occasion conflict with their
responsibilities to their test
takers and to other stakeholders.
Principle 9
Language testers shall regularly consider the potential effects, both short and long term, on all stakeholders of their projects, reserving the right to withhold their professional services on the grounds of conscience.
Appendix 2
The ALTE Code of
Practice: Part 1 – Responsibilities
of ALTE Members
The following is an
extract from the ALTE Code of
Practice. The responsibilities of
examination developers are
presented.
Developing
examinations
Members of ALTE
undertake to provide the information
that examination users and takers
need in order to select appropriate
examinations.
In practice, this means that members of ALTE will guarantee to do the following, for the examinations described in this book:
-
Define what each
examination assesses and what it
should be used for. Describe the
population(s) for which it is
appropriate.
-
Explain relevant
measurement concepts as
necessary for clarity at the
level of detail that is
appropriate for the intended
audience(s).
-
Describe the
process of examination
development.
-
Explain how the
content and skills to be tested
are selected.
-
Provide either
representative samples or
complete copies of examination
tasks, instructions, examination
sheets, manuals and reports of
results to users.
-
Describe the
procedures used to ensure the
appropriateness of each
examination for the groups of
different racial, ethnic, or
linguistic backgrounds who are
likely to be tested.
-
Identify and
publish the conditions and
skills needed to administer each
examination.
Interpreting examination results
Members of ALTE
undertake to help examination users
and takers interpret results
correctly.
In practice this
means that members of ALTE will
guarantee to do the following:
-
Provide prompt
and easily understood reports of
examination results that
describe candidate performance
clearly and accurately.
-
Describe the
procedures used to establish
pass marks and/or grades.
-
If no pass mark
is set, then provide information
that will help users follow
reasonable procedures for
setting pass marks when it is
appropriate to do so.
-
Warn users to
avoid specific, reasonably
anticipated misuses of
examination results.
Striving for fairness
Members of ALTE undertake to make their examinations as fair as possible for candidates of different backgrounds (e.g. race, gender, ethnic origin, handicapping conditions, etc.).
In practice this means that members of ALTE will guarantee to do the following:
-
Review and revise
examination tasks and related
materials to avoid potential
insensitive content or language.
-
Enact procedures
that help to ensure that
differences in performance are
related primarily to the skills
under assessment rather than to
irrelevant factors such as race,
gender and ethnic origin.
-
When feasible,
make appropriately modified
forms of examinations or
administration procedures
available for candidates with
handicapping conditions.
Informing examination
takers
Members of ALTE
undertake to provide examination
users and takers with the
information described below.
In practice, this
means that members of ALTE will
guarantee to do the following:
-
Provide
examination users and takers
with information to help them
judge whether a particular
examination should be taken, or
if an available examination at a
higher or lower level should be
used.
-
Provide
candidates with the information
they need in order to be
familiar with the coverage of
the examination, the types of
task formats, the rubrics and
other instructions and
appropriate examination taking
strategies. Strive to make such
information equally available to
all candidates.
-
Provide
information about the rights
which candidates may or may not
have to obtain copies of papers
and completed answer sheets, to
re-take papers, have papers
re-marked or results checked.
-
Provide
information about how long
results will be kept on file and
indicate to whom and under what
circumstances results will or
will not be released.
Appendix 3
Minimum standards for
establishing quality profiles in
ALTE examinations
The complete list of
the ALTE Minimum Standards for
establishing quality profiles is
presented below.
Test construction
-
The examination
is based on a theoretical
construct, e.g. on a model of
communicative competence.
-
You can describe
the purpose and context of use
of the examination, and the
population for which the
examination is appropriate.
-
You provide
criteria for selection and
training of test constructors
and expert judgment involved
both in test construction and in
the review and revision of the
examinations.
-
Parallel
examinations are comparable
across different administrations
in terms of content, stability,
consistency and grade
boundaries.
-
If you make a
claim that the examination is
linked to an external reference
system (e.g. Common European
Framework), then you can provide
evidence of alignment to this
system.
Administration and
logistics
-
All centres are
selected to administer your
examination according to clear,
transparent, established
procedures, and have access to
regulations about how to do so.
-
Examination
papers are delivered in
excellent condition and by
secure means of transport to the
authorized examination centres,
your examination administration
system provides for secure and
traceable handling of all
examination documents, and
confidentiality of all system
procedures can be guaranteed.
-
The examination
administration system has
appropriate support systems
(e.g. phone hotline, web
services).
-
You adequately
protect the security and
confidentiality of results and
certificates, and data relating
to them, in line with current
data protection legislation, and
candidates are informed of their
rights of access to this data.
-
The examination
system provides support for
candidates with special needs.
Marking and grading
-
Marking is
sufficiently accurate and
reliable for purpose and type of
examination.
-
You can document
and explain how marking is
carried out and reliability
estimated, and how data
regarding achievement of raters
of writing and speaking
performances is collected and
analysed.
Test analysis
-
You collect and
analyse data on an adequate and
representative sample of
candidates and can be confident
that their achievement is a
result of the skills measured in
the examination and not
influenced by factors like L1,
country of origin, gender, age
and ethnic origin.
-
Item level data (e.g. for
computing the difficulty,
discrimination, reliability and
standard errors of measurement
of the examination) is collected
from an adequate sample of
candidates and analysed.
Communication with
stakeholders
-
The examination administration
system communicates the results
of the examinations to
candidates and to examination
centres (e.g. schools) promptly
and clearly.
-
You provide
information to stakeholders on
the appropriate context, purpose
and use of the examination, on
its content and on the overall
reliability of the results of
the examination.
-
You provide
suitable information to
stakeholders to help them
interpret results and use them
appropriately.