Showing posts with label Examining. Show all posts

Sunday, 13 April 2014

Looking the other way

Image: freeimages
I've done quite a lot of searching through the research library to summarise some of the key findings of my colleagues, but what does everyone else think? There are other exam boards, and undoubtedly many independent researchers in the field of GCSE and A-level examining, so I've done a search for some alternative viewpoints on the matter.


Marking and cognitive psychology

How do examiners actually go about the process of marking, and does it matter? Greatorex & Suto (2006) looked at the cognitive approaches taken by examiners when marking, and identified five distinct approaches, which were rarely used in isolation: matching, scanning, evaluating, scrutinising and no response. These were related to the 'System 1' (quick, associative) and 'System 2' (slow, rule-governed) models of thought processing described by psychologists (Kahneman & Frederick, 2002).

The study involved having examiners mark papers and 'think out loud' about the approach they were taking. When marking short-form answers that could be easily distinguished by single words or numbers, examiners used the 'matching' approach (System 1) to quickly assign marks by pattern recognition. Some longer answers could be marked through the use of 'scanning' to pick out either key words (System 1, pattern recognition) or distinct phrases or stages of calculations (System 2, semantic processing). For more detailed answers, examiners moved to the 'evaluating' approach to assess the candidate's response, usually drawing on a variety of sources and comparing these to their own knowledge and the mark scheme (entirely System 2). Where responses deviated noticeably from the mark scheme, examiners would engage in 'scrutinising' to identify whether the response was worthy of credit, for instance an unexpected but valid response; this approach also draws entirely on System 2. In the case of 'no response', examiners would use a simple System 1 approach to check that material had not been written elsewhere.

The researchers then went on to analyse how frequently the different approaches were used in different subjects. There was a marked difference between Mathematics and Business Studies papers: Mathematics responses called for a high level of matching, with slightly less evaluation, and relatively small amounts of scanning and scrutinising; Business Studies drew heavily on evaluating, with matching as the secondary approach, and small amounts of the other approaches. Most importantly, the study showed that the different approaches were used across multiple subjects.

There was notably no relation between marking strategy and marking reliability - multiple approaches could be equally valid and successful. There was also no significant difference between the marking approaches of novice and experienced markers. Senior examiners went on to suggest that new examiners could benefit from some explicit advice about their approach to marking, possibly with screen recordings overlaid with commentary. The researchers also noted that there did not seem to be any difference in cognitive approach between paper-based marking and on-screen marking, although this had yet to be confirmed by direct study.

References:
  • Greatorex, J. & Suto, W.M.I., 2006. An empirical exploration of human judgement in the marking of school examinations. In International Association for Educational Assessment Conference. Singapore. Available at: http://iaea.info/documents/paper_1162a2471.pdf
  • Kahneman, D. & Frederick, S., 2002. Representativeness revisited: Attribute substitution in intuitive judgment. In T. Gilovich, D. Griffin, & D. Kahneman, eds. Heuristics and biases: The psychology of intuitive judgment. New York, NY, US: Cambridge University Press, pp. 49–81.

Tuesday, 8 April 2014

Reliability of public examinations

Image: freeimages
What makes for a reliable exam series? And who actually knows or cares?


A community effort?

Baird, Greatorex and Bell (2004) set out to study the effect that a community of practice could have on marking reliability. Their study concluded that discussion between examiners at a standardisation meeting had no significant effect on marking reliability, and so called the effect of a community of practice into question. I have to challenge this conclusion, because the artificial and synchronous conditions of a standardisation meeting most definitely do not represent the potential of a community of practice, which is asynchronous in nature. The authors didn't take this into account, which is interesting given that they cited Wenger in their references. Later on in the paper they do seem to acknowledge that tacit knowledge already acquired in a community of assessment practice might explain why the format of standardisation meetings seemed to have no significant effect - this seems more in line with the effect that I would expect a community of practice to have.



What do these people know anyway?

Later research by Taylor (2007) sought to investigate public perceptions of the examining process. The study involved a range of participants (students, parents and teachers) and a range of interview techniques, and looked at issues such as how papers are marked, the reliability and quality of marking, and the procedure for re-marks. The level of awareness about these issues varied somewhat, but it seemed very few people (even teachers) had full knowledge of the entire process.

There seemed to be a perception among students and parents that several examiners would mark each script (p.6) and arrive at a final mark through consensus, while teachers generally felt that a single examiner marked each paper, but this was based on perception rather than firm knowledge. Students and parents did not seem to have any real knowledge of how examiners might arrive at a mark, and even teachers were not aware of the hierarchical method, although they agreed that it seemed sensible when it was explained to them. All groups believed that the system would work better with multiple examiners, although they did acknowledge that this might be unrealistic given time and financial constraints. When questioned about the possible merits of having multiple examiners mark a single paper, examiners commented that any gains would be minimal in comparison to the present hierarchical system, and the cost would be prohibitive.

Members of the public seemed to have a better awareness of the concept of reliability (p.7), with teachers having a similar perception to examiners about the potential for marks between examiners to differ within a band of ability. Students and parents seemed to understand the potential for marks to differ, although their expectations were more optimistic than the real situation. There was much less understanding of how quality of marking was assured (p.8), with perceptions varying from an assumption that there was no checking at all, to some people believing that far more scripts were checked than is the case. All parties agreed that quality checking was desirable, and examiners appeared to support the current system. Re-marking was more common knowledge to all the participants (p.10), although there was little knowledge of the precise system used.

The report considered whether attempts to increase public understanding of the exam system would promote public trust - although some believe that greater transparency might actually invite criticism, other literature seems to suggest that revealing the workings of the system would lead to more realistic expectations and improved engagement, rather than a focus on failings. Establishing a clearer link between understanding of the exams process and public confidence was suggested for future research.

More on reliability

Chamberlain (2010) set out to build on the research into public awareness of assessment reliability - which also ties in well with some of the points raised by Billington (2007) about whether the general public are acting on good information or misinformation.
The research into public awareness used focus groups as a means of drawing out understanding, as this bypasses some of the possible problems of researcher bias (since the researchers were AQA employees), and can help foster a collaborative environment to trigger the sharing of opinions and ideas (p.6). One remaining source of bias was that the groups were small and were composed primarily of people with a particular interest in the exams.

Research with several groups of people, including a number of teachers and trainee teachers, suggested little general understanding of the concept of reliability. Secondary teachers had more understanding, often due to their experience of requesting re-marks for their pupils. Promisingly, most of the participants cared enough to indicate they would like to be better informed about the overall process of examinations, although there was no support for any quantification of reliability - most people already seemed to accept that there would inevitably be some variance in how marks were awarded, but placed trust in examiners to act as professionals. The exception to the rule is when a very public failure occurs, but even in this case attention rapidly fades once the problem seems to be 'fixed', with little real gain in understanding. The video below is one attempt to get some understanding out into the public domain:


Making The Grades - A Guide To Awarding



References:


Friday, 27 December 2013

The 'Carry On' factor

Image: freeimages

Following on from my thoughts about quantitative research, I'm looking at some of the dependent variables that will come into play, and thinking about how I might go about analysing them.

Intention to continue examining & job satisfaction


This is an extremely important factor for exam boards, as they depend on a large network of examiners to make examinations possible. Meadows (2004) identified four factors that affect examiners' attitudes towards their jobs:
  1. The pressure and stress of examining
  2. Insight gained from examining
  3. Support from awarding body and senior examiners
  4. Pay
However, Meadows found that only the pressure and stress of examining, and the level of support received, predicted intention to continue examining, although pay did affect examiners' job satisfaction. One of the key sources of stress came from balancing examining duties with regular work, and the report recommended that resources should be diverted to lobbying for examiners to be given more time away from teaching to examine, in order to improve retention. Improving the level of support was also a recommendation, although the report notes that this would be less cost-effective, since most examiners were already relatively satisfied with the support they received. Increasing pay would improve job satisfaction, but the report states that this would not improve retention.

The introduction of software tools


Tremain (2011) followed up this work to consider how the situation had changed after the introduction of electronic marking and online standardisation. The study looked at the factors that influence the satisfaction that examiners express about their work, and highlighted three factors underpinning examiners' intention to continue:

  1. The relationship between examining work and work outside examining
  2. The pressures of examining and support received
  3. The incentives for examining
The study states that although there is no imminent threat to examiner retention, future threats include the increasing use of online tools, which can contribute to examiners feeling unsupported or undervalued. Job satisfaction is considered to be more important in retention than reward for the majority of jobs, with social interaction and appropriate challenge being considered particularly valuable. The adoption of online tools had contributed to a sense of isolation amongst examiners, and also made the work more routine - although the reliability of marking has actually increased as a result.

A further study (Tremain, 2012) also set out to evaluate how specific factors involved in online marking & standardisation contributed to examiner satisfaction. This concluded that there was no significant difference in intention to continue marking between examiners who were standardised using face-to-face or online methods. Examiners who had marked using a mixture of paper and online methods showed a very slight increase in intention to continue examining. However, it was noted that the results were confounded by the different subjects and levels of experience amongst the participants.


Variables that we may be able to influence, and how:

  • Support received. By considering the different levels of support that are currently offered in terms of the contextual model for learning (Shepherd, 2011), and identifying possible gaps, we may be able to improve the support offering for examiners in a rational way. I have already laid out some initial thoughts for this approach.
  • Insight gained from examining. Making key insights from senior examiners available in a digital form which can be shared more easily online, for instance through learning management systems or webinars, could help to ensure efficient dissemination of relevant information.
  • Social interaction. This is a long-term goal that our organisation may want to consider for retaining examiners. Although we are increasingly unable to provide opportunities for examiners to meet in a face-to-face setting, there are possibilities for facilitating some more informal interaction around scheduled events. One of my colleagues is keen to run webinars for examiners to gain insight from senior examiners, and careful use of online chat could help to provide a better sense of community.
Any or all of these methods could be attempted, with measurement of the effect on intention to continue, and also examiner performance, being undertaken to determine effectiveness. One concern I have is that apparent failure to make a difference at first might result in a loss of enthusiasm for innovation, hence there would need to be trust established with stakeholders for future improvements. Undertaking action research alongside quantitative measurements to demonstrate a rational approach would be key to successful establishment of such trust.
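
As a rough illustration of what "measurement of the effect on intention to continue" might look like, here is a minimal Python sketch of a two-sided permutation test on the difference in mean ratings between two groups of examiners. Everything in it is an assumption made for illustration: the 1-5 rating scale, the group names, the function name `permutation_test` and the ratings themselves are invented, and none of it comes from the studies discussed above. A permutation test is just one reasonable option for small samples of ordinal-style ratings, not necessarily the analysis I would end up using.

```python
import random


def permutation_test(treated, control, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns (observed difference in means, approximate p-value).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable

    def mean(values):
        return sum(values) / len(values)

    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)

    extreme = 0
    for _ in range(n_resamples):
        # Randomly reassign the pooled ratings to the two groups.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        # Count reshuffles at least as extreme as the observed difference
        # (small tolerance guards against floating-point noise).
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical 1-5 "intention to continue" ratings, invented for this sketch.
webinar_group = [4, 5, 3, 4, 4, 5, 3, 4]
comparison_group = [3, 4, 3, 2, 4, 3, 3, 4]

observed, p_value = permutation_test(webinar_group, comparison_group)
print(f"difference in means: {observed:.2f}, p = {p_value:.3f}")
```

With groups this small, a non-significant result would say very little either way, which is part of why I would want qualitative evidence alongside the numbers.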

References: