Compstat 2002

e-stat: Automatic Evaluation of Online Exercises

Knut Bartels

1. Subject of the paper, main points of argument
Exercises and tests are an important part of any statistics course.
A central issue in exercises is the feedback on submitted solutions.
In online teaching, or in lectures with large audiences, giving this feedback requires a large investment of time or staff.

A solution to this problem would be online exercises that are evaluated automatically, without human intervention. This is easy to achieve with multiple-choice questions, but we concentrate on the more realistic and pedagogically preferable option of free-text answers.

In principle, the answer to be evaluated is compared with known answers, either previously submitted or anticipated, including correct answers as well as faulty ones. In statistical terms, this is a problem of classifying free text. The classification is based on an analysis of the free text with (partly statistical) methods from computational linguistics and knowledge representation. The presentation reports on work in progress within the e-stat project. The natural language of the system is German.

2. Relations with the literature
In the literature on statistics education, the automated evaluation of free-text answers has not yet been addressed. In artificial intelligence and machine learning, a great deal of research addresses the automatic classification of (longer) texts, but this does not solve the problem of finely classifying a single sentence that should contain a statement on a known topic. In natural language processing, formal methods are being developed, but less emphasis is laid on the semantic content of the analyzed text.

3. Main formulas and original results
The first step in processing the input is a spell check. The text is then parsed syntactically, with ambiguities resolved statistically on the basis of a corpus of German statistics texts in which answers to statistics exercises are given special weight.
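The statistical resolution of tagging ambiguities can be illustrated with a small sketch. It assumes a bigram hidden Markov model decoded with the Viterbi algorithm, which is one common choice and not necessarily the model used in e-stat. The function name `viterbi`, the probability floor used in place of proper smoothing, and all probabilities in the toy tables are invented for illustration; only the tag names follow the STTS tagset used in the NEGRA corpus.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable tag sequence for `words` under a bigram HMM.

    Unseen events get the probability `floor`, a crude stand-in for smoothing.
    """
    if not words:
        return []

    def lp(table, key):
        return math.log(table.get(key, floor))

    # best[i][t] = (log-probability of the best path ending in tag t, previous tag)
    best = [{t: (lp(start_p, t) + lp(emit_p, (t, words[0])), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            prev, path_score = max(
                ((p, best[i - 1][p][0] + lp(trans_p, (p, t))) for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (path_score + lp(emit_p, (t, words[i])), prev)
        best.append(column)

    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Toy model with invented probabilities: "die" is ambiguous between
# article (ART) and demonstrative pronoun (PDS).
TAGS = ["ART", "PDS", "NN", "VVFIN"]
START_P = {"ART": 0.6, "PDS": 0.2, "NN": 0.15, "VVFIN": 0.05}
TRANS_P = {("ART", "NN"): 0.9, ("PDS", "VVFIN"): 0.7, ("PDS", "NN"): 0.05,
           ("NN", "VVFIN"): 0.6, ("NN", "NN"): 0.1}
EMIT_P = {("ART", "die"): 0.3, ("PDS", "die"): 0.2,
          ("NN", "Varianz"): 0.01, ("VVFIN", "steigt"): 0.01}

tagged = viterbi(["die", "Varianz", "steigt"], TAGS, START_P, TRANS_P, EMIT_P)
```

In this toy model the following noun makes the article reading of "die" the more probable one. In practice the probabilities would be estimated from a tagged corpus.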
The tagged input is then compared with a set of expected answers, which must be supplied by the author of the exercise or have been learnt by the system.
If there is almost perfect agreement with an expected answer, the system returns the response associated with that expected answer. In the more likely case that there is no such agreement, a scoring method is applied to classify the answer. This scoring is based on keywords and tags and, where possible, on a semantic annotation of the text. The latter is a very difficult problem, for which first concepts are being tested. As a necessary tool, a formal semantic annotation of statistical expressions is being constructed. The system then returns the response associated with the expected answer that has the highest score.
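The comparison and scoring step might look roughly like the following sketch. It is not the e-stat scoring method: it assumes a simple weighted sum of keyword coverage and token overlap, and the names `ExpectedAnswer`, `tokenize`, `score` and `classify`, the weights 0.7 and 0.3, and the English example answers are all hypothetical.

```python
from dataclasses import dataclass

PUNCTUATION = ".,;:!?()\"'"


@dataclass
class ExpectedAnswer:
    text: str       # reference answer supplied by the exercise author
    keywords: set   # keywords that characterise this answer
    feedback: str   # system response associated with this answer


def tokenize(text):
    """Lowercase, split on whitespace and strip surrounding punctuation."""
    tokens = (t.strip(PUNCTUATION).lower() for t in text.split())
    return [t for t in tokens if t]


def score(answer_tokens, expected):
    """Weighted sum of keyword coverage and Jaccard token overlap (assumed weights)."""
    tokens = set(answer_tokens)
    reference = set(tokenize(expected.text))
    coverage = (len(tokens & expected.keywords) / len(expected.keywords)
                if expected.keywords else 0.0)
    union = tokens | reference
    overlap = len(tokens & reference) / len(union) if union else 0.0
    return 0.7 * coverage + 0.3 * overlap


def classify(answer, expected_answers):
    """Return the best-scoring expected answer together with its score."""
    tokens = tokenize(answer)
    best = max(expected_answers, key=lambda e: score(tokens, e))
    return best, score(tokens, best)


# Hypothetical exercise: one correct and one faulty expected answer.
EXPECTED = [
    ExpectedAnswer("The median is robust against outliers",
                   {"median", "robust", "outliers"},
                   "Correct."),
    ExpectedAnswer("The mean is robust against outliers",
                   {"mean", "robust", "outliers"},
                   "Not quite: the mean is sensitive to outliers."),
]

best, best_score = classify("The median is hardly affected by outliers.", EXPECTED)
print(best.feedback, round(best_score, 3))
```

A score of this kind cannot see negation: "The median is not robust against outliers" would still match the correct answer closely. This is one reason why a semantic annotation is needed in addition to keywords and tags.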

4. Relevance for statistics, informatics and applications
The relevance for teaching statistics was outlined in the first section. Furthermore, automatic answering systems can be applied in a wide variety of software applications.

5. Main references
- Abney, S. (1996): Statistical Methods and Linguistics.
In Klavans, J. & Resnik, P. (eds) The Balancing Act. Cambridge, MA: MIT Press.
- Brants, T. (2000): TnT - A Statistical Part-of-Speech Tagger.
In Proceedings of the Sixth Applied Natural Language Processing Conference ANLP-2000, Seattle, WA.
- Charniak, E. (1997): Statistical Techniques for Natural Language Parsing. AI Magazine 18(4): 33-44.
- Manning, C.D. & Schütze, H. (1999): Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.
- Yarowsky, D. (1995): Unsupervised Word Sense Disambiguation Rivaling Supervised Methods.
In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics. Cambridge, MA, pp. 189-196.
- NEGRA Korpus, Sonderforschungsbereich 378, Universität des Saarlandes.
http://www.coli.uni-sb.de/sfb378/projects/NEGRA-en.html

6. Keywords
online exercises, automatic evaluation, classification of free text, computational linguistics, hidden Markov models, knowledge representation, semantic statistical annotation