Automated JudgingThe automated judge is the corner stone of Mooshak. Its role is to automatically classify a submission according to a set of rules and produce a report with the evaluation for further validation by a judge person. A submission is composed by data relevant for the evaluation process, that is the program source code, the team-id, the problem-id, and the programming language (this is automatically inferred from the source code file extension). Submissions are automatically judged and the corresponding result is almost instantaneously displayed to the teams, although initially in a pending state. Judge persons have the responsibility of validating pending classifications, making them final, and occasionally modify initial classifications. A classification may have to be modified as a result of changes in the compilation and execution conditions (e.g. changes in test cases) that required reevaluating submissions. Reevaluation produces another report that has to be compared with previous ones. The automated judging can be divided in two parts according to the type of analysis:
Static analysis starts by verifying if the submitted problem has already been solved, in which case the submission is rejected and no classification is given. Then it goes on to confirm the verifications made by the interface, i.e. by double checking the submitted data for team ownership, problem-id and size of program source. If it succeeds in this verification, it compiles the submitted program using the compilation command line defined in the administration interface. Mooshak can be more or less tolerant according to the flags chosen for each compiler. An error or compiler warning detected in this stage aborts the automated judging and dynamic analysis is skipped. The following table lists the verifications performed during static analysis and the associated classifications upon failure.
Dynamic analysis involves the execution of the submitted program with each test case assigned to the problem. A test is defined by an input and an output file. The input file is passed by the standard input to the program being executed and its standard output is compared with the output file. The errors detected during dynamic analysis determine the classifications listed in the following table.
Each classification has an associated severity rank and the final classification is that with the highest severity rank found in all test cases. The highest severity is given to the rare situation where the system has an indication that the test failed due to lack of operating system resources (inability to launch more processes, for instance). The lowest severity is the case where no other error was found, using the test cases, and therefore the submission is accepted as a solution to the problem. The automatic judge marks an execution as "Accepted" only if the the standard output is exactly equal to the test output file. Otherwise the output file and standard output are normalized and compared again. In the normalization both outputs being compared are stripped of all formatting characters. If after this process the outputs become equal then the submission is marked as having a "presentation error"; otherwise it is marked as a "wrong answer". In the current implementation the normalization trims white characters (spaces, newlines and tabulation characters) and replaces sequences of white characters by a single space. This is a general normalization rule since white characters are only used for formatting. In a specific problem other classes of characters could have the same meaning. For instance, in a problem where the only meaningful characters are digits, other characters, such as letters or punctuation, could be treated as formatting characters. This cannot be done in general since many problems have a meaningful output that includes letters. This feature will require having a meaningful class of characters defined for each problem output. |