Project workflow, types of users and permissions
The workflow of an evaluation project consists of the following steps:
- Project creation
- Project setup (evaluation format setup, guidelines setup, data import, user invitation)
- Evaluation launch
- Evaluation
- Evaluation progress management
- Review
The tool allows several people with different roles to be involved in an evaluation project. For this purpose, five types of users can be activated, each with different permissions:
- project owner: the creator of the project. S/he can define the main settings of the project (evaluation format setup, data import, guidelines setup) and typically invites project managers (PjM) to continue with the project workflow. The owner can monitor her/his projects, export project data and delete the project. Typically, the owner is not involved in the evaluation process and thus does not receive email notifications regarding evaluations and their review. The owner can assign to her/himself (or be assigned by an admin) all the roles in the project, including the "admin" role in case s/he has to manage the evaluation workflow directly and receive the related notifications;
- admin: corresponds to the project manager (PjM), who typically defines the evaluation guidelines, invites the other roles required for the project, and launches evaluation. Like the owner, the admin can perform all tasks in the workflow, including deleting evaluations and deleting the whole project, and can monitor her/his projects and export project data. Unlike the owner, the admin also receives email notifications regarding evaluations and their review. The admin can assign to her/himself (or be assigned by another admin or the owner) all the roles in the project apart from the "owner" role;
- vendor: invites the evaluators. S/he can monitor the project workflow but can see only the evaluators s/he invited. The vendor can manage the evaluations of her/his evaluators and is notified by email about all actions regarding them. The vendor can also take on the "evaluator" role;
- evaluator: carries out evaluation;
- reviewer: reviews evaluations and can modify evaluators’ judgments.
To ensure that the workflow is respected and, consequently, that the task and evaluation data are consistent, the tool requires some specific actions. This is necessary for two main reasons: (i) there are dependencies between the tasks, since some features set in earlier tasks affect later tasks and cannot be changed without compromising the soundness of those tasks (and consequently of the evaluation data); (ii) different users can perform the tasks. The main dependencies, and the related actions, are listed below.
Setting Tasks: "evaluation format setup", "specifications setup", "import data".
These tasks can be performed in parallel (by different users), but each task must be manually marked as "complete" to confirm that it has been finalized by the user in charge of it.
Since problems may be found even after these tasks have been completed, the owner or the admin can manually change the status of a task back to "incomplete". Note that this action does not delete any existing data or settings: it only allows the task page to be edited again, so that it can be modified if/where necessary.
If a task is to be marked as incomplete after evaluation has already started, evaluation must be stopped first.
When the status of a completed task is changed back to "incomplete", an email notification is automatically sent to all participants in the project; the same happens when the task is marked as "complete" again. Detailed information about these emails is given in Appendix 2.
"Evaluation launch" Task. Evaluation must be manually launched and can be manually stopped if necessary. Evaluation launch/stop applies to all evaluators involved in the project.
"Evaluation review" Task. The reviewer must manually mark her/his task as complete, so that the admin/owner knows that the review is finished and can finalize the project.
Depending on the role in the project (and the related permissions), different functions of the interface are accessible. The following sections, which describe the MT-equal interface, give full details about which parts of the tool are accessible to each user/role.
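The status rules above can be read as a small state model. The Python sketch below is illustrative only: the class `ProjectWorkflow`, its methods and the `notify` callback are hypothetical names, not MT-equal's (PHP) implementation. It encodes just what this section states: setting tasks are completed manually, only the owner or an admin can reopen a completed task, evaluation must be stopped before a task is reopened, and reopening a task, or completing it again afterwards, notifies all participants.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

SETTING_TASKS = ("evaluation format setup", "specifications setup", "import data")


@dataclass
class ProjectWorkflow:
    participants: List[str]
    notify: Callable[[str, str], None]  # hypothetical hook: (recipient, message)
    status: Dict[str, str] = field(
        default_factory=lambda: {task: "incomplete" for task in SETTING_TASKS})
    evaluation_running: bool = False
    reopened: Set[str] = field(default_factory=set)

    def _broadcast(self, message: str) -> None:
        # Status changes are notified to every participant in the project.
        for user in self.participants:
            self.notify(user, message)

    def mark_complete(self, task: str) -> None:
        self.status[task] = "complete"
        if task in self.reopened:  # notify when a reopened task is completed again
            self.reopened.discard(task)
            self._broadcast(f'"{task}" marked as complete again')

    def mark_incomplete(self, task: str, role: str) -> None:
        if role not in ("owner", "admin"):
            raise PermissionError("only the owner or an admin can reopen a task")
        if self.status[task] != "complete":
            raise ValueError(f'"{task}" is not complete')
        if self.evaluation_running:
            raise RuntimeError("stop evaluation before reopening a task")
        # Reopening only unlocks the task page: no data or settings are deleted.
        self.status[task] = "incomplete"
        self.reopened.add(task)
        self._broadcast(f'"{task}" marked as incomplete')

    def launch_evaluation(self) -> None:
        self.evaluation_running = True  # applies to all evaluators at once

    def stop_evaluation(self) -> None:
        self.evaluation_running = False
```

For example, once evaluation has been launched, `mark_incomplete("import data", "admin")` raises `RuntimeError` until `stop_evaluation()` has been called.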
Tool Requirements and Availability
To work with the MT-equal tool, you must use one of the supported browsers: Chrome, Firefox, Safari or Opera. MT-equal is a Web application developed in PHP, released as open source under the Apache 2.0 license. The software is available on Bitbucket at https://bitbucket.org/cgiuliano/mtequal_2; instructions to install and run the application are contained in the README file.
How to create an account
To access the tool, a username and password must be entered. New users must first sign up; existing users log in directly.


The password can be reset by providing the username or email address used to register.

To access the reset-password page, click on the link contained in the received email or paste it into the browser.

To reset the password, you must enter the new password and confirm it.

Access Interface
Once logged in, the user is presented with the main menu, which is composed of five items: "Dashboard", "Create Project", "Projects", "User", "Help".

Dashboard
In the Dashboard, all the projects in which the user is involved are listed. The page contains two tabs:
- Overview, where all the projects in which the user is involved are listed, together with relevant information about each project.
- Requests, where details about invitations to projects are presented in chronological order.


Projects
This menu lists all the projects, so that it is possible to switch directly from one project to another without going through the Dashboard.

User Profile
In the “User Profile” page it is possible to change the user’s profile info (personal data and password) and to log out. Note that personal data also include the Access Token necessary to allow external applications to call the MT-equal REST API on behalf of the user.

In the “Change password” page it is possible to change the password.

Project Interface
By clicking on a given project name in the "Dashboard" or in the "Projects" menu, the user is presented with the project interface. The interface can contain different panels. Not all panels are always visible to the user, since their accessibility depends on the role performed by the user in the project. All the possible panels are: "Overview", "Monitor", "Settings", "Evaluation launch", "Evaluation", "Evaluation Progress", "Review", "Report".

The various users involved in the project can access different parts of the interface according to their role in the project:
- Owner: overview, monitor, settings, evaluation launch, evaluation progress, report
- Admin: overview, monitor, settings, evaluation launch, evaluation progress, report
- Vendor: overview, settings, evaluation progress, report
- Reviewer: overview, review, report
- Evaluator: evaluation
If a user is assigned multiple roles in the same project, s/he will see the corresponding parts of the interface (e.g. if the admin also has the evaluator role, s/he will also see the "Evaluation" panel).
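The role-to-panel list above, together with the rule for users holding several roles, amounts to a set union. A minimal Python sketch (the names `PANELS_BY_ROLE` and `visible_panels` are hypothetical; the panel sets are copied from the list):

```python
from typing import Dict, FrozenSet, Iterable, Set

# Panels accessible to each role, as listed in this manual.
PANELS_BY_ROLE: Dict[str, FrozenSet[str]] = {
    "owner": frozenset({"overview", "monitor", "settings", "evaluation launch",
                        "evaluation progress", "report"}),
    "admin": frozenset({"overview", "monitor", "settings", "evaluation launch",
                        "evaluation progress", "report"}),
    "vendor": frozenset({"overview", "settings", "evaluation progress", "report"}),
    "reviewer": frozenset({"overview", "review", "report"}),
    "evaluator": frozenset({"evaluation"}),
}


def visible_panels(roles: Iterable[str]) -> Set[str]:
    """Union of the panels granted by every role the user holds in the project."""
    panels: Set[str] = set()
    for role in roles:
        panels |= PANELS_BY_ROLE[role]
    return panels
```

With this model, an admin who is also an evaluator gets `visible_panels(["admin", "evaluator"])`, which includes `"evaluation"`.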
Overview
In the "Overview" page of the project, all its main features are summarized. The page is accessible at any stage of the project and displays all the features that have been set up until the moment the page is accessed.

Monitor
The "Monitor" page reports on the project's status. It presents the list of the tasks that must be carried out to complete the project, giving for each task its status and, if set by the task manager, its due date. Each task name links directly to the interface page where the task can be accomplished.

Settings
The "Settings" page is used to set the specific characteristics of the evaluation project and to engage its participants. The available functions are grouped into five main classes: (i) General, including "Project details", "Delete" and "Clone project"; (ii) Evaluation, including "Format"; (iii) Guidelines, including "Specifications" and "Instructions"; (iv) Data, including "Import" and "Export"; (v) Management, including "User invitation" and "Project deadlines".

The various users involved in the project can access different setting functions according to their role.
− Owner and Admin: all
− Vendor: only User invitation
− Reviewer: no access
− Evaluator: no access
In the following subsections, the setting functions are described following the project workflow, so as to give a better understanding of the features of the tool.
Project details
In this page it is possible to modify the name of the project and its description. See figure above.
Delete project
In this page, it is possible to permanently delete data. In particular, the owner can delete the whole project, while the admin can delete evaluation data.


Clone project
If the evaluation format and guidelines are to be reused in another project, it is possible to clone the currently defined project and to rename it to obtain a new project without having to create it from scratch.

Evaluation Format
In this page the evaluation format can be set. Four formats are available: "binary", "1-to-x scale", "ranking", and "decision tree". Furthermore, it is possible to require that a reference text or image is included in the data to be evaluated, by ticking the button next to "Include reference".
Depending on the selected evaluation format, different options are presented to the user.
Binary. Two empty fields are displayed, corresponding to the two buttons that will be shown to the evaluator in the evaluation interface. The evaluation labels can be customized.

Scale. By default, 10 empty fields are displayed, which the user can fill with the preferred labels. The filled fields correspond to the points of the rating scale and, consequently, to the buttons that will be shown to the evaluator in the evaluation interface. If more than 10 rating points are required, further empty fields can be added by clicking on "Add more lines".

Ranking. If this evaluation format is chosen, the tool will automatically display ranking buttons in the evaluation interface according to the number of MT outputs that have been uploaded and are to be evaluated.
Decision tree. When this evaluation format is chosen, the user is required to answer a set of five questions, and the corresponding YES, NO and N/A answer buttons will be displayed in the evaluation interface. In the current implementation, the questions are those in the Figure below.
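The following Python sketch summarizes how each format's configuration could translate into the buttons shown to evaluators. It is an illustration under our own assumptions, not MT-equal's code: `answer_buttons` is a hypothetical function, and since the five decision-tree questions appear only in the figure, they are passed in as a parameter.

```python
from typing import Dict, List, Optional


def answer_buttons(fmt: str, labels: Optional[List[str]] = None,
                   n_outputs: int = 0,
                   questions: Optional[List[str]] = None) -> Dict[str, List[str]]:
    """Map an evaluation format to the buttons shown for each target (hypothetical)."""
    if fmt == "binary":
        # Exactly two customizable labels, one button each.
        if labels is None or len(labels) != 2:
            raise ValueError("binary format needs exactly two labels")
        return {"judgment": list(labels)}
    if fmt == "scale":
        # Only filled-in fields become rating points; empty fields are ignored.
        return {"judgment": [label for label in (labels or []) if label.strip()]}
    if fmt == "ranking":
        # One rank per uploaded MT output.
        return {"rank": [str(i) for i in range(1, n_outputs + 1)]}
    if fmt == "decision tree":
        # Every question is answered with YES / NO / N/A.
        return {q: ["YES", "NO", "N/A"] for q in (questions or [])}
    raise ValueError(f"unknown format: {fmt}")
```

For instance, a scale whose default fields were only partly filled, such as `["Bad", "", "Good", " "]`, yields the two buttons `Bad` and `Good`.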

Once the evaluation format has been set, the task can be marked as complete by clicking on the corresponding button on the top-left corner of the interface.
Specifications
In addition to the evaluation format, in the “Specifications” page it is possible to set other specific features in order to customize evaluation. A list of project specifications is displayed and can be activated and set by the user. In particular, it is possible:
- to choose whether the MT targets are presented to the evaluators always in the same order or in random order;
- to choose whether the username of evaluators is shown to the reviewer or it is anonymized;
- to automatically send an email notification to the users responsible for evaluation when evaluators answer “I don’t know” more than the chosen threshold. Note that the email is not sent to the evaluator;
- to introduce artificial redundancy in the evaluation data so as to allow the assessment of intra-annotator agreement. The user who sets the specifications can choose the percentage of redundancy to introduce in the data, and the corresponding TUs are randomly chosen and duplicated by the tool;
- to pool equivalent Translation Units so as to present them to evaluators only once. Three different pooling algorithms can be applied to the corpus, corresponding to different notions of equivalence: (i) exact match, case-sensitive, which pools TUs if they have the same words in the same order, with the same spelling and case; (ii) exact match, case-insensitive, which pools TUs if they have the same words in the same order and with the same spelling, regardless of case; and (iii) bag of words, case-insensitive, which pools TUs if they have the same words, regardless of word order and case.
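The three notions of equivalence used for pooling can be expressed as normalization keys: TUs whose keys are equal fall into the same pool. The Python sketch below assumes, for simplicity, that each TU is compared on a single text string split on whitespace; how MT-equal tokenizes text and combines source and target sentences when pooling is not specified here, and the names `POOLING_KEYS` and `pool` are hypothetical. The bag-of-words key keeps repeated words (a multiset), which is also an assumption.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

# One normalization key per pooling algorithm; equal keys mean equivalent TUs.
POOLING_KEYS: Dict[str, Callable[[str], Hashable]] = {
    # same words, same order, same spelling and case
    "exact-case-sensitive": lambda text: tuple(text.split()),
    # same words and order, case ignored
    "exact-case-insensitive": lambda text: tuple(text.lower().split()),
    # same words in any order, case ignored (repeated words still count)
    "bag-of-words": lambda text: tuple(sorted(text.lower().split())),
}


def pool(units: List[Tuple[str, str]], algorithm: str) -> Dict[Hashable, List[str]]:
    """Group (tu_id, text) pairs into pools; each pool is shown to evaluators once."""
    key = POOLING_KEYS[algorithm]
    pools: Dict[Hashable, List[str]] = defaultdict(list)
    for tu_id, text in units:
        pools[key(text)].append(tu_id)
    return dict(pools)
```

With the TUs "The cat sat", "the cat sat" and "sat the Cat", the case-sensitive algorithm keeps all three apart, the case-insensitive one pools the first two, and the bag-of-words one pools all three.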

Once the guidelines specifications have been set, the task can be marked as complete by clicking on the corresponding button on the top-left corner of the interface.
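The redundancy option can be sketched as random sampling without replacement. In this hypothetical Python function, `percentage` is the value chosen in the specifications, the selected TUs are appended a second time, and the resulting list is shuffled; the shuffling and the rounding rule are assumptions of the sketch, not documented MT-equal behavior.

```python
import random
from typing import List, Sequence


def add_redundancy(tu_ids: Sequence[str], percentage: float,
                   seed: int = 0) -> List[str]:
    """Return the evaluation list with a random share of the TUs duplicated."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    rng = random.Random(seed)
    n_duplicates = round(len(tu_ids) * percentage / 100)
    # Sampling without replacement: each chosen TU is duplicated exactly once.
    duplicates = rng.sample(list(tu_ids), n_duplicates)
    items = list(tu_ids) + duplicates
    rng.shuffle(items)  # assumption: copies are mixed in among the originals
    return items
```

With 20 TUs and a 10% redundancy, two randomly chosen TUs appear twice, giving 22 items to evaluate; each evaluator's two judgments of those TUs are what intra-annotator agreement is later computed on.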
Instructions
In this page, specific instructions for evaluators can be added, which will be displayed in the evaluation interface. Instructions can be directly written in the text window or an external file can be uploaded. Note that these instructions can be edited by the admin or the owner at any time during the project.

Import data
The application can import files in tab-separated values (TSV) format. All details about the format required for standard input files are given in Appendix 1.
By clicking on the "Import file" button on the top-left corner of the page, the user can import the data files for the project. The interface shows the list of files that the user uploaded, together with the number of translation units actually uploaded and a report of any warning and error messages. If a file contains errors, it is not uploaded (Size=0). The upload time is also displayed. Uploaded files can be deleted by selecting them and clicking on the "Remove" button.

Export data
In this page different types of data can be exported, namely (i) the original data as they were imported by the user, (ii) the evaluation format, (iii) the project guidelines, and (iv) the evaluation data.

Once the whole dataset has been imported, the "import data" task can be marked as complete by clicking on the corresponding button.
User Invitation
This page is used to invite and manage the users involved in the project. For each role (admin, vendor, evaluator, reviewer) the interface displays a specific window. Each window is composed of two parts: Invite a [role], where a user can be invited for that role in the project, and Invited [role], where the users already invited for that role can be managed.

Invite a [role]. Only users who have an account in MT-equal can be invited to a project. Each logged-in user has his/her own private list of contacts, which includes all and only the other users s/he invited for that role in previous projects.
To invite a user for the first time, it is necessary to enter his/her exact username or email address and then click on the "Invite" button.
When a user is already in the list of contacts, s/he can be found in the "Favorite" list and invited by clicking on his/her username.
Note that each user can perform multiple roles. Roles can be assigned by another user or self-assigned. For example, if the owner decides to manage the evaluation her/himself (without relying on an external admin), and thus needs to receive the email notifications regarding evaluation, s/he can invite her/himself in the admin window, using the same invitation mechanism.
The invitation is sent to the user by email and is also displayed in the "Requests" tab of the user's "Dashboard" page. The user can accept or decline the invitation either from the email or through the interface.

Invited [role]. As soon as users are invited to the project, they appear in the "Invited" part of the window. For each invitee, the following information is displayed: the invitee's username and email, the username of the user who invited him/her to the project, and the "status" of the invitation (pending, accepted or declined).
It is also possible to remove a user from the project by selecting him/her (ticking the corresponding button) and then clicking on the "Revoke" button.
Note that vendors have a different view of the User Invitation page. While the owner and the admin can see and revoke all the invited users for all the roles, the vendor can see only the evaluators s/he invited, and consequently can revoke only their role.
The complete list of emails sent to the users involved in the project when the owner/admin/vendor invites/revokes a user for a role is given in Appendix 2.
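The visibility rule for the "Invited [role]" windows can be sketched as a filter over invitation records. The `Invitation` record and the `visible_invitations` function below are hypothetical; they encode only the rule stated above: the owner and admins see every invitation, vendors see only the evaluators they invited (and can therefore revoke only those), and the other roles have no access to the Settings page.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Invitation:
    role: str     # "admin", "vendor", "evaluator" or "reviewer"
    invitee: str
    inviter: str
    status: str   # "pending", "accepted" or "declined"


def visible_invitations(invitations: List[Invitation], viewer: str,
                        viewer_roles: List[str]) -> List[Invitation]:
    """Invitations a user can see (and hence revoke) on the User Invitation page."""
    if "owner" in viewer_roles or "admin" in viewer_roles:
        return list(invitations)  # every invited user, for every role
    if "vendor" in viewer_roles:
        # Vendors only see the evaluators they invited themselves.
        return [inv for inv in invitations
                if inv.role == "evaluator" and inv.inviter == viewer]
    return []  # reviewers and evaluators have no access to Settings
```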

Evaluation launch
This page is used by the owner/admin to check the dataset and launch the evaluation.
At the top of the page, a summary of the composition of the dataset to be checked is displayed. It shows statistics about the translation units to evaluate, which comprise the translation units remaining after pooling (if pooling is enabled in the specifications) and the redundant translation units introduced to calculate intra-annotator agreement (if redundancy is enabled). Statistics about the pooled translation units, i.e. those originally uploaded but removed from the evaluation dataset, are also given.
After the summary, all available information is shown for each uploaded translation unit, namely its ID, the source sentence, the reference translation or associated image (if any), and the MT outputs to be evaluated. Any comments contained in the input data are also displayed.
When the admin has checked the corpus, s/he can click on the "Launch evaluation" button. When the evaluation is launched, all evaluators invited in the project can access the evaluation page and start evaluation. If necessary, the evaluation can be stopped by clicking on the corresponding button.


Evaluation
When evaluators have accepted their invitation to participate in the project, and as soon as evaluation has been launched, the “Evaluation” page is accessible to them.
At the top of the page, the evaluation instructions can be displayed by clicking on the corresponding link.
Right below the instructions, the translation units to be evaluated are presented. For each translation unit, the source sentence, the textual reference (if any), and possible comments are displayed. If available, it is also possible to visualize the reference image, by clicking on its corresponding link. Then, the targets to be evaluated are presented, together with the evaluation format configured for the project. A text field for saving comments is also available. The evaluator clicks on the chosen buttons, and once all the targets are judged, s/he can "confirm" the evaluation by clicking on the green button at the bottom-left corner of the screen. The evaluator can also choose not to evaluate the translation unit, by clicking on the orange "Don’t know" button.

Evaluation progress
This page is used to track and manage the progress of the evaluation. Information about the work of each evaluator is shown. In this page, each evaluator can also be stopped, if needed, without stopping the others (if the whole evaluation activity is to be stopped, the admin can do so in a single step from the "Evaluation launch" page).

Review
Once evaluators have finished their task, the reviewer is notified by email and can access the “Review” page.
In the "Instruction" section of the page, the reviewer can see the evaluation instructions by clicking on the corresponding link.
In the "Evaluations" section of the page, the reviewer can see the evaluated data. S/he can choose to display all the evaluated targets or only those for which evaluators are in disagreement. This is done by clicking on the “Show just disagreement” button on the top-left corner of the section.
A summary of the information relevant to the review is displayed, according to how the "Show just disagreement" option is set.
Then, the evaluations to review are listed: each evaluated target is presented together with the corresponding source, information about evaluators’ judgments and inter-annotator agreement figures. Also, the status of the review is presented, distinguishing between "unseen", "seen", and "reviewed".
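The "Show just disagreement" filter can be sketched as follows, assuming that a target counts as being in disagreement whenever at least two evaluators chose different answers; the exact criterion used by MT-equal is not stated here, and the function names are hypothetical.

```python
from typing import Dict, List


def in_disagreement(judgments: Dict[str, str]) -> bool:
    """Assumed criterion: at least two evaluators gave different answers."""
    return len(set(judgments.values())) > 1


def review_list(items: Dict[str, Dict[str, str]],
                just_disagreement: bool) -> List[str]:
    """IDs of the evaluated targets to list, honoring "Show just disagreement"."""
    return [item_id for item_id, judgments in items.items()
            if not just_disagreement or in_disagreement(judgments)]
```

Here `items` maps each evaluated target to the answers given by each evaluator, e.g. `{"tu1:mt2": {"e1": "Good", "e2": "Bad"}}`.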

To modify evaluators’ judgments, the reviewer can click on the desired list item: a window opens presenting all details about the evaluation item, including the original judgments of the evaluators. Note that if evaluators were kept anonymous in the guidelines specifications, each evaluator is represented only by an anonymous ID; otherwise, the evaluator’s username is displayed. The reviewer can click on the desired judgment, and the corresponding button is circled in blue. The reviewer can also add a comment by writing it in the comments text box and then clicking on the callout icon.
Once the revision is made, the reviewer’s judgments become the official ones.

Once the review is finished, the task can be marked as complete by clicking on the corresponding button on the top-left corner of the interface.
Report
The evaluation report is available from the beginning of the project and is incrementally filled in and updated as the evaluation settings, evaluations and revisions become available.
The report is composed of three sections. In the "Evaluation summary", the evaluation setup of the project is presented. In the "Evaluation results" and "Review results" sections, the user can see the outcome of the evaluation phase and of the review phase, respectively. The "results" sections have the same structure:
− "Answers". For each system, the number of judgments for each evaluation category is presented, together with the corresponding average score and the closest answer.
− "Evaluation Quality". As an indicator of the quality of the evaluation, inter-annotator agreement is calculated with the kappa statistic.
− "Evaluator Trustability". As an indicator of the reliability of evaluators, intra-annotator agreement for each evaluator is calculated on the redundant translation units (with the kappa statistic).
Furthermore, in the "Review results" section, the number of evaluations changed by the reviewer is reported for each evaluator.
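The Python sketch below illustrates the report figures under explicit assumptions. The manual does not say how categories are scored or which kappa variant is used, so here (i) the labels of an ordered scale are scored 1, 2, ..., k, the average score is their mean, and the "closest answer" is the label whose score is nearest to that mean (ties go to the lower label); and (ii) agreement is measured with Cohen's kappa between two aligned sequences of judgments. Function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def answers_summary(judgments: Sequence[str],
                    scale: Sequence[str]) -> Tuple[Dict[str, int], float, str]:
    """Counts per category, average score and closest answer for one system.

    Assumes an ordered scale scored 1..k; "Don't know" answers are
    expected to have been filtered out beforehand.
    """
    if not judgments:
        raise ValueError("no judgments")
    score = {label: i for i, label in enumerate(scale, start=1)}
    counts = Counter(judgments)
    mean = sum(score[j] for j in judgments) / len(judgments)
    # min() keeps the first label on ties, i.e. the lower one.
    closest = min(scale, key=lambda label: abs(score[label] - mean))
    return {label: counts[label] for label in scale}, mean, closest


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two aligned sequences of categorical judgments."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    p_expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_expected == 1:
        return 1.0  # kappa is undefined here; 1.0 is this sketch's convention
    return (p_observed - p_expected) / (1 - p_expected)
```

Applied to one evaluator's first and second judgments of the redundant TUs, `cohen_kappa` gives an intra-annotator figure; for agreement among more than two evaluators, a multi-rater variant such as Fleiss' kappa would be needed instead.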
Depending on the role in the project, different anonymization policies are applied with respect to evaluators:
- Admin/owner and reviewer(s): if the "specifications" are set to "Keep evaluators anonymous", the usernames of evaluators are not displayed and an anonymous ID is presented instead. Note that, if necessary, admins can find the correspondence between the anonymous ID and the username in the "Evaluation progress" page.
- Vendor(s): can see overall evaluation results and quality, but can access information about evaluator trustability only for the evaluators s/he invited.
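The two anonymization rules above can be sketched as follows. The pseudonym format "Evaluator N" and the function names are hypothetical; numbering follows the order in which evaluators are passed in (for instance, invitation order), since alphabetical numbering would leak information about the usernames.

```python
from typing import Dict, List


def anonymous_ids(evaluators: List[str]) -> Dict[str, str]:
    """Map each username to a pseudonym; admins can consult this mapping."""
    return {name: f"Evaluator {i}" for i, name in enumerate(evaluators, start=1)}


def trustability_for(viewer: str, viewer_role: str,
                     kappa_by_evaluator: Dict[str, float],
                     inviter_of: Dict[str, str]) -> Dict[str, float]:
    """Evaluator trustability rows visible to a given user."""
    if viewer_role == "vendor":
        # Vendors only see the evaluators they invited themselves.
        return {e: k for e, k in kappa_by_evaluator.items()
                if inviter_of.get(e) == viewer}
    return dict(kappa_by_evaluator)
```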
If desired, the evaluation report can be shared with other people by clicking on the blue "Share" button on the top-right corner of the page.



Finally, the owner/admin can mark the task as "Complete" in order to finalize the project.
Appendix 1: Import data format
The application can import files in tab-separated values (TSV) format. Standard input files have the following characteristics:
- The table header consists of a single line of alphanumeric column names.
- Data lines follow the header and (currently) consist of the same number of columns as the header.
The following is the list of the columns:
id – mandatory, an alphanumeric unique identifier;
source-language – optional, the source language;
source – mandatory, the source sentence;
comment – optional, a comment on the source sentence;
target-language – optional, the target language;
target-[label] – mandatory, the target sentence; the column name needs an alphanumeric suffix label specifying the system name (e.g., Google, Bing, mt1);
comment-[label] – optional, a comment on the target sentence; the column name needs the same suffix label as the corresponding target sentence;
reference-url – mandatory if specified in the evaluation format, the URL from which to download the reference image;
reference – mandatory if specified in the evaluation format, the reference sentence.
A single file can contain multiple targets: in this case, additional target-[label] and comment-[label] columns must be added, using a different [label] suffix for each MT output. Column order is irrelevant.
The following shows an example of a valid input file with 2 MT outputs.
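As an illustration, the Python sketch below pairs a small hypothetical input file with two MT outputs (the IDs, sentences and the labels mt1/mt2 are invented) with a reader that checks the rules listed above: mandatory id, source and target-[label] columns, comment-[label] columns matching a target label, equal column counts and unique IDs. `read_tus` and its `require_reference` flag are hypothetical; MT-equal's own importer may apply different or additional checks.

```python
import csv
import io
import re
from typing import Dict, List

MANDATORY = {"id", "source"}
OPTIONAL = {"source-language", "comment", "target-language",
            "reference", "reference-url"}
LABELLED = re.compile(r"^(target|comment)-([A-Za-z0-9]+)$")


def read_tus(tsv_text: str, require_reference: bool = False) -> List[Dict[str, str]]:
    """Parse a TSV import file and check the column rules of Appendix 1."""
    # QUOTE_NONE: quotation marks inside sentences are kept literally.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if not header:
        raise ValueError("missing header line")
    targets, comments = set(), set()
    for name in header:
        match = LABELLED.match(name)
        if match and match.group(1) == "target":
            targets.add(match.group(2))
        elif match:
            comments.add(match.group(2))
        elif name not in MANDATORY | OPTIONAL:
            raise ValueError(f"unknown column: {name}")
    if not MANDATORY <= set(header) or not targets:
        raise ValueError("id, source and at least one target-[label] are required")
    if not comments <= targets:
        raise ValueError("each comment-[label] needs a matching target-[label]")
    if require_reference and not {"reference", "reference-url"} & set(header):
        raise ValueError("the evaluation format requires a reference column")
    rows: List[Dict[str, str]] = []
    seen_ids = set()
    for line_no, fields in enumerate(reader, start=2):
        if len(fields) != len(header):
            raise ValueError(f"line {line_no}: expected {len(header)} columns")
        row = dict(zip(header, fields))
        if row["id"] in seen_ids:
            raise ValueError(f"line {line_no}: duplicate id {row['id']!r}")
        seen_ids.add(row["id"])
        rows.append(row)
    return rows


# Hypothetical example with two MT outputs labelled mt1 and mt2.
SAMPLE = ("id\tsource\ttarget-mt1\ttarget-mt2\tcomment-mt2\n"
          "1\tGood morning\tBuongiorno\tBuon giorno\tsplit word\n"
          "2\tThank you\tGrazie\tGrazie mille\t\n")
```

`read_tus(SAMPLE)` returns two rows; an empty trailing field, as in the second line's comment-mt2 column, is kept as an empty string so that every line still has the same number of columns as the header.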
