The assessor calibration algorithm is available as a free download for Microsoft Excel and Python and via a Graphical User Interface (GUI).

For more details on the algorithm please visit the Model page or download the latest version of the paper from arXiv, which also includes several case studies and supplementary information.

Once you have tested the packages we would be very grateful for feedback on those you have used, so that we can test and improve our methods.

## Microsoft Excel Package

The spreadsheet has been designed and tested on Windows XP and Windows 7 using Excel 2007 and 2010. If you are using other versions and experience difficulties please let us know. Please note that macros do not function in Office 2008 for Mac.

Robustness is currently disabled in this version.

Detailed instructions are shown below.

## Python 2.7 Package

The Python package includes all of the features of the Excel package plus a graph generator. It has been designed and tested in Python 2.7 on Ubuntu. If you are running Python 2.7 on other platforms and experience difficulties please let us know.

Detailed instructions are shown below.

## GUI Package

This package has a graphical user interface and does not require Microsoft Excel or Python to run. It includes all of the functionality of the other packages including the graph generator. The executable files may be downloaded here for the different platforms.

Detailed instructions are shown below.

Disclaimer: The Calibrate with Confidence team and associated groups/individuals take no responsibility for the loss of data, errors in the model or its implementation or the outcomes resulting from using the model.

## Microsoft Excel Instructions

Figure 1 shows the template spreadsheet with example data.

There are four input columns:

• Column A: Object Labels $$(o)$$ - e.g. students.
• Column B: Assessor Labels $$(a)$$ - e.g. teachers.
• Column C: Scores $$(s)$$ - indicates the assessor's evaluation of an object. Should be a real number.
• Column D: Confidences / Std Dev $$(c)$$ - indicates an assessor's confidence or standard deviation in the evaluation. Confidences can be positive real numbers or H (High), M (Medium) or L (Low). However, all entries should be of the same type, i.e. all H/M/L or all positive real numbers. High, medium and low confidence levels equate to {$$\lambda^2$$, 1, $$\lambda^{-2}$$} where the Confidence Factor, $$(\lambda)$$, can be set within the range [1, $$\infty$$); by default this is set to 1.75. If standard deviations are entered one must select the appropriate option in the Data Entry Window (Figure 2) to ensure that the algorithm converts them correctly.

These columns can be replaced with your own data. (Please note that data will only be selected down to the first empty row.)

The number of evaluations must be greater than the number of objects + number of assessors + 1.
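The condition above can be checked before running a calibration. The following is a minimal sketch, not part of any of the packages: the function name `has_enough_evaluations` and the tuple layout (object, assessor, score, confidence) are assumptions made for illustration.

```python
def has_enough_evaluations(rows):
    """Check: evaluations > number of objects + number of assessors + 1.

    rows is an iterable of (object_label, assessor_label, score, confidence)
    tuples. This layout is a hypothetical one chosen for the example.
    """
    rows = list(rows)
    objects = {row[0] for row in rows}    # unique object labels
    assessors = {row[1] for row in rows}  # unique assessor labels
    return len(rows) > len(objects) + len(assessors) + 1


# Two objects and two assessors give at most 4 evaluations, and 4 > 5 fails.
too_few = [
    ("1", "A", 62.0, "H"),
    ("1", "B", 58.0, "M"),
    ("2", "A", 71.0, "M"),
    ("2", "B", 65.0, "L"),
]
print(has_enough_evaluations(too_few))
```

A data set that fails this check needs more evaluations before it can be calibrated.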

There are eight output columns:

• Column F: Unique Object labels $$(o)$$ - in the example, 15 unique objects labelled 1-15.
• Column G: Object 'true' values $$(v_o)$$ - the unbiased object values.
• Column H: Object 'true' values robustness $$(\left\vert \delta v_o \right\vert)$$ - the maximum error in the unbiased object values.
• Column I: Object total confidence $$(C_o)$$ - the total confidence in the object.
• Column J: Unique Assessor labels $$(a)$$ - in the example, 8 unique assessors labelled A-H.
• Column K: Assessor Biases $$(b_a)$$ - the estimated assessor bias.
• Column L: Assessor bias robustness $$(\left\vert \delta b_a \right\vert)$$ - the maximum error in the assessor biases.
• Column M: Assessor total confidence $$(C_a)$$ - total confidence of the assessor.

The confidence weighted rms |dv,db| is displayed in Cell "O38".
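To make the output columns more concrete, here is a rough sketch of how values $$(v_o)$$, biases $$(b_a)$$ and total confidences $$(C_o, C_a)$$ could be estimated. It assumes an additive model in which each score is roughly the object's value plus the assessor's bias, fitted by confidence-weighted least squares with the simple condition $$\sum_a b_a = 0$$. That model, the alternating-average solver and the name `calibrate_sketch` are assumptions for illustration only; the Model page and the paper give the actual formulation, and this is not the packaged implementation. Robustness values are not computed.

```python
def calibrate_sketch(rows, iterations=500):
    """Illustrative weighted least-squares fit of score ~ value + bias.

    rows: (object_label, assessor_label, score, confidence) tuples with
    numeric, positive confidences. Uses alternating weighted averages,
    which is a stand-in solver rather than the packages' own method.
    """
    rows = list(rows)
    objects = sorted({row[0] for row in rows})
    assessors = sorted({row[1] for row in rows})
    # Total confidence per object and per assessor (assumed to be plain sums).
    total_o = {o: sum(c for obj, _, _, c in rows if obj == o) for o in objects}
    total_a = {a: sum(c for _, asr, _, c in rows if asr == a) for a in assessors}
    values = dict.fromkeys(objects, 0.0)
    biases = dict.fromkeys(assessors, 0.0)
    for _ in range(iterations):
        # Each value is the confidence-weighted mean of bias-corrected scores.
        for o in objects:
            values[o] = sum(c * (s - biases[asr])
                            for obj, asr, s, c in rows if obj == o) / total_o[o]
        # Each bias is the confidence-weighted mean of (score - value).
        for a in assessors:
            biases[a] = sum(c * (s - values[obj])
                            for obj, asr, s, c in rows if asr == a) / total_a[a]
        # Re-centre so the biases sum to zero; value + bias is unchanged.
        shift = sum(biases.values()) / len(biases)
        for a in assessors:
            biases[a] -= shift
        for o in objects:
            values[o] += shift
    return values, biases, total_o, total_a


# Scores built exactly as value + bias: values 10 and 20, biases +1 and -1.
example_rows = [
    ("1", "A", 11.0, 1.0),
    ("1", "B", 9.0, 1.0),
    ("2", "A", 21.0, 1.0),
    ("2", "B", 19.0, 1.0),
]
values, biases, total_o, total_a = calibrate_sketch(example_rows)
```

On this toy data the sketch recovers the values and biases used to build the scores. Real data is noisy and usually incomplete, so the estimates then depend on which assessors evaluated which objects and on the confidences entered.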

To start the macro you can click the ‘Calibrate with Confidence’ button. This brings up the Data Entry Window (Figure 2) which will allow you to enter the following information:

• Input Data Columns (Blue Frame) - here you can enter the columns which represent the objects, assessors, scores and confidences/std dev. You will also need to specify an output column. This is the first column in which the results will be shown; in the example we chose Column F. The output will span 8 columns and these are described above. (Please ensure there is no important information in these columns as the information may be lost).
• Confidence / Std Dev Settings (Green Frame) - here you can specify if confidences or standard deviations have been used. If confidences are used one may also set the Confidence Factor, $$\lambda$$, which governs the weighting for high/medium/low confidences. $$\lambda$$ can be set in the range [1, $$\infty$$); by default it is set to 1.75.
• Degeneracy Breaking Condition (Red Frame) - this may be used to set a weighted $$\left(\sum_a C'_a b_a = 0 \right)$$ or simple $$\left( \sum_a b_a = 0 \right)$$ degeneracy-breaking condition.

Clicking 'OK' will execute the macro whilst clicking 'Cancel' will exit the Data Entry Window.
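The two degeneracy-breaking conditions can be illustrated with a short sketch. It assumes an additive model (score roughly equal to value plus bias), so adding a constant to every object value and subtracting it from every assessor bias leaves the fit unchanged; the condition picks one representative. The function name `break_degeneracy` and the dictionary layout are hypothetical, and this is not the macro's own code.

```python
def break_degeneracy(values, biases, weights=None):
    """Shift values and biases by a constant so the chosen condition holds.

    values:  {object_label: value}
    biases:  {assessor_label: bias}
    weights: {assessor_label: C'_a} for the weighted condition
             sum(C'_a * b_a) == 0, or None for the simple sum(b_a) == 0.
    """
    if weights is None:
        weights = {a: 1.0 for a in biases}
    total_weight = sum(weights[a] for a in biases)
    # Weighted mean bias; removing it makes the weighted sum of biases zero.
    shift = sum(weights[a] * biases[a] for a in biases) / total_weight
    new_biases = {a: b - shift for a, b in biases.items()}
    # Add the same constant to the values so value + bias is preserved.
    new_values = {o: v + shift for o, v in values.items()}
    return new_values, new_biases


raw_values = {"1": 10.0}
raw_biases = {"A": 1.0, "B": 3.0}
simple = break_degeneracy(raw_values, raw_biases)
weighted = break_degeneracy(raw_values, raw_biases, weights={"A": 3.0, "B": 1.0})
```

Both results describe the same fit. They differ only by a constant offset shared between the values and the biases.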

## Python Instructions

This script takes an input file (either an Excel or a .csv file) and outputs an Excel file:

• Column 1 should list the objects being assessed
• Column 2 should list the assessors
• Column 3 should give the assessor's evaluation of an object. Should be a real number.
• Column 4 should give an assessor's confidence or standard deviation of the evaluation. It can be a positive real number for a confidence or standard deviation. Alternatively confidences may be entered as H (High), M (Medium) or L (Low). However, all entries should be of the same type, i.e. all H/M/L or all positive real numbers. High, medium and low confidence levels equate to {$$\lambda^2$$, 1, $$\lambda^{-2}$$} where the Confidence Factor, $$(\lambda)$$, can be set within the range [1, $$\infty$$); by default this is set to 1.75. If standard deviations are used one must also set conf=std to ensure that the algorithm converts them correctly.
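The confidence rules for Column 4 can be sketched as follows. This is a minimal illustration of the H/M/L mapping and the "all of one type" rule, not the package's own code: the function name `confidence_weights` is hypothetical. The conversion of standard deviations is deliberately left out, since it is not specified here.

```python
import math


def confidence_weights(entries, confidence_factor=1.75):
    """Map Column 4 entries to numeric confidences.

    H, M and L become lambda**2, 1 and lambda**-2. Otherwise every entry
    must be a positive real number, which is returned unchanged.
    """
    lam = float(confidence_factor)
    if lam < 1:
        raise ValueError("the Confidence Factor must lie in [1, infinity)")
    levels = {"H": lam ** 2, "M": 1.0, "L": lam ** -2}
    labels = [str(entry).strip() for entry in entries]
    if all(label.upper() in levels for label in labels):
        return [levels[label.upper()] for label in labels]
    try:
        numbers = [float(label) for label in labels]
    except ValueError:
        raise ValueError("entries must be all H/M/L or all positive numbers")
    if not all(math.isfinite(x) and x > 0 for x in numbers):
        raise ValueError("numeric confidences must be positive real numbers")
    return numbers


# With the default factor of 1.75, H maps to 1.75**2 = 3.0625 and M to 1.
print(confidence_weights(["H", "M", "L"]))
```

Mixing the two styles, for example `["H", "2.0"]`, raises a `ValueError` in this sketch, mirroring the requirement that all entries be of the same type.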

The script outputs three files:

• "results.xlsx" - the calibration results, split across two sheets:
  • Sheet Objects, Column A gives the object ids
  • Sheet Objects, Column B gives the 'true' values for each object
  • Sheet Objects, Column C gives the robustness measure for the 'true' values for each object
  • Sheet Objects, Column D gives the total confidence in the object
  • Sheet Assessors, Column A gives the assessor ids
  • Sheet Assessors, Column B gives the bias values for each assessor
  • Sheet Assessors, Column C gives the robustness measure for the bias estimate of each assessor
  • Sheet Assessors, Column D gives the total confidence of the assessor
• "output_assessor_object_graph.jpg" - network graph showing the relationship between assessors and objects
• "output_assessor_graph.jpg" - network graph showing the relationship between assessors
The confidence weighted rms |dv,db| is printed on the screen.

To execute the script, simply enter the following in a terminal window:

python main.py "inputfile.xls"

where "inputfile.xls" is the file name of the data to be calibrated; csv files are also accepted. Please ensure the file is in the same folder as the script or enter the full path name.
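If your data is not yet in a file, a four-column CSV in the layout described above can be produced with a few lines of Python. The sketch below is written for Python 3 as a separate helper (the package itself targets Python 2.7). The names `rows_to_csv_text` and `write_input_csv` are hypothetical. Whether the script expects a header row is not stated here, so this sketch writes none; adjust if your version requires one.

```python
import csv
import io


def rows_to_csv_text(rows):
    """Render (object, assessor, score, confidence) rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for obj, assessor, score, confidence in rows:
        # Column order: object, assessor, score, confidence / std dev.
        writer.writerow([obj, assessor, score, confidence])
    return buffer.getvalue()


def write_input_csv(path, rows):
    """Write the rows to `path` so the file can be passed to the script."""
    with open(path, "w", newline="") as handle:
        handle.write(rows_to_csv_text(rows))


example_rows = [
    ("1", "A", 62.0, "H"),
    ("1", "B", 58.0, "M"),
    ("2", "A", 71.0, "L"),
]
csv_text = rows_to_csv_text(example_rows)
```

Calling `write_input_csv("inputfile.csv", example_rows)` would then create a file that can be given to the script in place of "inputfile.xls".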

## GUI Instructions

To run the GUI package simply download and install the file for your required platform i.e. Mac or Windows. Running the executable will then bring up the GUI (Figure 3) where you can upload your data, define the settings and set the output path.

• 'Choose input file...' - here you can select the file that contains your data to calibrate.
• 'Save output file...' - here you can choose a file to output your results.
• 'Confidence specified as...' - here you can specify if confidences or standard deviations have been used. If confidences are used one may also set the Confidence Factor, $$\lambda$$, which governs the weighting for high/medium/low confidences. $$\lambda$$ can be set in the range [1, $$\infty$$); by default it is set to 1.75.
• Degeneracy-breaking condition - this may be used to set a weighted $$\left(\sum_a C'_a b_a = 0 \right)$$ or simple $$\left( \sum_a b_a = 0 \right)$$ degeneracy-breaking condition.
• Generate graphs - this allows you to output the assessor and object graphs

Clicking 'Calibrate' will start the calibration.

This package takes an input file (either an Excel or a .csv file) and outputs an Excel file:

• Column 1 should list the objects being assessed
• Column 2 should list the assessors
• Column 3 should give the assessor's evaluation of an object. Should be a real number.
• Column 4 should give an assessor's confidence or standard deviation of the evaluation. It can be a positive real number for a confidence or standard deviation. Alternatively confidences may be entered as H (High), M (Medium) or L (Low). However, all entries should be of the same type, i.e. all H/M/L or all positive real numbers. High, medium and low confidence levels equate to {$$\lambda^2$$, 1, $$\lambda^{-2}$$} where the Confidence Factor, $$(\lambda)$$, can be set within the range [1, $$\infty$$); by default this is set to 1.75. If standard deviations are used one must also select the standard deviation option under 'Confidence specified as...' to ensure that the algorithm converts them correctly.

The package outputs three files:

• "results.xlsx" - the calibration results, split across two sheets:
  • Sheet Objects, Column A gives the object ids
  • Sheet Objects, Column B gives the 'true' values for each object
  • Sheet Objects, Column C gives the robustness measure for the 'true' values for each object
  • Sheet Objects, Column D gives the total confidence in the object
  • Sheet Assessors, Column A gives the assessor ids
  • Sheet Assessors, Column B gives the bias values for each assessor
  • Sheet Assessors, Column C gives the robustness measure for the bias estimate of each assessor
  • Sheet Assessors, Column D gives the total confidence of the assessor
• "output_assessor_object_graph.jpg" - network graph showing the relationship between assessors and objects
• "output_assessor_graph.jpg" - network graph showing the relationship between assessors

The confidence weighted rms |dv,db| is printed on the screen.