Bookkeeping 🔢

By default, the number of input/output tokens and the model used are logged to logs/tokens-bookkeeping.log. The logging call is made in each judge implementation; if you create other judges, you should also log their token usage.

If you're using simpleval and implementing your own handlers, you should also log the token usage, as shown in simpleval/eval_sets/empty/testcases/empty/task_handler.py.

The log format includes the time, the source (eval, handler, etc.), the model, and the input and output token counts:

time	source	model	input_tokens	output_tokens
`2025-03-19 16:11:04`	`eval`	`gpt-4.1-2025-04-14`	267	127

source is determined by the caller of the log_bookkeeping_data function.
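
To make the row layout concrete, here is a minimal sketch that parses one row into a typed record. It assumes the fields are tab-separated and appear in the column order shown in the sample row; `BookkeepingRow` and `parse_row` are hypothetical names used for illustration and are not part of simpleval.

```python
from dataclasses import dataclass


@dataclass
class BookkeepingRow:
    """One row of the bookkeeping log (hypothetical helper type)."""
    time: str
    source: str
    model: str
    input_tokens: int
    output_tokens: int


def parse_row(line: str) -> BookkeepingRow:
    # Assumption: fields are tab-separated, in the column order shown above.
    time, source, model, input_tokens, output_tokens = line.rstrip("\n").split("\t")
    return BookkeepingRow(time, source, model, int(input_tokens), int(output_tokens))


row = parse_row("2025-03-19 16:11:04\teval\tgpt-4.1-2025-04-14\t267\t127")
print(row.source, row.input_tokens + row.output_tokens)  # eval 394
```

If the actual log file uses a different delimiter, only the `split` call needs to change.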

Tip

When implementing a new testcase handler (plugin) or a new judge, you should also log the token usage.
This is done by calling log_bookkeeping_data after your call to the model.

Bookkeeping summary

You can use tools/bookkeeping.py to summarize the bookkeeping data. The script includes hard-coded pricing; update it as needed.
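
As a rough illustration of what such a summary involves, the sketch below totals tokens per model and applies a price table. It is not the code in tools/bookkeeping.py: the function names, the tab-separated layout, and the prices are all assumptions, and the prices are placeholders rather than real vendor pricing.

```python
from collections import defaultdict

# Hypothetical prices in USD per 1M tokens; placeholders, not real pricing.
PRICES_PER_MILLION = {"example-model": {"input": 1.00, "output": 2.00}}


def summarize(lines):
    """Sum input and output tokens per model from tab-separated log rows."""
    totals = defaultdict(lambda: {"input": 0, "output": 0})
    for line in lines:
        line = line.strip()
        # Skip blank lines and an optional header row.
        if not line or line.startswith("time"):
            continue
        _time, _source, model, input_tokens, output_tokens = line.split("\t")
        totals[model]["input"] += int(input_tokens)
        totals[model]["output"] += int(output_tokens)
    return dict(totals)


def estimate_cost(totals, prices=PRICES_PER_MILLION):
    """Estimate total cost in USD; models missing from the price table are skipped."""
    cost = 0.0
    for model, counts in totals.items():
        price = prices.get(model)
        if price is None:
            continue
        cost += counts["input"] / 1_000_000 * price["input"]
        cost += counts["output"] / 1_000_000 * price["output"]
    return cost
```

To try it on a real log, something like `summarize(open("logs/tokens-bookkeeping.log", encoding="utf-8"))` should work, provided the file matches the assumed layout.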