Bookkeeping 🔢¶
By default the number of input/output tokens and the model used is logged using logs/tokens-bookkeeping.log
It is called in each judge implementation and if other judges are created, you should also log their token usage.
If you're using simpleval and implementing your own handlers, then it is advised to also log the token usage,
as shown in simpleval/eval_sets/empty/testcases/empty/task_handler.py
The log format include the time, source (eval, handler etc), input and output tokens
time | source | model | input_tokens | output_tokens |
---|---|---|---|---|
2025-03-19 16:11:04 |
eval |
gpt-4.1-2025-04-14 |
267 | 127 |
source
is determined by the caller of the log_bookkeeping_data
function.
Tip
When implementing a new testcase handler (plugin) or a new judge, you should also log the token usage.
This is done by calling log_bookkeeping_data
after your call to the model.
Bookkeeping summary¶
You can use the tools/bookkeeping.py
to summarize the bookkeeping data.
This includes hard-coded pricing.
Update as needed.