First, we reduce the amount of information presented to the Expert system to show that end-task accuracy is correlated with the amount of information available (see the Non-Interactive setups to the left of the dotted vertical separator line).
Then, we provide the Initial information to the Expert system and give it the option to ask follow-up questions interactively (BASIC-Interactive). The model's performance drops relative to the non-interactive setup that receives the same Initial information (Initial Non-Interactive). This indicates that adapting LLMs to interactive information-seeking settings is nontrivial.
Finally, we show that our BEST setup, which incorporates explicit clinical reasoning and more accurate confidence estimation, effectively seeks additional information and improves performance.
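
For illustration only, the interactive setting described above can be sketched as a loop in which the Expert system keeps asking follow-up questions until its estimated confidence is high enough. This is a minimal sketch under stated assumptions, not the evaluated implementation: the names `interactive_expert`, `estimate`, and `ask_patient`, the confidence threshold, and the question budget are all hypothetical, and the toy stand-ins replace what would be LLM calls.

```python
from typing import Callable, List, Tuple


def interactive_expert(
    initial_info: str,
    estimate: Callable[[List[str]], Tuple[str, float]],
    ask_patient: Callable[[List[str]], str],
    threshold: float = 0.8,   # hypothetical confidence cutoff
    max_questions: int = 5,   # hypothetical question budget
) -> Tuple[str, int]:
    """Return (final answer, number of follow-up questions asked)."""
    context = [initial_info]
    answer, confidence = estimate(context)
    asked = 0
    # Keep seeking information while the expert is not confident enough
    # and the question budget is not exhausted.
    while confidence < threshold and asked < max_questions:
        context.append(ask_patient(context))
        asked += 1
        answer, confidence = estimate(context)
    return answer, asked


# Toy stand-ins (assumptions): confidence grows with the amount of
# gathered information, and the answer changes once enough is known.
def toy_estimate(context: List[str]) -> Tuple[str, float]:
    confidence = min(1.0, 0.25 * len(context))
    answer = "B" if len(context) >= 3 else "A"
    return answer, confidence


def toy_patient(context: List[str]) -> str:
    return f"extra fact {len(context)}"


result = interactive_expert("initial presentation", toy_estimate, toy_patient)
print(result)
```

In this sketch, setting `threshold=0.0` makes the loop exit immediately, which corresponds to answering from the Initial information alone; a poorly calibrated `estimate` would either stop too early or spend the whole question budget, which is why confidence estimation matters in the interactive setting.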