Enrique Amigó will explore a comprehensive taxonomy of evaluation dimensions for large language models, emphasizing the need to consider aspects such as harmful content, explainability, hallucination, informativeness, and reasoning capabilities alongside traditional effectiveness benchmarks.