This document surveys the challenges of evaluating heterogeneous information access systems and proposes directions for future research. Traditional IR evaluation relies on system-oriented metrics computed over test collections, whereas heterogeneous search involves more complex user behaviors such as non-linear browsing across result types. Key challenges include accounting for diverse search tasks, result coherence, diversity, personalization, and differing result presentation strategies. The document advocates building a better understanding of user behavior through formal models and applying that understanding to design evaluation metrics for heterogeneous search systems. It identifies directions for future work, including modeling task complexity, coherence, and click patterns to develop more powerful evaluation frameworks.
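To make the connection between user models and metrics concrete, the sketch below shows one common way click behavior feeds an evaluation metric: a simple position-based examination model, where the probability that a user examines a result decays with rank and expected utility weights each result's relevance accordingly. This is an illustrative example, not a model proposed in the document; the decay parameter `gamma` and the function name are assumptions for the sketch.

```python
def expected_utility(relevances, gamma=0.7):
    """Expected utility under a simple position-based examination model.

    relevances: graded relevance in [0, 1] per ranked result.
    gamma: per-position continuation probability (hypothetical value,
           chosen for illustration, not taken from the document).
    """
    utility = 0.0
    examine = 1.0  # probability the user examines the current position
    for rel in relevances:
        utility += examine * rel   # relevance counts only if examined
        examine *= gamma           # user continues with probability gamma
    return utility

# The same items score higher when the most relevant ones are ranked first,
# which is how such a model rewards good ordering:
print(expected_utility([1.0, 0.8, 0.2]))  # top-heavy ranking
print(expected_utility([0.2, 0.8, 1.0]))  # same items, reversed
```

Richer models of the kind the document calls for would condition examination on result type, layout, and coherence rather than on rank alone, which is precisely what makes heterogeneous evaluation harder than this ranked-list case.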