"The quality of human-system interactions is a key determinant of mission success for military systems. However, operational testers rarely evaluate human-system interactions with the same rigor they apply to physical system requirements, such as miss distance or interoperability. Often, testers evaluate human-system interactions using survey instruments alone (e.g., the NASA Task Load Index (NASA-TLX)). In this paper, we argue that a multi-method approach that leverages methodological triangulation provides greater insight into the human-system interactions observed during operational testing. Specifically, we present data from an operational test in which a multi-method approach was used. Ten attack helicopter pilots identified and responded to threats under four conditions: high vs. low threat density and presence vs. absence of a threat detection technology. Testers recorded two primary measures of pilot workload: time to detect the first threat and the NASA-TLX. When the threat detection technology was absent, pilots took significantly longer to detect threats under low threat density than under high threat density. When the technology was present, however, time to detect threats did not differ between threat densities. The NASA-TLX data showed a similar pattern of results, suggesting that the observed effect reflects pilot workload rather than the method used to measure it (i.e., survey instrument vs. behavioral metric). Triangulating methods in this way provides a more rigorous and defensible test of the research question and, when combined with qualitative methods, provides useful information for identifying whether degradations in performance should be addressed through additional training or interface redesign."