To create a unified description of reality, our perceptual systems must coordinate information received through the various senses. This multisensory input creates a cross-modal binding problem. For example, how does the nervous system automatically associate the sight of a collision with the sound of a crash?
Previous research suggests that when the senses deliver conflicting information, the sense that "wins" depends on the task at hand: vision dominates spatial processing, whereas audition dominates temporal processing. In this talk, I'll explore a new idea about multisensory processing: that this sensory specialization results in cross-modal encoding of unisensory input into the task-appropriate modality. In a series of psychophysical experiments, I investigated whether visually portrayed temporal sequences ("rhythms") become encoded in the auditory domain. The results imply that the perceptual system automatically and obligatorily abstracts temporal information from its visual form and represents this structure using an auditory code. The consequence is an experience of "hearing visual rhythms." Implications for cross-modal binding, as well as the binding problem more generally, will be discussed.