Working on exploiting audio waveforms or spectrograms, I am thinking of trying the Monty approach for audio processing. My blocking point is how to define a reference frame for audio, since it is a temporal rather than a spatial signal; I struggle to understand how to provide the reference frame information. My first thought was to give it the points where salient frequencies or magnitudes occur, but I am not very comfortable with that. Any thoughts?
Hi @_sgrand
We are working on modeling sequences (here is a recent object behavior research meeting video). For now, the only way we could think of is to define a 1D reference frame where time is the only dimension and can only be traversed in one direction. But I think the object behavior mechanism will be much better. Watch this space!
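To make the 1D idea concrete, here is a toy sketch of what such a reference frame could look like. This is my own illustration under the assumptions above, not Monty's actual code or API: an observation's "location" is just its position on the time axis, traversal is forward-only, and displacement between observations is a signed time difference.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    location: float  # seconds from sequence start (the single 1D dimension)
    feature: float   # e.g., dominant frequency in Hz at that moment

class TemporalReferenceFrame:
    """Toy 1D reference frame: time is the only axis, traversed forward."""

    def __init__(self):
        self.observations = []

    def add(self, t, feature):
        # Enforce one-directional traversal: time never moves backward.
        if self.observations and t <= self.observations[-1].location:
            raise ValueError("time must move forward")
        self.observations.append(Observation(t, feature))

    def displacement(self, i, j):
        # In a 1D temporal frame, the displacement between two
        # observations is just their (signed) time difference.
        return self.observations[j].location - self.observations[i].location

# A short melody as (time, pitch) observations:
frame = TemporalReferenceFrame()
frame.add(0.00, 440.0)  # A4 onset
frame.add(0.25, 494.0)  # B4
frame.add(0.50, 523.0)  # C5
```

The point of the sketch is only that "location" need not be spatial: any monotone axis with well-defined displacements can serve as a reference frame, which is exactly what time provides here.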
Here are some random thoughts on the topic; no extra charge…
- Timing (e.g., impulse, phase) information for an event can be used to determine the location of the sound’s origin. For example (as I understand it), normal human hearing can pinpoint the location of a coin striking a window to within about a foot at a distance of 30 feet.
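For what it's worth, that timing cue is easy to play with in code. Here is a hedged sketch of estimating direction from the interaural time difference (ITD) via brute-force cross-correlation; the ear spacing and far-field model are simplistic illustrative assumptions, not any particular library's method.

```python
import math

def estimate_itd(left, right, sample_rate):
    """Lag (seconds) that best aligns the two ear signals.
    Negative means the sound reached `left` first."""
    n = len(left)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-n + 1, n):
        # Cross-correlation at this lag over the overlapping region.
        score = sum(left[i] * right[i - lag]
                    for i in range(max(0, lag), min(n, n + lag)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate

def itd_to_azimuth(itd, ear_distance=0.175, speed_of_sound=343.0):
    """Crude far-field model: itd = ear_distance * sin(azimuth) / c."""
    s = max(-1.0, min(1.0, itd * speed_of_sound / ear_distance))
    return math.asin(s)  # radians; 0 = straight ahead, negative = left

# A pulse arriving two samples earlier at the left ear:
left  = [0, 0, 1, 2, 1, 0, 0, 0]
right = [0, 0, 0, 0, 1, 2, 1, 0]
itd = estimate_itd(left, right, sample_rate=8000)
```

At an 8 kHz sample rate the two-sample offset is 250 µs, which the toy head model turns into an azimuth on the left side, matching the intuition that the nearer ear hears the pulse first.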
- The head’s position can be changed (e.g., tilted, turned) to test hypotheses about a sound’s location. Some animals (e.g., cats, horses, some dogs) can also swivel their ears to localize sounds.
- The sensing nerve cells in the cochlea are spread out in a manner similar to touch sensors on a fingertip. I wonder whether and how the low-level sensory processing might take advantage of this.
- As a whole, the ear operates somewhat like a directional microphone feeding a spectral analysis subsystem. That is, the brain does not receive sounds simply as changing amplitude over time.
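A rough software analogue of that last point: instead of handing a model raw amplitude over time, feed it a per-frame spectral decomposition, loosely like what the cochlea delivers. The naive DFT below is purely for illustration; a real pipeline would use an FFT (e.g., `numpy.fft.rfft`) or a filterbank.

```python
import math

def frame_spectrum(samples):
    """Magnitude of each DFT bin for one analysis frame (naive O(n^2) DFT)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):  # real input -> only half the bins are unique
        re = sum(samples[t] * math.cos(2 * math.pi * k * t / n)
                 for t in range(n))
        im = -sum(samples[t] * math.sin(2 * math.pi * k * t / n)
                  for t in range(n))
        mags.append(math.hypot(re, im))
    return mags

# One frame of a pure tone that sits exactly on DFT bin 4:
n, k0 = 64, 4
tone = [math.sin(2 * math.pi * k0 * t / n) for t in range(n)]
spectrum = frame_spectrum(tone)
```

Running this frame-by-frame over a signal yields a spectrogram, i.e. the "spectral analysis subsystem" view rather than the raw waveform, and each bin index is a natural candidate for a second, frequency-like axis alongside time.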