'Worse musical timing' - what does it mean?
Posted by: hungryhalibut on 30 October 2016
There have been a few mentions recently of system changes that some members say negatively impact on musical timing. One example was adding a second power supply to the NDS, and another was the Super Lumina speaker leads. I'm not trying to reopen those debates, but am intrigued by exactly what people are getting at when they say that something makes timing worse. Is it simply that the members of a group, quartet or whatever don't appear to play as well together? Or is it something that can affect a solo performance? Could a singer be made to appear out of time with their own guitar? Could a solo piano piece be affected? Or am I being dim and it's something else entirely?
It's a bit more complicated than that.
The tweeter responds more quickly, but it's only reproducing the higher frequency components of the signal. These have faster rise times than the lower frequencies, so need a faster response time to be reproduced at all. However this isn't the source of group delay in this part of the audio spectrum.
Consider the crossover: this has high and low pass filters, and all filters involve phase delays. So the overall group delay vs frequency is affected by the phase delays in the crossover, the phase delays in the drive units and any phase delays inherent in the acoustic alignment of the enclosure of the drive units. This latter point is particularly significant in respect of the bass alignment of ported or sealed box enclosures. This only considers speakers - there are also differences in amplifiers.
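As a rough illustration of how a filter delays different frequencies by different amounts, here is a minimal Python sketch. It assumes a single first-order low-pass section, which is a simplification and not the crossover of any particular speaker. The 2.2 kHz corner frequency and the function name `first_order_lowpass_group_delay` are hypothetical choices made for the example.

```python
import math


def first_order_lowpass_group_delay(freq_hz: float, corner_hz: float) -> float:
    """Group delay in seconds of H(jw) = 1 / (1 + j*w/wc).

    The phase is -atan(w/wc); group delay is minus the derivative of
    the phase with respect to w, i.e. (1/wc) / (1 + (w/wc)**2).
    """
    wc = 2.0 * math.pi * corner_hz
    ratio = freq_hz / corner_hz
    return (1.0 / wc) / (1.0 + ratio * ratio)


if __name__ == "__main__":
    corner_hz = 2200.0  # illustrative corner frequency, not from a real design
    for freq_hz in (100.0, 1000.0, 2200.0, 5000.0, 10000.0):
        delay_us = first_order_lowpass_group_delay(freq_hz, corner_hz) * 1e6
        print(f"{freq_hz:8.0f} Hz: {delay_us:6.1f} us")
```

Even this simple filter delays low frequencies more than high ones. Real crossovers use higher-order sections and sum the output of two drive units, so their delay curves are more involved than this sketch suggests.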
Thank you Huge. Having read your response a couple of times, I am starting to see the complexities and the various points in the chain from source to ear (mostly in the speaker it seems) that can affect the production, prominence and emittance of different frequencies and the timing of the emittance of different frequencies. The video is also helpful in understanding why and how timing issues might occur.
Am I right in saying that it is the speaker design (the cabinet, driver and crossover) that is mostly (but not entirely) responsible for timing issues? Or are amplifier design and room interactions equally culpable?
nigelb posted: I would assume the time it takes for a tweeter to respond with such tiny physical movements would be faster than the considerably larger movements required of a woofer.
That is quite correct, at least as a generalisation. A woofer has a bigger, heavier cone with more inertia resisting the required acceleration, and that applies even to the smaller ones that are used in multiples to move enough air. Of course the degree of it depends on the speaker design and construction, including factors like the strength of the magnets, cone material, surround stiffness etc. It is different in an electrostatic speaker, though they tend to have difficulty moving enough air to reproduce bass at an equivalent level.
The speaker / room interaction is equally if not more important.
In my experience amplifiers are also very important; but I'm at a loss as to precisely why they are so important as, superficially, one would expect the smaller phase delays in amplifiers to be much less important. I think a possible explanation may be that the ear/brain combination is more sensitive to the effects of the non-linear transient response of amplifiers rather than the linear non-idealities in the electro-mechanical response of speakers.
Innocent Bystander posted: nigelb posted: I would assume the time it takes for a tweeter to respond with such tiny physical movements would be faster than the considerably larger movements required of a woofer.
That is quite correct, at least as a generalisation. A woofer has a bigger, heavier cone with more inertia resisting the required acceleration, and that applies even to the smaller ones that are used in multiples to move enough air. Of course the degree of it depends on the speaker design and construction, including factors like the strength of the magnets, cone material, surround stiffness etc. It is different in an electrostatic speaker, though they tend to have difficulty moving enough air to reproduce bass at an equivalent level.
IB, whilst true, you're also missing the point to some degree.
It's not about moving an amount of air, our ears respond to pressure not volume. The other thing is that the risetime of a higher frequency wave is faster than a lower frequency wave. This is the reason you need a smaller lighter faster diaphragm for the HF unit. This has no bearing on group delay vs frequency.
Huge, I wasn't suggesting the speakers are the be all and end all, but answering Nigelb's question. Yes, sound is variations in pressure, but as you'll be aware, in a cone loudspeaker that is effectively proportional to the volume of air pushed or pulled, which in turn is proportional to the diameter of the cone and the distance it travels. A bigger cone is inevitably heavier than the smaller one of a tweeter, so the physics of cone speakers inherently means they are likely to cause the bottom frequencies to be slower to respond than the top, all else being equal. But I fully agree that all else is not equal, and recognise the factors you have described as being contributors, of course some more than others depending on the system.
IB, I'll show why cone area / mass are less important than it would otherwise appear.
Pressure isn't just proportional to the effective area of the diaphragm, it's also proportional to the velocity of the diaphragm. However when you look at pressure sine waves, the rate of change of pressure required at any given frequency is proportional to the first differential of the pressure (dp/dt).
Consider a two way speaker using matched drivers with similar cone materials (both polymer or both metal)
Bass Mid effective diameter: 165mm / effective area: 27.2E-3 m²
Tweeter effective diameter: 25mm (physical dia 22mm) / effective area: 6.25E-6 m²
Area Ratio: Bass Mid = 44x Tweeter area
Diaphragm Thickness Ratio: Bass Mid = 10x Tweeter (assumed)
First taking the situation at the crossover frequency
f(Bass Mid) = 2.2kHz
f(tweeter) = 2.2kHz
To generate the same pressure, the tweeter must move 44 times faster than the Bass Mid cone, but it's also 440 times lighter, so no problem.
Now taking the situation toward the upper end response of two drive units
f(Bass Mid) = 1kHz
f(tweeter) = 10kHz
To generate the same pressure, the tweeter must now move 440 times faster than the Bass Mid cone, but it's also 440 times lighter, so the responsiveness in comparison to the waveform it needs to reproduce is the same as for the Bass Mid.
The other factors (on the motor side) are
Current (same each side for same amplitude as matched 'sensitivity' is assumed)
Magnetic field strength: Broadly similar but the tweeter magnet could be a bit stronger due to smaller gaps (however not vastly different as a similar level of magnet technology is assumed for each); so slightly more efficient drive for the Tweeter.
Length of voice coil wire: Longer in the Bass Mid, so a bit more efficient drive for the Bass Mid.
So in terms of tracking the signal and group delays the drive units pretty much balance out!
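The ratios in the cone comparison above can be reproduced with a few lines of Python. This is only a sketch of the post's own arithmetic, not an acoustic model: the diameters and the 10x thickness ratio are the figures quoted in the post, equal material density is assumed, and the function names are hypothetical.

```python
# Figures taken from the post; the 10x thickness ratio is the post's
# own assumption, and equal material density is assumed here.
BASS_MID_DIAMETER_MM = 165.0
TWEETER_DIAMETER_MM = 25.0
THICKNESS_RATIO = 10.0


def area_ratio(large_diameter: float, small_diameter: float) -> float:
    """Area scales with diameter squared, so shape constants cancel."""
    return (large_diameter / small_diameter) ** 2


def mass_ratio(area: float, thickness: float) -> float:
    """Mass ratio for diaphragms made of material with the same density."""
    return area * thickness


def required_speed_ratio(area: float, tweeter_hz: float, bass_mid_hz: float) -> float:
    """How much faster the tweeter must move, following the post's reasoning:
    inversely with area, and in proportion to the frequency ratio."""
    return area * (tweeter_hz / bass_mid_hz)


if __name__ == "__main__":
    area = area_ratio(BASS_MID_DIAMETER_MM, TWEETER_DIAMETER_MM)
    mass = mass_ratio(area, THICKNESS_RATIO)
    for tweeter_hz, bass_mid_hz in ((2200.0, 2200.0), (10000.0, 1000.0)):
        speed = required_speed_ratio(area, tweeter_hz, bass_mid_hz)
        print(f"bass mid {bass_mid_hz:.0f} Hz, tweeter {tweeter_hz:.0f} Hz: "
              f"speed x{speed:.1f}, lightness x{mass:.1f}")
```

The unrounded area ratio is about 43.6. The post rounds it to 44 before multiplying by the thickness ratio, which is where the figure of 440 comes from.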
That's a lot of reasoning just to say "Meh, same difference!"
Oops, there's a transcription error above...
The effective area of the tweeter is 625E-6 m² (and not 6.25E-6 m² as written).
However, this is just a transcription error, so the ratio calculation and conclusions are still correct.
One thing that occurred to me looking at the excellent video (albeit with a commercial rather than educational intent) that Jan-Erik posted is that sub-optimal timing can manifest itself as poor, fuzzy or indistinct imaging, if I understand the explanation correctly. Now Naim has never been famed for the imaging qualities of its kit (although more recent models have improved significantly) but Naim has consistently been praised for PRaT. The 'T' of course stands for timing. This argument seems a little inconsistent.
Again this argument might have something to do with our definition of imaging. To me imaging is not just the accurate placement of individual instruments and voices in the horizontal, vertical and depth planes but also the 'solidity' of those instruments and voices. Do they appear as indistinct blobs of sound or are they precisely contained and located and of a 'size' and volume that neither dominates nor is indistinct?
Do others associate poor timing with poor imaging or are they unrelated?
PS Huge, thanks for the time you have taken to detail the physics (and electronics) that partially provide some explanation.
The imaging aspect is more of an analogy than literal truth - for placement on the horizontal axis, the brain primarily uses the relative amplitude in preference to phase relationships - look up Blumlein Stereo and research papers by Alan Blumlein. Although the phase relationship does appear to be used as a secondary factor, the brain seems to give much less weight to this information.
For depth perception the situation is more complex, and I don't know of any seminal work on the subject.
Jan, I'm assuming that no subsequent work has shown Blumlein to have been fundamentally incorrect (your area of expertise not mine!).
Actually, high frequency sounds do travel faster than low frequency sounds in air.
When you hear far off thunder, you get the high frequencies first, and then the sounds get lower in pitch. The crack then the rumble if you like.
Dozey, you're not taking scattering into account.
Imaging (or more strictly - source position localisation) depends on the inter-aural time difference. Wikipedia is your friend.
Huge - maybe I am not taking scattering into account. Is that the reason?
3 primary cues for auditory localization:
1. Interaural time difference (ITD)
2. Interaural level difference (ILD)
3. Head-related transfer function (HRTF)
Yes, the long wavelengths are more penetrating over long distances (hence an elephant's infrasonic rumble or a lion's deep roar used to communicate over distance); shorter wavelengths are absorbed more quickly in the atmosphere.
As a result the higher frequencies can only reach us by more direct paths; the lower frequencies can be scattered many times over much longer paths before reaching our ears. So much of the lower frequency energy reaches us later due to the longer distance it's traveled.
Thanks Huge. Makes sense.
Dozey posted: 3 primary cues for auditory localization:
1. Interaural time difference (ITD)
2. Interaural level difference (ILD)
3. Head-related transfer function (HRTF)
If that were the order of significance, then a horizontally spaced pair of omnidirectional mikes would give a better stereo image than a Blumlein pair or a coincident crossed cardioid pair.
It is indeed the order of significance according to modern psychoacoustics. I have drafted a number of patent applications in this technical field when I worked for EMI.
I don't know what you would get with two omni mikes spaced a head width apart, but if you use two microphones in the ear canals of a dummy head you get good results because of the three cues I mention above.
That's very interesting. I'm well aware of the properties of a Neumann Head.
The spaced pair of omnis (spaced a bit wider than the interaural distance) would give correct phase information (ITD) but no volume difference (ILD) and it would appear that this should therefore give the most stable and clearest stereo image (with the possible exception of a Neumann Head). For some reason this arrangement doesn't seem to work in practice; instead a better stereo image comes from Blumlein pairs or crossed cardioids where the ILD is preserved and the ITD is lost.
If the phase information is the priority I'm at a loss to know why this should be. Any suggestions? I'm fresh out of ideas here.
No good idea I am afraid. I will need to look at how Blumlein pairs are supposed to work again. Perhaps the phase information is not actually lost but manifests itself in a different way.
However I do have a nugget of information regarding ILD.
For distant sources there should be no appreciable ILD between the right and left ears. After all, the distance to the source is almost the same for both ears. It only becomes important when the distance to the source is of the order of a few head widths. So we used to use a non-zero ILD when we were wanting to give the impression of someone whispering in your ear or for a fly or bullet passing close to the head.
OK - ITD is important for low frequency sounds. ILD is more important above 1.5kHz - but that is because it is the shadowing of the head causing the ILD.
Presumably in Blumlein pairs there is no head, so you should not actually get any real ILD. Presumably it is the directionality of the mics which can mimic the effect of head shadowing? Omni mics would give no ILD.
OK, interesting, that makes a good deal of sense. Yet again it's more complicated than it appears at first glance!
With Blumlein pairs it's the 90° crossed 'figure 8' response patterns that simulate the ILD for each channel; for crossed cardioids an angle of about 120° is normally used to simulate the ILD based on their particular directional response pattern. The different arrangement of these arrays gives a different overall front / back response pattern and that gives a different perception of the recorded 'acoustic environment' in which the signal appears.
Correct: Non-coincident omnis would give no simulation of ILD between the channels (but the difference in position would give a simulation of the ITD, dependent on the separation used).
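The two microphone cases discussed above can be sketched numerically in Python. This is a simplified illustration under stated assumptions: ideal figure-8 polar patterns with gain equal to the cosine of the off-axis angle, a distant source (plane wave), and a hypothetical 0.5 m spacing for the omni pair. The function names are made up for the example.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # dry air at roughly 20 degrees C


def spaced_pair_time_difference(spacing_m: float, angle_deg: float) -> float:
    """Arrival-time difference in seconds between two spaced omni mics.

    Assumes a distant source (plane wave) at angle_deg from straight ahead.
    """
    return spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND_M_S


def blumlein_level_difference_db(angle_deg: float) -> float:
    """Left-minus-right level in dB for coincident figure-8 mics at +/-45 degrees.

    An ideal figure-8 has gain cos(angle off its axis). Positive angle_deg
    means the source is to the left. Only valid for sources in the front
    quadrant, i.e. abs(angle_deg) < 45.
    """
    if abs(angle_deg) >= 45.0:
        raise ValueError("source must lie within +/-45 degrees")
    left_gain = math.cos(math.radians(angle_deg - 45.0))
    right_gain = math.cos(math.radians(angle_deg + 45.0))
    return 20.0 * math.log10(left_gain / right_gain)


if __name__ == "__main__":
    spacing_m = 0.5  # hypothetical spacing for the omni pair
    for angle_deg in (0.0, 10.0, 20.0, 30.0):
        delay_ms = spaced_pair_time_difference(spacing_m, angle_deg) * 1e3
        level_db = blumlein_level_difference_db(angle_deg)
        print(f"{angle_deg:4.0f} deg: spaced omnis {delay_ms:5.2f} ms, "
              f"Blumlein {level_db:5.2f} dB")
```

In this idealised model the spaced omnis produce only a time difference and the coincident pair produces only a level difference, which matches the distinction drawn in the posts above.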
To me it is simple and essential. Being firmly in the traditional "Naim sound" camp, I feel the way the music integrates together for a musical performance is vital. When timing is worse, the music seems separated and clinical: the musicians are not playing together but come across as sterile entities.
So, where did our exquisite sense of timing come from?
Like most of our talents, it was a survival mechanism honed during the times when we were not top predator. Seeking shelter in caves, we became acutely tuned to sounds, gradually refining our abilities to discriminate the direct from the reflected. Some became more adept at 'reading' the soundscape than others. These were the first acousticians, although the term had not yet been invented.
Soon, groups of like-minded hominids began to gather in caves, banging together sticks, bones, rocks, and (possibly) human flesh, to practice their art and collectively refine their abilities, i.e., discrimination of inter-aural time, frequency and intensity differences, (the head-related transfer function was well understood, although not referred to as such at the time).
Later, well coordinated attacks of sabre-toothed tigers wiped out the less sonically inclined humans, who realized too late that death was imminent. Meanwhile, those with the 'acoustician' trait escaped.
As most of the survivors carried the trait, it established itself firmly in the gene pool. Humanity entered a period of relative safety from predation, but the acousticians were bored.
Soon, groups of acousticians began to gather in caves, musing on ways of using sticks, hollow bones, rocks, and dried skins taut on halved coconut shells - along with bits of stretched animal gut - to practice their art and collectively refine their abilities. These musing acousticians were highly intolerant of anyone who couldn't keep time, and summarily ejected them. The latter became audio reviewers.
Of course, the musing acousticians with superior timing abilities were highly sought after by the females of the species, who knew a good thing when they heard it. This hard-wired attraction can still be found today:

(thanks to Tobyjug for the image)