Bits more important than kHz?
Posted by: Ebor on 14 September 2013
I was lucky enough to be singing on a professional recording about 18 months ago and, being the good hi-fi geek I am, managed to catch a quick chat with the engineers during a tea break. The six microphone signals were being recorded in Pro Tools on a Mac, and I asked what resolution they were using: the answer was 24-bit and 48kHz. When I asked why they weren't using a higher sampling rate, they said it was pointless due to the Nyquist/upper limit of human hearing argument. One of them said it was worth using a greater bit depth than CD on the same basis that, in the olden days of tape, they would record on reel-to-reel with a much higher S/N ratio than the eventual consumer release format. The extra resolution was worth it to be on the safe side, I suppose you might say.
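For anyone who wants the arithmetic behind that safety-margin argument: the idealised dynamic range of N-bit PCM is roughly 6.02·N + 1.76 dB. A minimal Python check (the formula is the standard quantisation figure, not anything the engineers quoted):

    # Idealised dynamic range of N-bit PCM quantisation
    def dynamic_range_db(bits):
        return 6.02 * bits + 1.76

    for bits in (16, 24):
        print(f"{bits}-bit PCM: ~{dynamic_range_db(bits):.0f} dB")
    # 16-bit PCM: ~98 dB
    # 24-bit PCM: ~146 dB

So the extra 8 bits buy roughly 48 dB of headroom over CD, the digital equivalent of recording to tape with a better S/N ratio than the release format.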
As a 16/44.1/CD Luddite, I have no axe to grind on this, just reporting the words of a professional who has been recording for, well, long enough to remember the days of analogue tape.
Mark
Apologies if this is too much of an advert, but the recording in question was called Catholic Collection III (not our idea) on Herald AV. I can't make any impartial claims for the quality of the singing, but the Abbey acoustic is wonderful and captured very well.
http://recordinghacks.com/arti...e-world-beyond-20khz
Some quotes from Mr Blackmer's article:
Many engineers have been trained to believe that human hearing receives no meaningful input from frequency components above 20kHz. I have read many irate letters from such engineers insisting that information above 20kHz is clearly useless ...
Human hearing is generally, I believe, misunderstood to be primarily a frequency analysis system. The prevalent model of human hearing presumes that auditory perception is based on the brain’s interpretation of the outputs of a frequency analysis system ...
and finally (emphasis by me)
The human hearing system uses waveform as well as frequency to analyze signals. It is important to maintain accurate waveform up to the highest frequency region with accurate reproduction of details down to 5µs to 10µs.
Jochen, I am fascinated by your distinction between frequency and waveform. Unless you are talking about a pure sine wave, all waveforms are made up of a combination of frequencies, or if you like, sine waves. So the shape of a waveform over a time period is governed by its fundamental frequency and its harmonics (all sine waves).
And of course our ears and brains do lots of clever things in terms of audio signal processing and spatial analysis. However, unless you are talking subsonic, the audio frequencies and their harmonics are detected by the sound waves vibrating tiny hairs in the cochlea, in from the eardrum. If we can't hear above certain frequencies, i.e. the nerves in our cochlea can't resolve them, then we can't detect that part of the waveform, and so our brain's hearing-analysis system can't get that input.
Now of course a true impulse can only be defined theoretically, as it contains an infinite number of frequencies, which is of course impossible in practice. In the real world an impulse is 'filtered' by the response of a system into a specific set of frequencies, which we may be able to hear.
So I maintain that if we can't hear or feel the frequency components of a waveform, then the part of the waveform shape determined by those frequencies is invisible to us. There is nothing stopping our brains from imagining the missing information to make sense of it, but we are not directly 'hearing' it.
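To make that concrete, here is a small numpy sketch (all the numbers are arbitrary illustrations, not from any paper): a 5 kHz 'square-ish' tone built from four odd harmonics, with the two harmonics above 20 kHz then discarded. The difference between the two waveforms is exactly the part of the shape that, on this argument, we cannot hear.

    import numpy as np

    fs = 192_000                      # assumed sample rate for the sketch
    t = np.arange(0, 0.001, 1 / fs)  # 1 ms of signal
    f0 = 5_000                        # 5 kHz fundamental

    # Square-ish wave from odd harmonics: 5, 15, 25 and 35 kHz
    full = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in (1, 3, 5, 7))
    # Keep only the harmonics below a 20 kHz hearing limit: 5 and 15 kHz
    audible = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in (1, 3))

    # The waveform detail carried solely by the ultrasonic harmonics
    print(np.max(np.abs(full - audible)))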
Simon
S-i-S
What is meant in the article is that people can discern sounds that are only about 6 microseconds apart. According to the article, the consequence of this capability is that the sample frequency should be above 166 kHz to maintain the transparency of the original sound and avoid smearing that was not in the original sound.
It has to do purely with the timing aspect, not with a sound frequency represented by that number.
The test generated a 7 kHz sound pulse and played it through two speakers. They then moved one of the speakers a few millimetres at a time and checked when the (blinded) test subject could differentiate between the sound coming from each speaker.
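The arithmetic behind that figure, for anyone checking (a sketch assuming the quoted ~6 µs resolution):

    # If listeners can resolve ~6 microseconds, one sample period
    # must be no longer than that
    resolution_s = 6e-6
    min_rate_hz = 1 / resolution_s
    print(f"minimum sample rate: {min_rate_hz / 1000:.1f} kHz")  # ~166.7 kHz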
cheers
Aleg
Aleg, unless I have missed it, can you post the link?
Reading your description, I wonder if we are not therefore talking about phase differences between our ears, and indeed that is part of our spatial awareness. But I suspect that is different from discerning the 'shape' of the sound, which will be determined by its frequency make-up.
So yes, I can potentially see that the phase between two stereo signals requires a high bandwidth to record in terms of their relative timing, as opposed to the harmonic content of the sound itself. On that view, talk of extended-bandwidth amplification and tweeters is irrelevant, as we are talking about the phase or relative timing difference between the two stereo channels.
I'm sure this wouldn't work in mono, as we would be back to my earlier point.
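As a rough sanity check on the millimetre scale of the test described above, assuming sound travels at about 343 m/s (the 2 mm figure is my example, not from the paper):

    speed_of_sound = 343.0   # m/s in air at roughly 20 degrees C
    displacement_m = 0.002   # a 2 mm speaker shift, as an example
    delay_us = displacement_m / speed_of_sound * 1e6
    print(f"{delay_us:.1f} microseconds")  # ~5.8 us, right at the claimed limit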
Simon
S-i-S
It is the previously mentioned article: http://www.physics.sc.edu/~kunchur/papers/HIFI-Critic-article-by-George-Foster.pdf
Page 3, the paragraph headed "Audibility of temporal smearing and time misalignment of acoustic signals (2007)".
cheers
aleg
Ta!
May I ask a very dumb question, as I've not researched the subject and it is not my field at all: is the whole sound reproduction chain able to preserve, or even acquire in the first place, such 5-6 microsecond transients?
I have no doubt about the digital part, but how about the analog and electro-mechanical parts? Is a professional microphone able to catch that, and is a loudspeaker driver fast enough? And what does it mean for the whole chain of analog electronic components and analog signal processing in between? (I guess that at least some components are able to deal with 167 kHz electrical variations, otherwise we wouldn't ever have had analog radio, but that's a special case for specialised parts in the tuner.)
Jochen, I am fascinated by your distinction between frequency and waveform. Unless you are talking about a pure sine wave, all waveforms are made up of a combination of frequencies, or if you like, sine waves. So the shape of a waveform over a time period is governed by its fundamental frequency and its harmonics (all sine waves).
I think I know Fourier analysis quite well. But Fourier analysis is a linear theory, which means that a (probably) non-linear operation like audio perception is not interchangeable with a linear combination of individual frequencies. Example: a neural network may fire if a certain signal level of the waveform is reached. The signal can be represented as a sum of individual frequency components with phase, à la Fourier. But none of the frequency components alone may be large enough (amplitude-wise) to trigger the neural net, only the (phase-correct) summation of them. Or as a "formula":
NeuralNetworkResponse(sum of phase-correct sine waves) ≠
sum of NeuralNetworkResponse(each sine wave alone)
So detecting waveforms with a very non-linear neural network is probably something very different from linear Fourier frequency analysis, which IMO is precisely the point Mr Blackmer was making when he wrote about waveform detection in addition to frequency analysis.
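A toy numerical version of that "formula" (the threshold and frequencies are arbitrary choices purely for illustration):

    import numpy as np

    fs = 192_000
    t = np.arange(0, 0.001, 1 / fs)
    # Phase-aligned components: all peak together at t = 0
    components = [np.cos(2 * np.pi * f * t) for f in (1_000, 7_000, 19_000)]

    def neuron(x, threshold=2.5):
        # Toy non-linear detector: "fires" only if the waveform crosses the threshold
        return float(np.max(x) > threshold)

    print(neuron(sum(components)))             # 1.0: fires on the phase-correct sum
    print(sum(neuron(c) for c in components))  # 0.0: no component fires alone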
KR
Jochen
M0omo0, the key point in this paper, as far as I can see, is the timing difference between the stereo channels, rather than the absolute frequency response of an individual channel.
Simon
Jochen, I'm sorry, I just don't get what you are trying to say. I spent a lot of time at university on Fourier analysis and (audio) waveform composition, and wrote a dissertation and developed training software for waveform generation from fundamental and harmonic composition, so I feel ever so slightly knowledgeable on the subject.
So I just don't see how a waveform can exist without its composition of the fundamental and series of harmonic frequencies at varying amplitudes. It is such a fundamental principle in vast chunks of our 21st-century maths and signal processing, in most walks of life.
But I am always interested in new concepts that challenge the established engineering and scientific principles of today. Please feel free to post a reference to a book or paper on the alternative principles.
The link you posted above, which Aleg kindly relayed to me, is very interesting, but it is about empirical testing of the spatial timing and phase of stereo signals on a group of testers, as well as some discussion of 'smearing' through filtering, or linearly or non-linearly transforming a signal through electronics. This is quite different from challenging Fourier frequency analysis, from what I can see.
Simon
Simon,
You're right if you're talking about Milind Kunchur's first paper as related by George Foster in HiFi Critic, but I think Milind Kunchur's second paper (the one where he used a low-pass filter) is not about temporal resolution between the left and right channels.
Or I didn't get it (which is very probable).
Maurice
EDIT: This is a response to your response to me, not to your response to Jochen!
So I just don't see how a waveform can exist without its composition of the fundamental and series of harmonic frequencies at varying amplitudes.
Sorry for the misunderstanding, that is not what I intended to dispute. Every signal can of course be represented as a linear Fourier sum. But perception probably involves neural networks one way or the other, and neural networks can trigger in a non-linear way. So the 20kHz sine wave that triggers nothing in your ear (or mine) when played alone may have a noticeable effect when combined with other frequencies at the correct phase, e.g. in the form of a sharp pulse.
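The same effect in numbers (again just an illustration with made-up frequencies): a phase-aligned pulse built from forty cosines spaced 1 kHz apart, compared with the same pulse with everything above 20 kHz removed.

    import numpy as np

    fs = 192_000
    t = np.arange(-0.0005, 0.0005, 1 / fs)
    freqs = np.arange(1_000, 41_000, 1_000)  # components up to 40 kHz
    pulse = sum(np.cos(2 * np.pi * f * t) for f in freqs)
    limited = sum(np.cos(2 * np.pi * f * t) for f in freqs if f <= 20_000)

    print(pulse.max(), limited.max())  # ~40 vs ~20: the spike loses half its peak

Each component above 20 kHz peaks at only 1.0 on its own, yet together they double the height of the spike.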
And now I'm going to soothe my neural networks with a glass of port and some hi-res music.
Fritz, what Port do you intend to open?
Jason.
Fritz, what Port do you intend to open?
Jason.
The label says Ramos Pinto Porto Ruby
May I ask a very dumb question, as I've not researched the subject and it is not my field at all: is the whole sound reproduction chain able to preserve, or even acquire in the first place, such 5-6 microsecond transients?
At acquisition (recording), yes (see the Yamaha technical paper quoted by Jochen further up in the thread for what is required to do this in the digital domain).
At the electromechanical end, it's more difficult due to inertia and other issues, but achievable depending on the type of speaker (discussed, IIRC, in the Yamaha paper).
I have no doubt about the digital part, but how about the analog and electro-mechanical parts? Is a professional microphone able to catch that, and is a loudspeaker driver fast enough? And what does it mean for the whole chain of analog electronic components and analog signal processing in between? (I guess that at least some components are able to deal with 167 kHz electrical variations, otherwise we wouldn't ever have had analog radio, but that's a special case for specialised parts in the tuner.)
Fritz, what Port do you intend to open?
Jason.
The label says Ramos Pinto Porto Ruby
Ramos Pinto, very nice.
If you have not yet tried, I recommend a Tawny over the Ruby. The Ruby is young, fruity and light, good as a summer port. The Tawny is lighter in colour but more mature and complex in taste. And of course, if you can get hold of some Stilton cheese, they are a perfect combination.
Anyway, enjoy that port Fritz!
Jason.
Fritz, what Port do you intend to open?
Jason.
The label says Ramos Pinto Porto Ruby
Ramos Pinto, very nice.
If you have not yet tried, I recommend a Tawny over the Ruby.
I'll give it a try next time.
Thank you Jan-Erik.
You're kind to me, as my question was even dumber; it's now publicly obvious that I didn't read that paper...
Guilty of laziness, Your Honour.
Jochen, I'm sorry, I just don't get what you are trying to say. I spent a lot of time at university on Fourier analysis and (audio) waveform composition, and wrote a dissertation and developed training software for waveform generation from fundamental and harmonic composition, so I feel ever so slightly knowledgeable on the subject.
So I just don't see how a waveform can exist without its composition of the fundamental and series of harmonic frequencies at varying amplitudes. It is such a fundamental principle in vast chunks of our 21st-century maths and signal processing, in most walks of life.
But I am always interested in new concepts that challenge the established engineering and scientific principles of today. Please feel free to post a reference to a book or paper on the alternative principles.
The link you posted above, which Aleg kindly relayed to me, is very interesting, but it is about empirical testing of the spatial timing and phase of stereo signals on a group of testers, as well as some discussion of 'smearing' through filtering, or linearly or non-linearly transforming a signal through electronics. This is quite different from challenging Fourier frequency analysis, from what I can see.
Simon
My point, however, is that if people can differentiate between sounds with a phase difference of 6 microseconds, then when one compares live music with playback of the recorded music, people can also hear differences as small as 6 microseconds.
IMHO the conclusion should then be that recorded music should be sampled above 166 kHz so as not to introduce differences due to this smearing. That assumes all else is equal between recorded and live music, which is itself debatable, if it could ever be achieved.
Cheers
Aleg
Aleg, I concur; indeed there would appear to be that argument. But I guess what might be confusing to some is that it doesn't ultimately define the frequency response of a single channel/speaker in the playback system.
I.e. the sample rate determines the timing relationship between the channels, not the response of a channel itself. Which means, as I see it, this is not about the waveform shape or frequency response of a single channel, but about our brain's spatial interpretation of the phasing or time difference of two waveforms hitting our two ears as part of the stereo field.
As a follow-on, in the real world I wonder how many audio rooms and setups allow this spatial information to be presented without the acoustic stereo image being smeared by reflections from the room itself. I suspect not many.
Also, surely this information is only truly meaningful when recording a stereo image with at least two separated microphones, rather than a single microphone channel, such as a vocal, put into the mix and perhaps stereo-reverbed. So to take advantage of this there would surely be constraints on how the material was spatially recorded, even if we had the inter-channel bandwidth, which of course, as you say, is debatable.
Simon
Jochen, not that I have done this yet, but digesting the various theories and tests performed, I am confident that if you looked at the sample-by-sample difference between the two channels, you would be able to see the spectral information derived from spatial or inter-channel timing information. I.e. use Fourier analysis to identify the fundamentals and their harmonics that define this spatial information.
Given the discussion to date of a high-bandwidth live stereo recording, I would expect to see a degree of high-frequency energy in the difference between the two channels that may not be present in either single channel itself.
If one had a suitable hi-def stereo file, it would be interesting to analyze it spectrally.
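For anyone who wants to try it, a minimal sketch (the filename is hypothetical, and it assumes the python-soundfile package for reading the file):

    import numpy as np
    import soundfile as sf  # assumed: pip install soundfile

    audio, fs = sf.read("hires_stereo.flac")  # hypothetical 192 kHz stereo file
    diff = audio[:, 0] - audio[:, 1]          # sample-by-sample channel difference

    spectrum = np.abs(np.fft.rfft(diff))
    freqs = np.fft.rfftfreq(len(diff), 1 / fs)
    # Fraction of the difference-signal spectral magnitude above 20 kHz
    print(spectrum[freqs > 20_000].sum() / spectrum.sum())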
Simon
Simon,
Sorry if I'm being thick, but at least in the second paper described by George Foster -- Milind N. Kunchur, Probing the temporal resolution and bandwidth of human hearing, Proc. of Meetings on Acoustics, Vol. 2, 2008, which you can download here -- I don't see that the findings of experiments 1 and 2 are about spatial information. And there's also a technical explanation of why ultrasonics matter.
All three papers can be downloaded here, and there's also a FAQ.
Maurice
Maurice, thanks, yes I was only seeing one half of the argument.
I have downloaded the PDFs... However, good as Kunchur 2008 is, it leaves me with more questions than answers.
Luckily I have found a paper by van Maanen from the 51st AES Conference, Helsinki, August 2013, which deals with this very subject with regard to audio reproduction in a very engineering-led, open and balanced way. So far it has discussed the temporal response and frequency response of our ears and of audio replay systems, and the responses of linear and non-linear systems to impulse functions and their relationship to frequency response and smearing...
I'll post more later.
Guys thanks for the heads up. This is a fascinating subject that seems to be getting some increased interest in the industry.
Simon
Maurice, that's better, thank you. I have downloaded the PDFs, and Kunchur 2008 is interesting, as I think his main premise is looking at the non-linear transformation of ultrasound frequencies within our ear to produce non-linear products. These appear detectable in his tests, but because they are non-linear they are not reversibly deterministic, as more than one set of inputs (frequencies) can produce a given or similar response. Therefore my understanding is that we can differentiate or detect a presence, but not necessarily analyse the spectral make-up of a sound whose energy is above our frequency or pitch cutoff. It's kind of like being colour-blind: not being able to see the difference between red and green, but knowing it's one of those colours and not black or white. [...]
I think I understand what you're saying Simon, but not why it matters to music reproduction. When, let's say, a drummer plays, he produces over the course of his playing a complex sound wave that may comprise ultrasonic frequencies. According to Kunchur, these are important, even if not -- literally -- heard, because they produce audible effects at different, audible frequencies (through the non-linear mechanism you're referring to). Why is the fact that this mechanism is non-deterministic important? The drummer only played this particular part at this particular time! Even if different ultrasonic frequencies could be produced to give the same audible effect, it couldn't be done by this drummer playing that part at that moment. Plus, these non-linear effects help resolve temporal alterations in the otherwise audible signal; they're not acting alone.
Isn't the down-to-earth conclusion here that, regarding digital audio, we would need a sampling rate above (at least) 167 kHz, for two linked reasons:
1. We need these ultrasonic frequencies correctly reproduced to be able to perceive...
2. ...temporal alterations as short as 5-6 microseconds.
PS: Regarding 1., that means that you were right in your first response to me far above (the joke about PRaT), Jochen. Apologies.
EDIT: Ah now Simon, you've changed your post in the meantime !
Maurice, indeed. Van Maanen's paper is strongly influenced by Kunchur, but he does go on to explicitly suggest that human hearing falls into (at least) two distinct categories:
- Frequency bandwidth
- Temporal bandwidth
It appears both are used to extract information from the audio.
It is suggested our temporal cutoff is around 5 µs, and we already know about the frequency response.
He also suggests that our temporal cutoff is far less affected by age than our frequency cutoff.
Now, although this is an area that requires much new research, he suggests that matching human temporal resolution requires a cut-off slope of 2 dB/µs. This equates to approximately an 85 kHz low-pass 4th-order Butterworth filter, or better. Therefore you can see how a 192 kHz sampling rate fits in... as a minimum...
My eyes have been opened: audio reproduction for humans should concentrate on the temporal as much as the frequency domain, or arguably, because of our ageing process, focus more on the temporal.
Now this is where the impulse comes in. All systems respond to an impulse in a way determined by their frequency response. A lower-bandwidth system will have a shallower slope in its response. Where these slopes have not fallen away sufficiently between temporal transients, our brains (and our systems) can't differentiate between them, and one transient response is smeared into the next.
The sibilance in choral music is a good example of this.
So I do now see that an HF tweeter with a bandwidth up to 85 kHz or better will potentially sound more natural to us... not because of its bandwidth response to a continuous signal, but because of its temporal response, to which our brain is sensitive.
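For the curious, the filter implied by that figure is easy to sketch with scipy (the 4th-order/85 kHz values are as quoted above; everything else is my assumption):

    import numpy as np
    from scipy import signal

    fc = 85_000  # cutoff implied by the 2 dB/us criterion, per van Maanen
    b, a = signal.butter(4, 2 * np.pi * fc, analog=True)  # 4th-order low-pass

    test_hz = np.array([20e3, 85e3, 170e3])
    _, h = signal.freqs(b, a, worN=2 * np.pi * test_hz)
    for f, gain_db in zip(test_hz, 20 * np.log10(np.abs(h))):
        print(f"{f / 1000:.0f} kHz: {gain_db:.1f} dB")
    # ~0 dB at 20 kHz, -3 dB at 85 kHz, about -24 dB at 170 kHz

A digital version of that response needs a Nyquist frequency above 85 kHz, i.e. a sample rate of at least about 170 kHz, which is why 192 kHz drops out as the nearest standard rate.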
Well, I have learned something new from the forum. Good stuff.
Simon