CD and High Resolution audio question.

Posted by: AussieSteve on 03 March 2017

I just read an article online from a site called Mojo Audio, in which they explain what they say are the facts regarding Red Book and higher-resolution formats. One page is entitled "DSD vs PCM : Myth vs Truth", and the other "The 24-Bit Delusion". I have no idea at all about digital music, so when I typed a question about it into my laptop in the hope of learning a bit at a level I kinda, repeat kinda, understand, I came across their website and the above information. May I ask those of you who understand this stuff to have a look and let me know if it's accurate, and what your opinions might be? The website belongs to a company which makes hifi kit, so I have no idea if it's factual or spin.

The reason for my search was to try to understand whether a 16/44.1 CD can get "upgraded" to 20 or 24 bit using the original 16-bit CD to start with. I think the bit rate is the number of "slices" the music is broken up into, and the sampling rate is the frequency at which the "slices" are read. I vaguely get that the higher the frequency, the further the signal is away from our ears hearing the process? What that means in real terms electronically is beyond me. I haven't a clue about how electrons flow to make hifi parts or music, so I'll leave that to you guys.

All I know is I have a CD5XS, and when I play my few HDCDs on it the sound quality over the others is incredible. I read the comments here on the forum about DAC vs DAC etc. and it excites me immensely and offers hope that my CDs will sound much better overall, but I haven't purchased one yet because it seems as though they are in a transition period and I cannot afford to buy one now only to have it become obsolete in a couple of years. I really love the idea of NAS storage and remote iPad-type control of my music, but I haven't a clue about all the jargon used here about the connections required etc. I'm willing to learn for sure, but really haven't a clue about what it all means.
I am most grateful if you would help me, Regards Steve

Posted on: 04 March 2017 by Bert Schurink

I am not technical enough to evaluate whether what they claim is correct. However, I have sat in A-B comparison workshops and have always heard the difference between the formats. Are we able to hear everything, then? Perhaps not, but the difference is significant enough for me to buy the highest possible resolution.

Posted on: 04 March 2017 by Simon-in-Suffolk

OK, I read the "24-Bit Delusion" article, and it appears factual enough, although the summary comes across as a bit of a whinge and is subjective. I agree that 24-bit audio in the final replay stage is really superfluous given the limits of our audio reproduction and recording equipment and, more relevantly, our hearing. Long sample word lengths are advantageous for processing and filtering in DSP, however, with the result then quantised down for physical conversion in the DAC. I was reading an article from the AES recently, and it said the best dynamic range currently available from a microphone is just over 17 bits.

The bit I didn't see, which favours higher sample rates (as opposed to the sample size described above), is the ability to resolve timing information between sounds, as opposed to pitch. Our pitch perception is limited and degrades with age, but our ability to differentiate timing down to a few µs largely remains with us. If one is to capture this digitally one needs a high sample rate, and it is that which can start to make sounds such as an orchestra or large choir become more real.
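For a sense of scale, here is a minimal Python sketch that works out the spacing between consecutive samples at a few common sample rates. The function name is made up for this illustration. It only computes the sample spacing; whether that spacing is what actually limits audible timing resolution is a separate question that this arithmetic does not settle.

```python
def sample_period_us(sample_rate_hz: float) -> float:
    """Time between consecutive samples, in microseconds."""
    return 1e6 / sample_rate_hz


for rate in (44_100, 96_000, 192_000):
    print(f"{rate} Hz -> {sample_period_us(rate):.2f} us between samples")
```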

Posted on: 04 March 2017 by Huge

Upsampling audio data in files achieves nothing.

Taking 16 bit audio and extending it to 24 bit just adds zeros at the end of each digital word; if you play a 16 bit file on a device with a 24 bit DAC that's what the DAC does anyway. *
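As a rough illustration of that padding, here is a minimal Python sketch. The function names and sample values are invented for the example, and real players may handle the conversion differently.

```python
def pad_16_to_24(sample_16: int) -> int:
    """Extend a signed 16-bit sample to 24 bits by appending eight zero bits."""
    return sample_16 << 8


def truncate_24_to_16(sample_24: int) -> int:
    """Drop the eight low-order bits again (arithmetic shift keeps the sign)."""
    return sample_24 >> 8


samples_16 = [-32768, -1, 0, 1, 12345, 32767]
samples_24 = [pad_16_to_24(s) for s in samples_16]

# Every padded word has zeros in its eight lowest bits, so no new detail exists.
low_bits_all_zero = all((s & 0xFF) == 0 for s in samples_24)
# Going back to 16 bits returns exactly the original values.
round_trip_ok = [truncate_24_to_16(s) for s in samples_24] == samples_16
print(low_bits_all_zero, round_trip_ok)
```

The 24 bit words are bigger numbers, but they carry exactly the same 16 bits of information.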

Taking CD audio and extending it to 88kHz or 176kHz just adds extra digital words in between the others; this is what an oversampling DAC does anyway. *
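A minimal Python sketch of the idea, assuming the crudest possible scheme (inserting the midpoint between neighbouring samples). Real oversampling DACs and resampling software use proper interpolation filters rather than this; the function name and sample values are made up for illustration.

```python
def upsample_2x_linear(samples):
    """Double the sample rate by inserting the midpoint between neighbours."""
    out = []
    for i, s in enumerate(samples):
        out.append(s)
        if i + 1 < len(samples):
            out.append((s + samples[i + 1]) / 2)
    return out


original = [0, 100, 50, -80, -20]         # pretend these are 44.1 kHz samples
upsampled = upsample_2x_linear(original)  # now at a nominal 88.2 kHz

# Every other word is an original sample; the rest are derived from them.
print(upsampled)
print(upsampled[::2] == original)
```

The in-between words are calculated entirely from the samples that were already there, so the file gets bigger without gaining any information.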

* N.B. these are simplified explanations; in fact the hardware in the player is usually a bit more subtle and can optimise itself to play back 16/44.1 files better than it otherwise would. If you change the file to HiRes, then you still have 16/44.1 sized actual data in the file (just encoded using more and bigger numbers), but that stops the hardware from optimising itself because it thinks the file is genuine HiRes, when in fact it isn't.

HDCD is completely different, that uses mathematical trickery to squeeze a little bit more data onto the CD. That really does have improved resolution.

Posted on: 04 March 2017 by nbpf

AussieSteve posted:
I just read an article online from a site called Mojo Audio, .... I am most grateful if you would help me, Regards Steve

Steve, I suggest that you start with

http://www.thewelltemperedcomputer.com/index.html

The site has an easy-to-understand "Getting started" section, discusses basic questions (for instance, "To stream or not to stream") and gives you a solid understanding of basic notions about media players, ripping, tagging, etc. Another entry point that I found quite useful for understanding basic "resolution" notions is

https://people.xiph.org/~xiphm...demo/neil-young.html

The article is controversial but worth reading, in my view. A third more specific entry point that I can recommend is the MinimServer documentation:

http://minimserver.com/index.html

Even if you do not use MinimServer, the documentation (in particular http://minimserver.com/ug-library.html) gives you a good understanding of how to make your music collection easy to search, browse and maintain.

Posted on: 04 March 2017 by Huge

Be aware that the xiph.org article isn't just controversial: there is a fundamental flaw in it.

It assumes that the human ear/brain combination functions in the same way as an FFT spectrum analyser. This is not a valid assumption.
For proof of the existence of a fundamental flaw in its approach see this article:

https://phys.org/news/2013-02-...ainty-principle.html

Posted on: 04 March 2017 by nbpf

Huge posted:
Be aware that the xiph.org article isn't just controversial: there is a fundamental flaw in it.
It assumes that the human ear/brain combination functions in the same way as an FFT spectrum analyser. This is not a valid assumption.
For proof of the existence of a fundamental flaw in its approach see this article:
https://phys.org/news/2013-02-...ainty-principle.html

Right, I should perhaps have mentioned that I have a lot of high resolution music that I very much enjoy!

Anyway, I only started getting interested in music replay a few years ago, and in the beginning I found myself in the same situation as Steve. These days I have no problem understanding what a device brings to the party and whether it is something I could be interested in or not. But I know a number of people who would be genuinely interested in building a data-based music collection and have given up because of the lack of understandable and transparent information. Manufacturers of hifi audio devices have contributed a great deal to this deplorable state of affairs, I believe.

Posted on: 04 March 2017 by Huge

OK, reanalysing the "The 24-Bit Delusion" page, there are three flaws in this argument as well.

1 Dynamic Range of the Encoded Signal

The noise floor of a Red Book recording isn't -96dB as implied in the article; it's actually 78dB below the reference level. The Red Book defines the range of a CD to be from -78dB to the 0dB reference level, with a further +18dB of headroom above the reference level (that +18dB is there to be used for transients). This effectively puts all their dynamic calculations out by 18dB wrt the average level of the signal in the loud passages of the music, i.e. the level you'd use to set your volume control.
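The arithmetic behind that 78dB figure can be written out as a tiny Python sketch. It simply takes the 96dB and 18dB figures quoted in this post as inputs and does not verify what the Red Book itself specifies; the function name is made up for illustration.

```python
def range_below_reference_db(full_scale_range_db: float, headroom_db: float) -> float:
    """Dynamic range left below a reference level set headroom_db under full scale."""
    return full_scale_range_db - headroom_db


# Figures as quoted in the post: 96 dB for 16 bits, 18 dB of headroom.
print(range_below_reference_db(96.0, 18.0))
```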

2 Dynamic Range of the Human Hearing

Humans can distinguish and recognise signals buried under noise that can often be as much as 6dB louder than the signal, and in some cases even greater depending on the characteristics of the signal and the noise. Claim that the lower useful limit is 30dB: Busted.

Taking 1 & 2 together

Claims of "So you can’t actually hear the difference between the dynamic range of a 16-bit recording and a 20-bit recording unless you turn the volume up high enough above the background noise that it could cause permanent hearing loss."
and
"Also note that in order to appreciate the dynamic range difference between 16 bits and 20 bits, you would need to be in an ultralow-noise environment, such as a recording studio, with treated walls, isolated AC power, and 100% balanced electronics"
: Both Busted.

3 Dynamic Range of DACs

The article states "Any company that claims greater than 20-bit resolution from their DAC is simply full of shit." This is absurd, as TI are not full of shit and some of their 24 bit DACs have a measured dynamic range of 21 bits. Claim: Busted.
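To relate a measured dynamic range in dB to "bits", here is a minimal Python sketch using the plain rule of about 6.02dB per bit (ignoring the extra 1.76dB term that appears in the sine-wave SNR formula). The dB values are hypothetical examples, not figures from any particular DAC datasheet.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB per bit


def equivalent_bits(dynamic_range_db: float) -> float:
    """Express a dynamic range in dB as an equivalent number of bits."""
    return dynamic_range_db / DB_PER_BIT


for db in (96.0, 120.0, 126.4):  # hypothetical example values
    print(f"{db:.1f} dB is roughly {equivalent_bits(db):.1f} bits")
```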

Too much trying to sell their own DAC and making selective use of numbers to justify it.

Posted on: 04 March 2017 by AussieSteve

Thanks Huge, that's the one thing that I actually thought I had picked up on.

Posted on: 05 March 2017 by Simon-in-Suffolk

Huge, your second point may well be correct, I don't have the specifics, but I am not sure about your first. 16 bit samples have an undithered dynamic range of 96dB ... reference to 0.
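For reference, the 96dB figure comes straight from the word length. A minimal Python sketch of that calculation (the function name is made up for illustration):

```python
import math


def pcm_dynamic_range_db(bits: int) -> float:
    """Ratio of the full range of an N-bit word to one step, in dB."""
    return 20 * math.log10(2 ** bits)


for bits in (16, 20, 24):
    print(f"{bits} bits -> {pcm_dynamic_range_db(bits):.1f} dB")
```

This is the theoretical, undithered figure only; it says nothing about how a given recording or DAC uses those bits.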

How you encode and shape that, and how ADCs and DACs use those 16 bits, is separate, but at the end of the day I seem to remember IEC 908 states that the PCM samples for CD-Audio are stereo signed linear 16-bit samples, which gives a theoretical maximum encoding dynamic range of 96dB per channel. Yes, I know dither, stereo, headroom and filter reconstruction can all slightly affect the dynamic range of the actual reconstructed signal, but I didn't see much discussion of this in the Google search I did from the referenced site, and clearly it is completely up to you how you master your media.

But I do think this is all genuinely of little consequence. I suspect most people in their domestic environments don't have the equipment to resolve and deliver much above 16 to 20 bits' worth of dynamic range, even if they were to listen really loudly on peaks, and even then, unless it is an uncompressed orchestral or similar recording, this sort of dynamic range is of little consequence.

The other thing that is not mentioned in the article I saw is that when we listen to loud sounds and transients our hearing compresses and distorts ... so Mother Nature will compress the sound for us depending on the frequency and pressure of the sound.. again the AES had a fascinating paper on this.

Posted on: 05 March 2017 by Simon-in-Suffolk

As a follow-up, it appears even 16 bits of dynamic range exceeds by quite a margin what might be achievable in playback and detectable by the listener. Here is an interesting presentation from an AES member that explores the OP's initial query quite well, I think.

https://youtu.be/SQzNPAdF4aI

Posted on: 05 March 2017 by Huge

As he points out, his test relates only to the user's perception at their normal listening levels; it is not relevant for determining anything to do with the technical measurement of the reproduction equipment. Within the parameters set, his test is valid; it's not valid to extrapolate his results outside those parameters.

The reasons you won't get any technical factors for equipment from these results, are as follows...

1 The 'individual's reference level' would need to be the maximum level the individual will ever use.

2 After setting the 'individual's reference level' there would need to be a period of very low level pink noise to allow for auditory accommodation to occur.

3 If a signal equal to the 'individual's reference level' is to be included in the test (to intentionally disrupt the auditory accommodation) it should be very short (and not repeated at the end).

4 The 0dB point shouldn't be at the average level of 'individual's reference level', it needs to take into account the transients and the crest factor for that signal - i.e. it's at the average RMS of the 'individual's reference level' + the transient margin for that signal + the peak crest factor for that signal. Even then that applies only to that one signal. A playback system needs headroom for the largest transients and crest factors that can occur and digital clipping is a brick wall effect.

5 What level of sound recognition should listeners be directed to report as success in the quiet phases? If ANY sound AT ALL is heard? If the sound is heard all the time? If only transients are heard?

6 Reproduction equipment needs to be designed for people with exceptional (rather than average) hearing.

It's a very interesting approach and it gives some useful information for the individual. It just doesn't give results specific to the design of reproduction equipment, or anything that can be directly related to that, for the reasons given.

P.S. His voice needs more frequency modulation and more dynamic range!

Posted on: 05 March 2017 by Huge

Simon-in-Suffolk posted:
Huge, your second point may well be correct, I don't have the specifics, but I am not sure about your first. 16 bit samples have an undithered dynamic range of 96dB ... reference to 0.

...

OK, say you use a reference level of 0dB (2V), and a noise floor of -96dB, even a 0dB sine wave has a crest factor of 1.414, so to reproduce a 0dB sine wave requires 2.83V, i.e. even a 0dB sine wave will be clipped if you try to use the full 96dB dynamic range.

1 Music signals have higher crest factors (that can exceed 3).

2 You've no headroom at all to allow for transients.

The 96dB is a mathematical factor, not the real world usable musical dynamic range. That's why the Red Book specified a reference level of -18dB so that recordings would all have similar perceived loudness.
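To make "crest factor" concrete, here is a minimal Python sketch that measures it (peak divided by RMS) for a pure sine and for a made-up "spiky" signal standing in for transient-heavy music. Both test signals are invented for illustration and are not real music data.

```python
import math


def crest_factor(samples):
    """Peak absolute value divided by RMS value."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return peak / rms


n = 1000
sine = [math.sin(2 * math.pi * k / n) for k in range(n)]  # one full cycle
# A made-up transient-heavy signal: mostly quiet, with one loud spike.
spiky = [0.1 * s for s in sine]
spiky[250] = 1.0

for name, sig in (("sine", sine), ("spiky", spiky)):
    cf = crest_factor(sig)
    print(f"{name}: crest factor {cf:.2f} ({20 * math.log10(cf):.1f} dB)")
```

The higher the crest factor, the further the average level has to sit below full scale if the peaks are not to clip.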

(I'm still trying to find the reference to this that I found many months ago.)

Posted on: 05 March 2017 by Huge

Simon-in-Suffolk posted:

...
The other thing that is not mentioned in the article I saw is that when we listen to loud sounds and transients our hearing compresses and distorts ... so Mother Nature will compress the sound for us depending on the frequency and pressure of the sound.. again the AES had a fascinating paper on this.

That's an argument for increasing the dynamic range above 96dB so that transients can fully exploit the ability of the ear to accommodate sounds of short duration (hence not damaging) running into this area of extended dynamic range (c.f. companding techniques such as Dolby, but operating in reverse... i.e. expansion to take account of subsequent compression).

(The high intensity sound compression is a protection mechanism for the ear that exploits the variable non-linear elasticity of the muscle and connective tissues from which the ear is constructed - it's a very clever design.)

Posted on: 05 March 2017 by jon h

Huge posted:
OK, reanalysing the "The 24-Bit Delusion" page, there are three flaws in this argument as well.

1 Dynamic Range of the Encoded Signal
The noise floor of a Red Book recording isn't -96dB as implied in the article; it's actually 78dB below the reference level. The Red Book defines the range of a CD to be from -78dB to the 0dB reference level, with a further +18dB of headroom above the reference level (that +18dB is there to be used for transients). This effectively puts all their dynamic calculations out by 18dB wrt the average level of the signal in the loud passages of the music, i.e. the level you'd use to set your volume control.

Errr, except the +18dB thing was from the analogue era. No-one has worked that way since then. Meters on desks don't work that way -- they don't read up to +18dB, they read to 0dB, i.e. the clipping point.

Posted on: 05 March 2017 by jon h

Huge posted:
OK, say you use a reference level of 0dB (2V), and a noise floor of -96dB, even a 0dB sine wave has a crest factor of 1.414, so to reproduce a 0dB sine wave requires 2.83V, i.e. even a 0dB sine wave will be clipped if you try to use the full 96dB dynamic range.

Except peak bits is peak bits is peak bits. It's whatever voltage you set it to. If I have a peak-bits 1kHz sine wave, and my output for peak bits is 2V, I get a 2V sine wave. I don't know where you are getting this 2.83V from.

Posted on: 05 March 2017 by Huge

Yes, so if you set the RMS level at 0dB in the digital domain, you guarantee clipping (you then need 2.83V from a maximum output swing of 2V to even reproduce a sine wave).

The -18dB reference level was specified for two purposes
1 to ensure that disks had approximately the same perceived loudness
2 the same reason that 0VU was set at -18dB with respect to instantaneous peak level in the analogue era: to accommodate transients and crest factor. The music hasn't changed; you still need to allow for the same patterns in the signal.

Posted on: 05 March 2017 by Huge

jon honeyball posted:
Huge posted:
OK, say you use a reference level of 0dB (2V), and a noise floor of -96dB, even a 0dB sine wave has a crest factor of 1.414, so to reproduce a 0dB sine wave requires 2.83V, i.e. even a 0dB sine wave will be clipped if you try to use the full 96dB dynamic range.

Except peak bits is peak bits is peak bits. It's whatever voltage you set it to. If I have a peak-bits 1kHz sine wave, and my output for peak bits is 2V, I get a 2V sine wave. I don't know where you are getting this 2.83V from.

Because a 2V RMS sine wave is 2.83V peak.
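That conversion is just a factor of the square root of two for a pure sine wave, as this minimal Python sketch shows (the function name is made up for illustration):

```python
import math


def sine_peak_from_rms(v_rms: float) -> float:
    """Peak voltage of a pure sine wave with the given RMS voltage."""
    return v_rms * math.sqrt(2)


v_peak = sine_peak_from_rms(2.0)
print(f"2 V RMS sine -> {v_peak:.2f} V peak")
print(f"extra level needed: {20 * math.log10(v_peak / 2.0):.2f} dB")
```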

If you're using 96dB for the dynamic range then that has to be the minimum RMS signal. If you use peak here then, undithered, you'll only get a random increase in the noise level, not a consistent signal.

Posted on: 05 March 2017 by Sloop John B

You following all this Steve?

I don't know why I'm drawn to threads like this, as I really understand diddly-squat about the arguments and measurements. Strangely, I do find them compulsive reading though.

.sjb

Posted on: 05 March 2017 by Simon-in-Suffolk

Huge - exactly - my point was that the measurements of bit depth and dynamic range map only loosely to the real-world dynamic range of the reconstructed signal (signal voltage), and even more loosely to the range from conventional audio replay equipment and our real-world hearing. In practice, 16 bits as the final replay dynamic range is probably more than ample in the vast majority of real-world cases, as opposed to a controlled lab environment.

BTW, because of headroom considerations, recorded material will usually allow some headroom and therefore has a reduced average dynamic range. However, some (many?) DACs are designed to accommodate reduced programme headroom and still not clip; as I say, this is down to implementation and has nothing to do with the encoded signal per se. Also, 16 bit PCM has noise-shaped dither added, which reduces dynamic range with the advantage of reduced quantisation distortion. But it all gets complicated: as the dither is usually shaped, its impact on the dynamic range varies with frequency, just as our hearing threshold varies with frequency.
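To show what dither buys you, here is a minimal Python sketch. It uses plain (flat) triangular TPDF dither rather than the noise-shaped dither mentioned above, which is a deliberate simplification, and the signal, seed and number of passes are arbitrary choices for illustration. A sine smaller than half a quantisation step vanishes completely without dither, but survives under the added noise once dither is applied.

```python
import math
import random


def quantise(x: float) -> int:
    """Round to the nearest whole step (1.0 = one least significant bit)."""
    return math.floor(x + 0.5)


def tpdf_dither(rng: random.Random) -> float:
    """Triangular dither spanning +/-1 LSB, from two uniform random values."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 64
# A sine whose peak is only 0.4 LSB: too small to cross a rounding threshold.
signal = [0.4 * math.sin(2 * math.pi * k / n) for k in range(n)]

undithered = [quantise(x) for x in signal]

# Average many dithered passes to reveal what survives underneath the noise.
passes = 2000
averaged = [
    sum(quantise(x + tpdf_dither(rng)) for _ in range(passes)) / passes
    for x in signal
]

print("undithered is all zeros:", all(q == 0 for q in undithered))
print("largest error of averaged dithered output:",
      max(abs(a - x) for a, x in zip(averaged, signal)))
```

The price is a slightly higher noise floor on every individual pass, which is the reduction in dynamic range referred to above.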