Managing a huge digital collection

Posted by: antmast on 09 November 2014

Is anyone having difficulty pulling up particular items in their collection because of the inconsistent way different sites tag high-def files, especially compared with the comprehensive tagging information you get from a ripped CD? How do you resolve this problem, or do you just live with it? In the digital world, filtering by composer, conductor, year of release, etc. should be at your fingertips, but this is far from reality right now. I am struggling with 1500 right now (Oh look at that one!).

Posted on: 26 November 2014 by Simon-in-Suffolk

Huge, actually I am talking from an engineering design and system architecture perspective, which is where I have typically used these media files, in voice and contact centre systems. Additionally I have typically used differing MIME types as per RFC 2361, and that is why I am familiar with the format and its metadata abilities. I also use them personally for hi-fi of course, as indeed many do on this forum, but that is probably neither here nor there: it uses only a tiny portion of the format's capabilities and is always linear PCM. I guess most users here are oblivious to, and simply don't care about, the extensive capabilities of WAV.

 

But yes, I concede I am not, and have no interest in becoming, an MS XAudio2 application programmer. Here, though, is a good link on MSDN by an MS programmer about demystifying WAV files; it also contains links to the full RIFF specification, including the valid defined subsets for WAV.

 

http://blogs.msdn.com/b/dawate...-the-wav-format.aspx

 

I have summarised and copied the WAV RIFF format as defined by IBM and MS here for convenience:

 

<WAVE-form> ->
   RIFF
   (
       'WAVE'
       <fmt-ck>               // Format
       [<fact-ck>]            // Fact chunk
       [<cue-ck>]             // Cue points
       [<playlist-ck>]        // Playlist
       [<assoc-data-list>]    // Associated data list
       <wave-data>            // Wave data
   )

 

Note that ID3 chunks are not specified, which is why they are a non-standard extension. They have been adopted by many, though, and are valid as long as they comply with the RIFF construction rules.
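
To see which chunks a given file actually carries, something like the following rough Python sketch would do. It is not taken from the RIFF specification or from any particular library; the names list_chunks, chunk and build_wave are made up for illustration, and the sample bytes are built in memory rather than read from a real file.

```python
import io
import struct


def list_chunks(stream):
    """Yield (chunk_id, size) for each top-level chunk in a RIFF/WAVE stream."""
    riff, riff_size, form = struct.unpack("<4sI4s", stream.read(12))
    if riff != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    end = 8 + riff_size  # the RIFF size counts everything after the first 8 bytes
    pos = 12
    while pos + 8 <= end:
        stream.seek(pos)
        header = stream.read(8)
        if len(header) < 8:
            break
        chunk_id, size = struct.unpack("<4sI", header)
        yield chunk_id.decode("ascii", "replace"), size
        pos += 8 + size + (size & 1)  # chunks are padded to an even length


def chunk(chunk_id, payload):
    """Hypothetical helper: wrap a payload in a chunk header, adding a pad byte if needed."""
    return chunk_id + struct.pack("<I", len(payload)) + payload + b"\x00" * (len(payload) & 1)


def build_wave(chunks):
    """Hypothetical helper: assemble an in-memory RIFF/WAVE from ready-made chunks."""
    body = b"WAVE" + b"".join(chunks)
    return b"RIFF" + struct.pack("<I", len(body)) + body


# 16-bit mono linear PCM at 44.1 kHz, followed by a non-standard 'id3 ' chunk.
fmt = struct.pack("<HHIIHH", 1, 1, 44100, 88200, 2, 16)
demo = build_wave([
    chunk(b"fmt ", fmt),
    chunk(b"data", b"\x00\x00" * 4),
    chunk(b"id3 ", b"ID3 tag bytes"),
])

for chunk_id, size in list_chunks(io.BytesIO(demo)):
    print(repr(chunk_id), size)
```

The walker never needs to understand 'id3 ' to get past it; it only relies on the size field in each chunk header.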

 

I respect that you are possibly coming from the position of the defined minimum subset of parameters for the MS XAudio2 application libraries, but my point is that there is a world beyond that which is still valid under the MS/IBM RIFF specification that RFCs and industry standards have been built on. You still might feel these standards are not valid, and I say fair enough.

 

The Info list [data-list] is really interesting, as many of its constructs are XMF constructs, which have also become the norm for digital image metadata, but I digress.

 

Simon

Posted on: 26 November 2014 by Huge

Simon,

 

Even that blog (not a definition document) admits:

"The WAV format is arguably the most basic of sound formats out there. It was developed by Microsoft and IBM, and it is rather loosely defined. As a result, there are a lot of WAV files out there that theoretically should not work, but somehow do."

 

 

And the 'List of Available Chunks' linked from that article even admits that some on the list are often considered, even by its author, an abuse of the WAV file format.

It also states:

"The down side to the Wave file format's popularity is that out of the hundreds of programs that support it, many abuse or misuse it due to bad programming and/or poor documentation. Once some of these "naughty" programs get fairly popular and churn out millions of incorrect Wave files, the rest of the software industry is forced to deal with it and write code that can read the incorrect files."

 

 

As I said the extensions and other abuses of the format may well be in common use, but they are not part of the specification.

 

I don't disagree that they'll often work, just that this cannot be guaranteed for anything beyond the originally defined (and much more limited) specification. And, as stated in the page you linked, they "theoretically should not work".

 

 

The additional standards are valid, and I have no problem with them at all; they are just not WAVE files. Files meeting those additional specifications should be described as such, and not as WAV.

Posted on: 27 November 2014 by mrspoon

Any standards-compliant wave reader will skip over any chunks it does not recognize; this accounts for 99.99% of the wave readers out there.

 

The issues arise with badly programmed readers, which check for a data chunk, then assume everything in the file after the data chunk header is audio, without checking the data chunk length. This can cause any chunks after the data chunk (such as 'id3 ' or even a standards-compliant info chunk) to be played as static noise. These players are broken; I have known certain Sony players and a handful of in-car stereos to be broken like this. The whole idea of a chunk-based container is that unknown chunks are skipped, and any reader which does not do this cannot reliably read compliant or 'non-compliant' files.
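
As a rough illustration of the difference (a Python sketch, not code from any real player; read_pcm, read_pcm_broken and the in-memory sample are made up for this post), compare a reader that honours the data chunk length with one that does not:

```python
import io
import struct


def read_pcm(stream):
    """Return only the bytes inside the 'data' chunk, skipping every other chunk."""
    riff, _, form = struct.unpack("<4sI4s", stream.read(12))
    if riff != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    while True:
        header = stream.read(8)
        if len(header) < 8:
            raise ValueError("no data chunk found")
        chunk_id, size = struct.unpack("<4sI", header)
        if chunk_id == b"data":
            return stream.read(size)  # stop at the declared length
        # Unknown or uninteresting chunk: skip its payload plus any pad byte.
        stream.seek(size + (size & 1), io.SEEK_CUR)


def read_pcm_broken(stream):
    """The faulty approach: treat everything after the data header as audio."""
    raw = stream.read()
    start = raw.index(b"data") + 8
    return raw[start:]


def _chunk(chunk_id, payload):
    """Hypothetical helper: chunk header + payload + pad byte for odd lengths."""
    return chunk_id + struct.pack("<I", len(payload)) + payload + b"\x00" * (len(payload) & 1)


# In-memory sample: an info LIST before the audio and an 'id3 ' chunk after it.
audio = bytes(range(8))
body = (
    b"WAVE"
    + _chunk(b"fmt ", struct.pack("<HHIIHH", 1, 1, 44100, 88200, 2, 16))
    + _chunk(b"LIST", b"INFO")
    + _chunk(b"data", audio)
    + _chunk(b"id3 ", b"tag")
)
demo = b"RIFF" + struct.pack("<I", len(body)) + body

print(len(read_pcm(io.BytesIO(demo))))         # audio bytes only
print(len(read_pcm_broken(io.BytesIO(demo))))  # audio plus the trailing chunk
```

The extra bytes returned by the second function are the 'id3 ' header and tag, which is exactly what ends up being played as noise.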

Posted on: 27 November 2014 by Huge

Hi Mr Spoon,

 

Despite great respect for your work at Illustrate...

 

I would agree for generic RIFF readers.

I would agree for readers reading a subset of RIFF including WAVE.

I would agree for a pragmatic implementation of a WAVE reader.

 

I also very much agree about checking data chunk lengths; not doing that is extremely bad practice and a code vulnerability.

 

This functional extension into wider RIFF compliance is needed due to the use of other RIFF facilities that are not strictly included in the WAVE standard subset of RIFF as published by Microsoft.  Hence, in practice, readers now have to be extended into generic RIFF compliance, rather than just being compliant with the WAVE subset.  I still believe files using these facilities should be described as RIFF files (or something else), not as WAVE files.

 

Some of the illustrations you give may simply be strict WAVE readers.  To find out I'd need to inspect their code or code specification (and I can't do that).  Feed them a WAVE file strictly compliant with the  Microsoft standard and they'll still work - this is exactly why the 'extended' use .wav files should be described differently.

 

 

However, we have gone off on something of a tangent; while it's not entirely irrelevant, it's not core to the subject. Maybe moving to the padded cell would be better?

Posted on: 29 November 2014 by Bert

Back to topic:

 

JRiver Media Centre is excellent software for ripping, tagging, displaying and playing all music formats. It has some clever wizards to search and replace text etc. It supports non-English characters like French & German accents, Scandinavian and Russian characters.

 

Indeed, you should settle on a consistent tagging convention for yourself, especially if you have both pop and classical music.

 

I use for pop: 

Artist = Artist starting with the first name (Aretha Franklin under A), all bands without The (Beatles, Doors)

Album = Year of release - Album name. Typing the year of release first sorts all your albums by year of release and gives a nice historic perspective. Scrolling over the years brings back good memories!
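
For anyone who would rather script this than type it, the pop convention above could be sketched in Python roughly as follows. This is only an illustration of the naming rule, not anything JRiver provides; the function names pop_artist and pop_album are made up.

```python
def pop_artist(name):
    """Drop a leading 'The ' so bands file under their main word."""
    return name[4:] if name.lower().startswith("the ") else name


def pop_album(year, title):
    """Prefix the release year so albums sort chronologically."""
    return f"{year} - {title}"


albums = [
    ("The Beatles", 1969, "Abbey Road"),
    ("The Beatles", 1965, "Help!"),
    ("The Beatles", 1966, "Revolver"),
]

# Plain text sorting now gives release order within each artist.
for artist, album in sorted((pop_artist(a), pop_album(y, t)) for a, y, t in albums):
    print(artist, "|", album)
```

Because the year is always four digits, ordinary alphabetical sorting of the Album tag is enough; no separate year field is needed.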

 

For classical:

Artist = Composer, always sorted as last name, first name, years of birth and death, so "Bach, Johann Sebastian 1685-1750". Handy if you have many composers and want to know the period in which the guy lived.

Album = Name of piece, year of writing - year of recording Conductor/soloist & Orchestra. For instance:

"Symphony No.3, 1904 - 2003 Chailly & Concertgebouworkest" That's useful if you have many versions of say Mahler's 2nd Symphony, as they are now nicely sorted per recording year, and you hear the recording technology develop over time.

An exception is Bach's music, for which I start the album name with the BWV number, so BWV0080, BWV0244, BWV1068 etc. The BWV number is handy as it is unique and it sorts the genres in Bach's music.
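
The classical convention can be sketched the same way in Python. Again this is only an illustration: the function names are made up, and the space between the BWV number and the title is my assumption, since the convention above only says the album name starts with the number.

```python
def composer_tag(last, first, born, died):
    """Artist tag: 'Last, First born-died'."""
    return f"{last}, {first} {born}-{died}"


def classical_album(piece, written, recorded, performers):
    """Album tag: 'Piece, year written - year recorded Performers'."""
    return f"{piece}, {written} - {recorded} {performers}"


def bwv_album(number, title):
    """Bach album tag: zero-padded BWV number first (separator is an assumption)."""
    return f"BWV{number:04d} {title}"


print(composer_tag("Bach", "Johann Sebastian", 1685, 1750))
print(classical_album("Symphony No.3", 1904, 2003, "Chailly & Concertgebouworkest"))

# Zero padding makes alphabetical order match numerical order.
print(sorted(["BWV1068", "BWV0080", "BWV0244"]))  # padded: 80, 244, 1068
print(sorted(["BWV1068", "BWV80", "BWV244"]))     # unpadded: 1068, 244, 80
```

The last two lines show why the padding matters: without it, BWV 80 would sort after BWV 1068.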

 

As others mention, when there are several classical composers on one CD, I split them into separate Album tags. This gives you a better view of what you have collected from a composer. Notorious are the compilation albums of French composers like Faure, Debussy and Ravel.

 

On compilation albums I often change the sequence of the songs to get them in chronological order. Sometimes I add the year of the hit single, to get the historic perspective. I compile double CDs into one, updating the track numbers of the second CD.
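
The renumbering is simple enough to sketch in Python. The function names and the placeholder track titles below are made up for illustration, and writing the numbers back into the files is left to whatever tagging tool you use.

```python
def renumber(titles):
    """Number a list of titles from 1 upward."""
    return list(enumerate(titles, start=1))


def merge_discs(disc1, disc2):
    """Treat a double CD as one album: disc 2 continues disc 1's numbering."""
    return renumber(disc1 + disc2)


def chronological(tracks):
    """tracks is a list of (year, title); sort by year and append the year to the title."""
    ordered = sorted(tracks, key=lambda t: t[0])
    return renumber([f"{title} ({year})" for year, title in ordered])


print(merge_discs(["A1", "A2"], ["B1", "B2"]))
print(chronological([(1971, "Song C"), (1965, "Song A"), (1968, "Song B")]))
```

Python's sort is stable, so songs from the same year keep their original relative order.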

 

I spend a lot of time on this, but it is very rewarding to have a well-structured music library. Enjoy!