Help ripping ~30,000 CDs, Was “Help digitizing […]” ;)
post Mar 29 2012, 19:08
Post #1

Group: Members
Posts: 9
Joined: 29-March 12
Member No.: 98189

Hey! I'm a newbie around these forums, but hopefully I'll be around quite a bit. I really like the community here, and hope I can contribute in the future. But enough introduction, here's the interesting stuff:

I work for a college radio station, and we've decided to undertake the rather ambitious project of digitizing the CDs we've acquired over the years. This is a pretty monumental undertaking, so I'm looking to make this as painless and quick as possible. We have a very rough approximation of about 30,000 CDs that we're looking to convert to digital files, and it's my job to work out many of the more technical aspects of the project.

The problem with being a college radio station is that we're on a pretty limited budget. We can't afford any sort of robot or anything like that to help the process along, nor can we afford any sort of service, so we're stuck doing it ourselves. Thankfully, we have a bunch of people willing to put the time and effort in. We also aren't terribly picky about getting every rip totally 100% perfect. But I've done a fair bit of research, and here's the kind of plan I had in mind:

Ideally, we have one pretty decent quad-core desktop that we're planning to outfit with four CD drives. We have software that allows us to rip multiple discs at once to V0 MP3s which are stored on a small RAID 1 array inside the computer. I've done some informal ripping tests, and have narrowed down the two pieces of software that seem to work best to fre:ac and dBpoweramp. I have also tried EAC and simply ripping with MediaMonkey, but fre:ac and dBpoweramp seemed the most efficient and easiest to use. Now, if I decide to use one of these pieces of software (if anyone has any suggestions, I'm 100% open to them!), how can I configure it to make the process as painless as possible? Would using multiple drives be an option? I found very little information about software that can rip from multiple drives simultaneously, so I'm assuming this is not a common feature. If not, would using different computers be our best bet? If anyone has any other suggestions about ripping multiple discs at the same time, or other ways to improve efficiency, that would probably make my life much easier.

Thanks for your time!
post Mar 30 2012, 17:44
Post #2


First off, I want to say thanks so much to everyone replying to this thread so far. It's been incredibly helpful.

QUOTE (dumdidum @ Mar 30 2012, 03:56) *
i second the suggestion of ripping to a lossless format. first, storage is cheap. second, generation loss could be an issue. after all, many if not most radio stations broadcast their show in a lossy format (internet radio, digital audio broadcasting, etc.).

This is a good point; we do broadcast online. This is already a huge undertaking and going lossless will make it even bigger, but I'm more and more convinced that going lossless is worth the effort. Then it's just the issue of "how do we back up 10 TB of data and make it network-accessible on the budget of a college radio station?"

QUOTE (Porcus @ Mar 30 2012, 05:07) *
I ripped about 7000 CDs to FLAC using dBpoweramp, a Sony XL1B2 200-disc mediachanger (well actually two, luckily since one wore out ... 2nd hand ones available for cheap at Amazon: http://www.amazon.com/gp/offer-listing/B00...;condition=used ).

You can probably use dBpoweramp's Batch Ripper. What I did -- this was at a time Batch Ripper was fresh and a bit immature -- was to hack together an AutoIT3 script that automated dBpoweramp. I did once post it at http://www.avsforum.com/avs-vb/showthread....86#post13939586 , but don't hold it against me, it is fairly lame coding. (And forget whatever I wrote there about HDCD. I regret using the HDCD DSP.)

I know that people have modified REACT to work with the mediachanger too.

I appreciate the links, and dBoink is a pretty awesome name. If at some point we do decide to get a dedicated ripper, the XL1B will be at the top of the list, thank you! Does Sony have any current version of this that they're selling? And how did you go about storing 7000 CDs of FLAC files?

QUOTE (LosMintos @ Mar 30 2012, 09:28) *
Sorry, I didn't read all posts carefully, nevertheless, let me add/emphasize some points.
  • You should insist on accurate, secure and lossless rips! You'll only do it once, do it right!
  • I've no experience with real batch rippers, but ripped a lot of CDs on ordinary PCs equipped with two CD-ROM drives. You will just open two instances of dBpoweramp or EAC and it's just fine. However, this is limited in terms of keeping a clear view on open CD cases on the desk and open program instances on the screen. With four drives you'll not gain much increase in overall speed, IMHO. The computer will wait for you (rather than you waiting for the computer).

This is something I didn't consider. I did some more tests last night, and it looks like three drives is the sweet spot. I think I'll stick with dBpoweramp's Batch Ripper, though.

QUOTE (LosMintos @ Mar 30 2012, 09:28) *
  • Metadata is a crucial issue. Databases are not correct anyway and every n-th CD will not be found. Then you'll have to enter the data by yourself, the most time-consuming step.
  • If you can't afford a batch ripper, you probably can distribute the job to many volunteers. In advance you have to agree on a standard:
    • codec
    • folder hierarchy
    • tagging scheme incl. cover images
    • what to do with unknown or erroneous CDs
    When ripping manually I'll pre-sort a bunch of CDs: regular albums, samplers, soundtracks etc. This reflects my folder hierarchy and speeds up ripping in my case. You could consider things like this when distributing to volunteers.

    I think I'll deal with bad or missing metadata by marking the CDs that dBpoweramp couldn't rip automatically and revisiting them later to type the data in manually, but separate folders for different types of music are an excellent idea. We get lots of promo material and compilation albums, so it's probably a good idea to separate the music into broad categories.

    Here's the example folder hierarchy I was thinking of:
    Library\Category\Artist\Album (ID number we add when we get it)\Track number. Artist - Title

    or for a real-life example:
    Library\Full Albums\Modeselektor\Monkeytown (30458)\04. Modeselektor - Evil Twin.flac
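
If volunteers on several machines need to produce the same layout, the naming scheme above could be generated by a small helper rather than typed by hand. This is only a sketch in Python: the function names `safe` and `track_path`, the two-digit track padding, and the replacement of illegal characters with an underscore are assumptions for illustration, not anything dBpoweramp or MediaMonkey provides.

```python
from pathlib import PureWindowsPath

# Characters Windows does not allow in file or folder names.
_FORBIDDEN = '<>:"/\\|?*'

def safe(name: str) -> str:
    """Swap illegal characters for '_' and trim trailing dots and spaces."""
    cleaned = "".join("_" if ch in _FORBIDDEN else ch for ch in name)
    return cleaned.strip().rstrip(". ")

def track_path(category, artist, album, disc_id, track_no, title,
               ext="flac", root="Library"):
    # Layout: root / Category / Artist / Album (ID) / NN. Artist - Title.ext
    folder = PureWindowsPath(root, safe(category), safe(artist),
                             f"{safe(album)} ({disc_id})")
    filename = f"{track_no:02d}. {safe(artist)} - {safe(title)}.{ext}"
    return folder / filename

# Reproduces the real-life example from the post.
print(track_path("Full Albums", "Modeselektor", "Monkeytown", 30458, 4, "Evil Twin"))
```

Using `PureWindowsPath` keeps the backslash separators even if the script is run on a non-Windows machine.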

    QUOTE (LosMintos @ Mar 30 2012, 09:28) *
    I see, you're not likely to give CDs away (I fully understand!). That way, a large room with many computers equipped with max. 2 drives each will help more than a few computers with a lot of drives each. Just my humble opinion ;-). Such a setup will likely be outperformed by a real batch ripper. In addition a network storage might be interesting for you. And a scanner for missing cover art.
    Just my thoughts, hopefully of some help for you :-)

    That's probably a great point. We have some older computers that may allow us to rip while keeping everything standardized, so that's something I'll look into. Coming up with a way to store the files efficiently and as cheaply as possible is another concern, too. But thankfully, cover art isn't terribly important, so we probably won't spend too much time on that.

    It would be nice to digitize at a rate of something like 100 CDs/hour, but that's likely unattainable without multiple people working at once.
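
To put the 100 CDs/hour wish in numbers, here is a rough back-of-the-envelope sketch in Python. The collection size and target rate come from the thread; the six minutes per disc and the assumption that every drive is kept constantly loaded are made up for illustration and should be replaced with real timings from test rips.

```python
import math

TOTAL_CDS = 30_000      # rough collection size from the first post
TARGET_RATE = 100       # CDs per hour, the rate wished for above

# Assumptions for illustration only, not measurements:
MINUTES_PER_DISC = 6    # rip time for one disc in one drive
DRIVES_PER_PC = 3       # the "sweet spot" found in testing

def hours_needed(total_cds, cds_per_hour):
    """Total ripping hours at a steady rate."""
    return total_cds / cds_per_hour

def station_rate(drives, minutes_per_disc):
    """CDs per hour for one computer, assuming every drive is kept busy."""
    return drives * 60 / minutes_per_disc

per_pc = station_rate(DRIVES_PER_PC, MINUTES_PER_DISC)
pcs_for_target = math.ceil(TARGET_RATE / per_pc)

print(f"At the target rate: {hours_needed(TOTAL_CDS, TARGET_RATE):.0f} hours")
print(f"One PC: {per_pc:.0f} CDs/hour, {hours_needed(TOTAL_CDS, per_pc):.0f} hours")
print(f"PCs needed to reach the target: {pcs_for_target}")
```

Under these assumed numbers a single three-drive machine manages about 30 CDs/hour, so reaching 100 CDs/hour would take roughly four such machines, each with someone feeding it discs.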

    QUOTE (Porcus @ Mar 30 2012, 11:38) *
    QUOTE (LosMintos @ Mar 30 2012, 15:28) *
    You will just open two instances of dBpoweramp or EAC and it's just fine.

    Be careful. My experience with dBpoweramp is that it might from time to time switch to the most-recently-used drive. Probably not without telling me, but I have overlooked it (and gotten a few rips with absolutely wrong content). I don't think it is intended to have concurrent instances open.

    QUOTE (LosMintos @ Mar 30 2012, 15:28) *
    pre-sort a bunch of CDs: regular albums, samplers, soundtracks etc.

    - remasters, if you want to have them distinguished. The metadata sources do not.
    - promos. Some of them have beep sounds and talking interfering with the music.
    - I keep classical music away from the rest -- or rather: music sorted by composer, apart from music sorted by performer.

    It would be nice to be incredibly specific about these things (classical music and promo material), but I'm only here three more years! I think that we might skip some of the more specific subfolder ordering in the interest of time. We'll be putting the entire library into MediaMonkey, too, so it'll be organized that way.

    QUOTE (pdq @ Mar 30 2012, 11:40) *
    On the cost of storing lossless files - consider the tens if not hundreds of thousands of dollars that those 30,000 CDs cost originally. Storage space for FLAC runs about 5 cents per CD.

    That certainly puts it into perspective, yeah. Hard drives are pretty cheap in the long run.

    It just seems like the biggest hurdle now is to store and back up all these terabytes of data we're going to create by going lossless and still make them accessible to the other computers on our local network.
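
For a sense of scale, the figures quoted in this thread (about 30,000 CDs, roughly 10 TB, about 5 cents of disk per CD) can be checked against each other with a few lines of Python. The number of copies is an assumption (one live copy plus one backup), and the sketch ignores RAID overhead, enclosures, and any NAS hardware.

```python
TOTAL_CDS = 30_000      # rough collection size from the thread
LIBRARY_TB = 10         # lossless library size estimated in the thread
COST_PER_CD = 0.05      # dollars of disk space per CD, pdq's figure

COPIES = 2              # assumption: one live copy plus one backup

def avg_mb_per_cd(library_tb, total_cds):
    """Average album size in MB implied by the library estimate."""
    return library_tb * 1_000_000 / total_cds

def disk_cost(total_cds, cost_per_cd, copies=1):
    """Raw disk cost in dollars for the given number of full copies."""
    return total_cds * cost_per_cd * copies

one_copy = disk_cost(TOTAL_CDS, COST_PER_CD)
print(f"Average size per CD: {avg_mb_per_cd(LIBRARY_TB, TOTAL_CDS):.0f} MB")
print(f"One copy: ${one_copy:,.0f} (about ${one_copy / LIBRARY_TB:,.0f} per TB)")
print(f"With backup: ${disk_cost(TOTAL_CDS, COST_PER_CD, COPIES):,.0f}")
```

The two estimates are consistent with each other: 10 TB over 30,000 discs is about 333 MB per album, and 5 cents per CD works out to roughly $1,500 of raw disk per copy.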
