Atlantic reporter Alex Reisner has identified four music datasets that have been used to train AI systems, and he has made them searchable for the public. Two of the collections are massive, containing roughly 12 million and 9 million tracks. The other two are far smaller but still substantial, each containing more than 100,000 songs.

Reisner reports that the datasets have been downloaded thousands of times. Although it is not possible to identify every company or researcher that has used them, both Google and Stability AI have acknowledged in research papers that they used some of this material. Music from certain sources, including the Free Music Archive dataset, can be streamed free for personal listening, but commercial use typically requires proper licensing.

On the surface, these datasets may appear to be freely accessible online, but turning them into AI training material involves more than simply downloading a file and running it through a model. Reisner notes that the process often depends on how the music is gathered in the first place:

Three of the datasets he examined are not distributed as audio files, but as lists of links to songs hosted on platforms such as YouTube or Spotify. Developers then use automated tools to retrieve the audio, and some of those tools can get around logins, ads, and systems designed to generate revenue or subscriptions for creators. Reisner points out that these methods violate the platforms’ terms of service.
