AI music revolution sets stage for ‘dataset ethics’


The rapid advancement of artificial intelligence is reshaping the music industry in ways we never thought possible. From cloning an artist’s voice through simple web interfaces to generating entirely new compositions in seconds based on text prompts alone, AI is pushing the boundaries of creativity and challenging our understanding of authorship and ownership—and artists are speaking out about the technology infringing on their rights. As we stand on the precipice of this revolutionary shift, it’s crucial that we consider the ethical implications of these powerful tools.

While it’s easy to get caught up in the excitement of AI-generated music circulating online, the real work of creating ethical AI happens behind the scenes, deep within the AI supply chain. At the heart of this process lies the creation of massive datasets, meticulously labeled and annotated, which serve as the foundation for training AI models. The recordings, compositions, and metadata that comprise these datasets hold the key to unlocking AI’s potential while ensuring fairness and respect for the creators and copyright owners who breathe life into the music we cherish.

As we navigate this uncharted territory, it’s essential that we approach the creation of these datasets with the utmost care and consideration. We must ask ourselves tough questions about the provenance of the data we use, the rights of the artists involved, and the potential impact on the music ecosystem. Only by grappling with these complex issues head-on can we hope to build an AI-powered future that upholds the values of creativity, diversity, and equity.

Getting the goods: quality matters

Creating a robust and reliable music AI requires a vast amount of high-quality data: hundreds of thousands to millions of tracks, amounting to tens of thousands of hours of audio and including a diverse range of solo instruments and MIDI files. The temptation to take shortcuts by scraping audio from various online sources is understandable, but this approach risks infringing upon the rights of artists and copyright holders and decimating the value of music copyright. Even “open datasets” claiming to consist entirely of public domain or Creative Commons material often contain copyrighted works, creating a murky landscape where the origins and permissions of the data are unclear.

To build a truly ethical AI, we must prioritize proper licensing and collaboration between AI developers and copyright owners. By working hand in hand with rights holders and artists, we can create training datasets that respect intellectual property rights and ensure that creators are fairly compensated for their contributions. This approach requires a significant investment of time and resources, but it is the only way to guarantee the integrity and sustainability of the AI music ecosystem.

Imagine a future where AI companies and the music industry forge partnerships built on trust, transparency, and mutual respect, where AI music platforms function as digital service providers (DSPs) the way Spotify and its ilk do today. By working together to create high-quality, ethically sourced datasets, we can unlock the full potential of AI while safeguarding the rights and livelihoods of the artists who make it all possible. It’s a challenge, but it’s one we must embrace if we hope to build a future where creativity and technology can thrive together.

Metadata matters: annotations and transcriptions

Once a vast collection of ethically sourced recordings has been secured, the real work begins. Each track must undergo a rigorous process of annotation and transcription, carried out by a team of highly skilled music experts. This involves documenting every aspect of the composition, from tempo and key to instrumentation, moods, and chord progressions. Leading companies in the AI music space are dedicating significant resources to providing unparalleled levels of detail for millions of recordings and compositions.

This metadata serves as the lifeblood of AI models, enabling them to identify patterns, learn from the intricacies of human creativity, and generate novel works that push the boundaries of what’s possible. The more comprehensive and accurate the metadata, the more sophisticated and nuanced the AI’s output will be. However, the importance of this process extends far beyond the pursuit of creating cool music—it’s about upholding our responsibility to the rights holders who make it all possible.

By investing in meticulous metadata creation, companies not only enhance the quality of their AI models but also demonstrate their commitment to respecting the intellectual property rights of artists and creators. This metadata provides a clear and transparent record of the origins and ownership of each piece of music and ensures the musical accuracy of the data fed into the model.

Prioritizing the creation of detailed, accurate, and ethically sourced metadata lays the foundation for a more equitable and sustainable AI music ecosystem.
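To make the idea concrete, here is a minimal sketch of what one such metadata record might look like, combining the musical annotations described above (tempo, key, instrumentation, moods, chord progressions) with a record of origin and ownership. Every name in it (TrackRecord, rights_holder, license_id, licensed_for_training, training_eligible) is a hypothetical chosen for illustration, not a schema used by any particular company, and the sample tracks are invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TrackRecord:
    """Hypothetical metadata record for one track in a training dataset."""
    track_id: str
    title: str
    # Musical annotations, as produced by human annotators
    tempo_bpm: float
    key: str
    instrumentation: List[str] = field(default_factory=list)
    moods: List[str] = field(default_factory=list)
    chord_progression: List[str] = field(default_factory=list)
    # Provenance and ownership (assumed fields, for illustration only)
    rights_holder: str = ""
    license_id: str = ""
    licensed_for_training: bool = False


def training_eligible(records: List[TrackRecord]) -> List[TrackRecord]:
    """Keep only tracks with a named rights holder, a license reference,
    and an explicit grant covering use as training data."""
    return [
        r for r in records
        if r.licensed_for_training and r.rights_holder and r.license_id
    ]


# Invented sample data: one fully documented track, one of unclear origin.
catalog = [
    TrackRecord(
        track_id="trk-001",
        title="Example Nocturne",
        tempo_bpm=72.0,
        key="E-flat major",
        instrumentation=["piano"],
        moods=["calm"],
        chord_progression=["Eb", "Cm", "Ab", "Bb"],
        rights_holder="Example Publishing",
        license_id="LIC-0001",
        licensed_for_training=True,
    ),
    TrackRecord(
        track_id="trk-002",
        title="Scraped Upload",
        tempo_bpm=128.0,
        key="A minor",
    ),
]

eligible = training_eligible(catalog)
print([r.track_id for r in eligible])
```

The design choice worth noting is that the second track is excluded by default: a record with no documented rights holder or license never reaches training unless someone fills in those fields, which mirrors the article's point that provenance should be established before the data is used.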

Bringing it to market: licensing and indemnification

With an ethically sourced and meticulously annotated dataset in hand, AI music developers are well-positioned to create groundbreaking products. However, before launching their AI-generated music offerings, they must ensure that they have the necessary commercial licenses in place.

Currently, many AI developers take a shortcut by relying on fair use or public domain claims, assuming their use of copyrighted material falls under these legal exceptions. However, this approach is often misguided and can lead to legal disputes down the line. Fair use is a complex and case-specific doctrine, and claiming its protection without a thorough legal analysis is a risky proposition.

To avoid these pitfalls, AI developers should prioritize obtaining proper commercial licenses for the music they use in their training datasets. This process involves reaching out to rights holders, negotiating terms, and ensuring that all parties are fairly compensated for their contributions. While this may seem like a daunting task, it is essential for building trust and fostering long-term collaborations with the music industry, not to mention enabling continued access to high-quality training data.

Forward-thinking AI companies are taking a proactive approach to licensing by engaging with music rights holders early in the development process. By establishing open lines of communication and working together to create mutually beneficial licensing agreements, these companies are setting the stage for a more sustainable and equitable AI music ecosystem.

In addition to securing the necessary licenses, AI developers should also consider indemnification clauses and Errors and Omissions insurance requirements in their agreements with rights holders. These clauses provide protection against potential legal claims arising from the use of licensed material, offering peace of mind to both the AI company and the music industry partners.

As the AI music landscape continues to evolve, it is crucial that developers prioritize ethical licensing practices and collaborate closely with the music industry. By doing so, they not only mitigate legal risks but also contribute to a future where AI and human creativity can coexist and thrive, unlocking new opportunities for innovation and artistic expression.

The future of AI music: setting ethical standards

AI music is here to stay, and the industry faces critical decisions that will shape its trajectory. While it may not be feasible to retroactively re-license every track in existing datasets, we have the power to set ethical standards and cement a licensing framework that benefits all stakeholders moving forward.

It is crucial for companies in the AI music space to take the lead in driving this solution. By prioritizing “dataset ethics” from the ground up, AI music model developers can play a pivotal role in building an ecosystem that respects creators, rewards innovation, and upholds the integrity of the art form we all cherish.

This commitment to ethical practices involves a multifaceted approach. First and foremost, it requires a dedication to sourcing training data through proper licensing channels, ensuring that rights holders are fairly compensated for their contributions. Additionally, it necessitates the creation of robust metadata frameworks that provide transparency and attribution for the music used in AI datasets.

Beyond these technical considerations, setting ethical standards for AI music also demands active collaboration and open dialogue between AI companies and the music industry. By working together to develop equitable licensing models and establish best practices, we can foster a spirit of trust and mutual respect that will serve as the foundation for a thriving AI music ecosystem.

The future of music is unfolding before our eyes, and the decisions we make today will reverberate for decades to come. As an industry, we have the opportunity—and the responsibility—to ensure that this future is built on a bedrock of ethics, fairness, and respect.

Alex Bestall is the founder and CEO of Rightsify and Global Copyright Exchange (GCX), two companies at the forefront of the AI music revolution.

The opinions expressed in commentary pieces are solely the views of their authors and do not necessarily reflect the opinions and beliefs of Fortune.

