Rhapsody Refiner: A Deep Learning Symbolic Music Variator

Overview

This project investigates the design and evaluation of a deep learning-based music variation system that supports creative ownership and active co-creation rather than replacing the musician. The central focus is preserving personal ownership and artistic identity in AI-assisted music composition. Rather than generating complete songs from prompts, this work explores how AI can extend and vary a musician's own ideas through fine-grained control over musical attributes using masked prediction with MusicBERT. The primary target users are practising musicians, including songwriters, producers, jazz musicians, composers, and instrumentalists. The core research questions driving this work are:

How can AI systems support musicians while preserving creative ownership?
How does a variation-based AI system function in ecological (real-world) composition settings?
What tensions arise between technological capability and artistic identity?
How can masked prediction enable controllable music variation with strict control over musical attributes?

Approach

Rhapsody Refiner uses MusicBERT, a bidirectional transformer for symbolic music understanding, combined with masked token prediction to enable controllable music variation.

System Architecture

The core architecture consists of four key technical features, plus a supporting function for adding notes:

Octuple Encoding: Each MIDI note is represented by 8 attributes (bar, instrument, pitch, position, velocity, duration, time signature, key), enabling attribute-level control without entanglement.
Masked Token Prediction: The system masks selected note attributes and predicts them iteratively using MusicBERT. A variation parameter governs the masking by determining how many notes are modified.
Strict Attribute Control: Users selectively vary pitch, beat placement, beat span (duration), and dynamics. Only explicitly selected attributes are masked and predicted.
Logit Filtering: Predictions are restricted to the correct token type, with optional pitch range constraints and optional key-fixing with soft scaling.
New Notes Function: Users can add new notes proportionally across bars before prediction.
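To make attribute-level masking and logit filtering concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the project's implementation: the OctupleNote class, the MASK sentinel, the function names, and the out_of_key_scale parameter are all hypothetical, and the attribute names simply follow the list given above. The soft key scaling is modelled as multiplying the probability of out-of-key pitches by a factor, which may differ from what the system actually does.

```python
import math
from dataclasses import dataclass, replace

# Attribute names follow the description above; the MASK sentinel is an
# assumption standing in for the model's real mask token.
ATTRIBUTES = ("bar", "instrument", "pitch", "position",
              "velocity", "duration", "time_signature", "key")
MASK = -1


@dataclass(frozen=True)
class OctupleNote:
    bar: int
    instrument: int
    pitch: int
    position: int
    velocity: int
    duration: int
    time_signature: int
    key: int


def mask_attributes(note, selected):
    """Return a copy of `note` with only the selected attributes masked."""
    unknown = set(selected) - set(ATTRIBUTES)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    return replace(note, **{name: MASK for name in selected})


def filter_pitch_logits(logits, low=0, high=127,
                        key_pitch_classes=None, out_of_key_scale=1.0):
    """Constrain a list of 128 pitch logits (one per MIDI pitch).

    Pitches outside [low, high] are excluded (logit becomes -inf).
    If `key_pitch_classes` is given, out-of-key pitches keep a reduced
    probability: multiplying a probability by `out_of_key_scale` is the
    same as adding log(scale) to its logit.
    """
    filtered = []
    for pitch, logit in enumerate(logits):
        if pitch < low or pitch > high:
            filtered.append(-math.inf)
        elif key_pitch_classes is not None and pitch % 12 not in key_pitch_classes:
            if out_of_key_scale > 0:
                filtered.append(logit + math.log(out_of_key_scale))
            else:
                filtered.append(-math.inf)  # scale 0 acts as a hard constraint
        else:
            filtered.append(logit)
    return filtered


# Usage: vary only pitch and dynamics of one note, keep everything else.
note = OctupleNote(bar=0, instrument=0, pitch=60, position=4,
                   velocity=80, duration=8, time_signature=0, key=0)
masked = mask_attributes(note, ["pitch", "velocity"])

# Usage: restrict pitch to C4..C5 and softly favour C major.
C_MAJOR = {0, 2, 4, 5, 7, 9, 11}
filtered = filter_pitch_logits([0.0] * 128, low=60, high=72,
                               key_pitch_classes=C_MAJOR, out_of_key_scale=0.5)
```

Because unselected attributes are never masked, they pass through unchanged, which is the property the strict attribute control described above relies on.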

Variation Generation Process

The variation process begins with users uploading a MIDI file and selecting a variation amount (0-100%), the attributes to vary, bar ranges, pitch range or key constraints, and a temperature per attribute. The system then uniformly samples notes to mask, masks only the selected attributes, iteratively predicts the masked tokens via MusicBERT, applies probability filtering and temperature scaling, and outputs a modified MIDI file. Crucially, the system never generates a full song from scratch; it requires an initial musical phrase from the user. For technical details of the implementation, see the related reading at the end of the page.
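The steps above can be sketched as a small loop. This is a simplified illustration, not the actual pipeline: notes are plain dictionaries, predict_logits is a hypothetical stand-in for a MusicBERT forward pass, and the choice to predict one attribute at a time in note order is an assumption. All function and parameter names are invented for the example.

```python
import math
import random


def choose_notes_to_mask(num_notes, variation_percent, rng):
    """Uniformly sample note indices; the count scales with the variation amount."""
    if not 0 <= variation_percent <= 100:
        raise ValueError("variation_percent must be in [0, 100]")
    count = round(num_notes * variation_percent / 100)
    return sorted(rng.sample(range(num_notes), count))


def sample_with_temperature(logits, temperature, rng):
    """Sample an index from softmax(logits / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    if peak == -math.inf:
        raise ValueError("all candidates were filtered out")
    # Subtract the peak for numerical stability; exp(-inf) is 0.0, so
    # filtered-out tokens can never be drawn.
    weights = [math.exp(value - peak) for value in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def vary(notes, attributes, variation_percent, temperatures, predict_logits, seed=0):
    """Return a varied copy of `notes`; the input list is left untouched.

    `predict_logits(sequence, note_index, attribute)` is a hypothetical
    callback that returns one logit per candidate token value.
    """
    rng = random.Random(seed)
    result = [dict(note) for note in notes]
    for index in choose_notes_to_mask(len(notes), variation_percent, rng):
        for attribute in attributes:
            result[index][attribute] = None  # mask only the selected attribute
            logits = predict_logits(result, index, attribute)
            temperature = temperatures.get(attribute, 1.0)
            result[index][attribute] = sample_with_temperature(logits, temperature, rng)
    return result


# Usage with a stub predictor that always allows only MIDI pitch 64.
def stub_predictor(sequence, note_index, attribute):
    logits = [-math.inf] * 128
    logits[64] = 0.0
    return logits


phrase = [{"pitch": 60, "velocity": 80, "duration": 4} for _ in range(10)]
varied = vary(phrase, ["pitch"], 50, {"pitch": 0.8}, stub_predictor, seed=1)
```

With a variation amount of 50% on a ten-note phrase, five notes have their pitch re-predicted while velocity and duration stay as the musician wrote them.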

Four-Week Ecological Evaluation

Eight practising musicians with diverse backgrounds (songwriters, jazz musicians, producers) participated in a four-week ecological evaluation. Participants received software and a tutorial, were asked to compose a song using Rhapsody Refiner, and were encouraged to use the system consistently. They kept reflective journals while system logs recorded usage, and post-study semi-structured interviews were conducted. Data was analysed via inductive thematic analysis, prioritising ecological validity over lab control.

Key Outcomes

The evaluation suggests that AI systems for creative practice should require effort, preserve authorship, and function as ideation partners rather than autonomous creators. The following key findings emerged:

A Tool for "Moments," Not Complete Ideas: Participants rarely used entire generated variations, instead extracting small parts or moments that sparked inspiration. Outputs were often imperfect or chaotic, yet this randomness was valuable for ideation. The system functioned as a spark generator, not a finished-composition engine.
Strong Sense of Creative Ownership: Participants consistently reported full control over the direction of composition and ownership of the creative process. Because the system depends on the musician's input and refinement, ownership remains human-centred.
Active Co-Creation Encouraged by Imperfection: Because outputs were incomplete or messy, musicians had to filter, refine, and shape ideas. This effort reinforced authorship and agency, suggesting that systems requiring skill may better support practising musicians.
Identity and Humanity in Music: Participants expressed discomfort with fully generative prompt-based systems that threatened their sense of worth. Rhapsody Refiner avoided this by not generating initial ideas, not producing finished songs independently, and relying on musician skill.
Design Principles for AI Music Tools: Key principles include preserving ownership through requiring user input and effort, supporting rather than replacing by generating variations instead of complete artefacts, enabling attribute-level control through strict masking and logit filtering, embracing imperfection where randomness and partial failure promote exploration, and conducting ecological evaluations to reveal identity tensions and workflow realities.

Publications & Related Reading

NeurIPS Creative AI 2025 Paper
GitHub Repository