AI & Machine Learning
MindTech
Diffusion-Based Audio Generation
Overview
At MindTech, I worked on adapting diffusion models to audio synthesis using spectrogram representations. I helped train models from scratch on licensed music data and studied how preprocessing, conditioning strategies, and architectural choices affect musical coherence.
A key part of my work was identifying the limits of treating audio spectrograms as images and exploring ways to introduce more explicit musical structure into the generation process.
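To make the spectrogram-as-image framing concrete, the sketch below converts a waveform into a normalized log-mel spectrogram with torchaudio. The parameter values (sample rate, FFT size, hop length, mel bands) are illustrative assumptions, not the settings used at MindTech.

```python
# Minimal preprocessing sketch: waveform -> log-mel spectrogram "image".
# All hyperparameters here are assumed for illustration.
import torch
import torchaudio

SAMPLE_RATE = 22050  # assumed target rate

def audio_to_log_mel(path: str) -> torch.Tensor:
    """Load an audio file and convert it to a normalized log-mel spectrogram."""
    waveform, sr = torchaudio.load(path)
    waveform = waveform.mean(dim=0, keepdim=True)  # mix down to mono
    if sr != SAMPLE_RATE:
        waveform = torchaudio.functional.resample(waveform, sr, SAMPLE_RATE)
    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=SAMPLE_RATE,
        n_fft=1024,
        hop_length=256,
        n_mels=128,
    )(waveform)
    # Log-compress and normalize so the diffusion model sees inputs
    # on a scale similar to natural images.
    log_mel = torch.log(mel + 1e-5)
    return (log_mel - log_mel.mean()) / (log_mel.std() + 1e-8)
```

Batches of these log-mel "images" are then what the diffusion model is trained on.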
Key Contributions
- Trained diffusion models on spectrogram representations of audio (a DDPM-style training step is sketched after this list)
- Developed preprocessing tools for normalization and prompt filtering
- Analyzed time-frequency resolution trade-offs, phase reconstruction, and controllability (see the Griffin-Lim sketch below)
- Investigated hybrid symbolic and neural approaches to sound generation
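The first sketch shows the core of diffusion training on spectrogram batches in a generic DDPM formulation, rather than the Stable Diffusion pipeline listed under Technologies. The `model(noisy, t)` interface, the noise schedule, and all hyperparameters are assumptions for illustration.

```python
# DDPM-style training objective on spectrogram batches of shape (B, C, F, T).
# The model is assumed to predict the noise added at timestep t.
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)          # assumed linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def diffusion_loss(model, spectrograms: torch.Tensor) -> torch.Tensor:
    """Noise a batch of spectrograms and train the model to predict that noise."""
    b = spectrograms.shape[0]
    t = torch.randint(0, T, (b,), device=spectrograms.device)
    noise = torch.randn_like(spectrograms)
    a = alphas_cumprod.to(spectrograms.device)[t].view(b, 1, 1, 1)
    noisy = a.sqrt() * spectrograms + (1 - a).sqrt() * noise
    return F.mse_loss(model(noisy, t), noise)
```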
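Phase reconstruction is one concrete limit of treating spectrograms as images: the model generates magnitudes only, so a phase must be estimated before anything can be heard. Below is a minimal sketch using torchaudio's Griffin-Lim, assuming the mel output has already been mapped back to a linear-frequency magnitude (e.g. with `torchaudio.transforms.InverseMelScale`); a neural vocoder is a common higher-quality alternative.

```python
# Phase reconstruction sketch: magnitude spectrogram -> waveform via Griffin-Lim.
# Assumes a linear-frequency magnitude input; n_fft/hop must match preprocessing.
import torch
import torchaudio

N_FFT, HOP = 1024, 256

def magnitude_to_waveform(mag: torch.Tensor) -> torch.Tensor:
    """Iteratively estimate phase for a magnitude spectrogram."""
    griffin_lim = torchaudio.transforms.GriffinLim(
        n_fft=N_FFT, hop_length=HOP, power=1.0, n_iter=64
    )
    return griffin_lim(mag)
```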
Technologies
Python
Stable Diffusion
PyTorch
Spectrograms
Audio Processing