AI & Machine Learning
MindTech
Diffusion-Based Audio Generation
Overview
At MindTech, I worked on adapting diffusion models to audio synthesis using spectrogram representations. I helped train models from scratch on licensed music data and studied how preprocessing, conditioning strategies, and architectural choices affect musical coherence.
A key part of my work was identifying the limits of treating audio spectrograms as images and exploring ways to introduce more explicit musical structure into the generation process.
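To make the spectrogram-as-image framing concrete, the sketch below converts a waveform into a normalized log-mel spectrogram with torchaudio. The parameter values (sample rate, FFT size, hop length, mel bands) are illustrative assumptions, not the settings used at MindTech.

```python
# Minimal preprocessing sketch: waveform -> log-mel spectrogram "image".
# All hyperparameters here are assumed for illustration.
import torch
import torchaudio

SAMPLE_RATE = 22050  # assumed target rate

def audio_to_log_mel(path: str) -> torch.Tensor:
    """Load an audio file and convert it to a normalized log-mel spectrogram."""
    waveform, sr = torchaudio.load(path)
    waveform = waveform.mean(dim=0, keepdim=True)  # mix down to mono
    if sr != SAMPLE_RATE:
        waveform = torchaudio.functional.resample(waveform, sr, SAMPLE_RATE)
    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=SAMPLE_RATE,
        n_fft=1024,
        hop_length=256,
        n_mels=128,
    )(waveform)
    # Log-compress and normalize so the diffusion model sees inputs
    # on a scale similar to natural images.
    log_mel = torch.log(mel + 1e-5)
    return (log_mel - log_mel.mean()) / (log_mel.std() + 1e-8)
```

Batches of these log-mel "images" are then what the diffusion model is trained on.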
Key Contributions
- Trained diffusion models on spectrogram representations of audio (a DDPM-style training step is sketched after this list)
- Developed preprocessing tools for normalization and prompt filtering
- Analyzed time-frequency resolution trade-offs, phase reconstruction, and controllability (see the Griffin-Lim sketch below)
- Investigated hybrid symbolic and neural approaches to sound generation
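The first sketch shows the core of diffusion training on spectrogram batches in a generic DDPM formulation, rather than the Stable Diffusion pipeline listed under Technologies. The `model(noisy, t)` interface, the noise schedule, and all hyperparameters are assumptions for illustration.

```python
# DDPM-style training objective on spectrogram batches of shape (B, C, F, T).
# The model is assumed to predict the noise added at timestep t.
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)          # assumed linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def diffusion_loss(model, spectrograms: torch.Tensor) -> torch.Tensor:
    """Noise a batch of spectrograms and train the model to predict that noise."""
    b = spectrograms.shape[0]
    t = torch.randint(0, T, (b,), device=spectrograms.device)
    noise = torch.randn_like(spectrograms)
    a = alphas_cumprod.to(spectrograms.device)[t].view(b, 1, 1, 1)
    noisy = a.sqrt() * spectrograms + (1 - a).sqrt() * noise
    return F.mse_loss(model(noisy, t), noise)
```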
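Phase reconstruction is one concrete limit of treating spectrograms as images: the model generates magnitudes only, so a phase must be estimated before anything can be heard. Below is a minimal sketch using torchaudio's Griffin-Lim, assuming the mel output has already been mapped back to a linear-frequency magnitude (e.g. with `torchaudio.transforms.InverseMelScale`); a neural vocoder is a common higher-quality alternative.

```python
# Phase reconstruction sketch: magnitude spectrogram -> waveform via Griffin-Lim.
# Assumes a linear-frequency magnitude input; n_fft/hop must match preprocessing.
import torch
import torchaudio

N_FFT, HOP = 1024, 256

def magnitude_to_waveform(mag: torch.Tensor) -> torch.Tensor:
    """Iteratively estimate phase for a magnitude spectrogram."""
    griffin_lim = torchaudio.transforms.GriffinLim(
        n_fft=N_FFT, hop_length=HOP, power=1.0, n_iter=64
    )
    return griffin_lim(mag)
```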
Technologies
Python
Stable Diffusion
PyTorch
Spectrograms
Audio Processing