Language models (LMs / LLMs) are one of my recent hobbies, and I find it fascinating that these models can achieve such advanced intelligence, understanding natural-language inputs and generating outputs (text, images, music, etc.) that I otherwise wouldn’t be able to create. I often feel like a student when I am using these models. I am not denying that AI carries risks if its development is not properly monitored or if it is misused (which is a whole different debate), but it definitely appears to be a productivity multiplier for me, allowing me to work on a lot of cool things without needing a teacher by my side.
Anyway, I used musicgen-medium (a text-to-music model developed by Facebook) to compose (or generate?) a piece of music. While the music is far from anything like a pop song, I still find the output interesting and would like to share it on this platform.
A few things I learnt during this small project:
- LLMs (large language models) can be applied to many different media beyond text. Feed one a bunch of text, and it can learn to generate text-based outputs; feed it tons of music, and it can learn to generate an instrumental piece. It appears to me that LLMs are almost like fast learners who can apply the same (or a similar) technique to learning many different things.
- This technique can be scaled very easily: the musicgen model has a duration limit of 30 seconds on its output. However, one can produce a series of outputs and string them together, thereby producing a longer, coherent piece of music. Of course, ending up with a coherent piece requires good prompts as well as some audio editing. From my point of view though, this is a very scalable technology that can be applied not only to business and administrative tasks, but also to artistic creation.
- Still, the output is, for now, far from advanced: as I mentioned, it is far from anything like a pop song, even though that was my original goal (i.e. generating a pop song with an LLM). Still, I wouldn’t be surprised if the technology rapidly evolved in that direction, producing something much more complex from simple text prompts.
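For anyone curious about the "string them together" step from the second point, here is a minimal sketch of one way it could be done: overlapping the end of each clip with the start of the next using a linear crossfade. This is an illustration, not the exact editing I did. The sine-wave clips are stand-ins for real model outputs, the function names `sine_clip` and `crossfade_concat` are made up for this example, and the 32 kHz sample rate is an assumption about the clips being joined. It uses plain Python lists to stay dependency-free; with real audio a library such as NumPy would likely be more practical.

```python
import math

SAMPLE_RATE = 32_000  # assumed sample rate of the generated clips


def sine_clip(freq_hz, seconds, sample_rate=SAMPLE_RATE):
    """Stand-in for a generated clip: a mono sine tone as a list of floats."""
    n = int(seconds * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def crossfade_concat(clips, fade_seconds=1.0, sample_rate=SAMPLE_RATE):
    """Join mono clips, blending each boundary with a linear crossfade."""
    if not clips:
        raise ValueError("need at least one clip")
    fade_len = int(fade_seconds * sample_rate)
    if any(len(clip) < fade_len for clip in clips):
        raise ValueError("every clip must be at least as long as the fade")

    result = list(clips[0])
    for clip in clips[1:]:
        start = len(result) - fade_len  # where the overlap begins
        for i in range(fade_len):
            w = i / fade_len  # weight of the incoming clip, rising from 0 towards 1
            result[start + i] = result[start + i] * (1 - w) + clip[i] * w
        # Append whatever is left of the incoming clip after the overlap.
        result.extend(clip[fade_len:])
    return result


# Three 2-second placeholder clips joined with 0.5-second crossfades.
clips = [sine_clip(220, 2), sine_clip(330, 2), sine_clip(440, 2)]
track = crossfade_concat(clips, fade_seconds=0.5)
print(len(track) / SAMPLE_RATE)  # duration in seconds: 5.0
```

Each join shortens the total by the length of the fade, which is why three 2-second clips come out at 5 seconds rather than 6. A crossfade only smooths the seam; whether the clips actually sound like one piece still depends on the prompts.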
I hope you like this little piece.