Meta has released an "open" implementation of the viral podcast-generation feature in Google's NotebookLM.
The project, called NotebookLlama, uses Meta's own Llama models for much of the processing, unsurprisingly. Like NotebookLM, it can generate back-and-forth, podcast-style digests of text files uploaded to it.
NotebookLlama first creates a transcript from a file – say, a PDF of a news article or blog post. Then it adds "more dramatization" and interruptions before feeding the transcript to open text-to-speech models.
But the results don't sound nearly as good as NotebookLM's. In the NotebookLlama samples I listened to, the voices had a distinctly robotic quality and seemed to talk over each other at odd points.
The Meta researchers behind the project say that stronger models could eventually bring its quality up to par.
"The [text-to-speech] model is the limitation of how natural this will sound," they wrote on NotebookLlama's GitHub page. "[Also,] another approach of writing the podcast would be having two agents debate the topic of interest and write the podcast outline. Right now we use a single model to write the podcast outline."
NotebookLlama is not the first attempt to replicate NotebookLM's podcasting feature, and some attempts have been better than others. But none of them, not even NotebookLM itself, has cracked the hallucination problem that plagues all AI: podcasts generated by AI are going to contain some made-up information.