Meta* releases an “open” version of Google’s podcast generator
Meta* has introduced NotebookLlama, an “open” implementation of the podcast creation feature in Google’s NotebookLM. As one would expect, the project relies on Meta’s Llama models for data processing. Like NotebookLM, NotebookLlama can create audio digests of user-uploaded text files, turning them into podcasts.
NotebookLlama’s workflow consists of several stages. First, a transcript is created from the uploaded file, whether it is a PDF of a news article or a blog post. Next, pauses and “dramatic elements” are added to the text to make the story livelier. In the final step, the text is fed into open speech synthesis models to produce an audio version.
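The stages described above can be sketched roughly as follows. This is only an illustrative outline, not NotebookLlama’s actual code: the function names (write_transcript, dramatize, synthesize, make_podcast), the prompts, the “Speaker: text” line format, and the stand-in fake_llm and fake_tts callables are all assumptions made for the example. In a real setup, the two callables would wrap a Llama model and an open speech synthesis model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type aliases: any text-in/text-out model and any
# (speaker, text) -> audio-bytes synthesizer can be plugged in.
LLM = Callable[[str], str]
TTS = Callable[[str, str], bytes]


@dataclass
class Line:
    speaker: str
    text: str


def write_transcript(source_text: str, llm: LLM) -> str:
    """Stage 1: turn the extracted document text into a draft transcript."""
    return llm("Write a podcast transcript based on this text:\n" + source_text)


def dramatize(transcript: str, llm: LLM) -> List[Line]:
    """Stage 2: ask the model to add pauses and dramatic elements,
    then parse 'Speaker: text' lines (an assumed output format)."""
    rewritten = llm("Add pauses and dramatic elements:\n" + transcript)
    lines = []
    for raw in rewritten.splitlines():
        speaker, sep, text = raw.partition(":")
        if sep and text.strip():
            lines.append(Line(speaker.strip(), text.strip()))
    return lines


def synthesize(lines: List[Line], tts: TTS) -> bytes:
    """Stage 3: synthesize each line and concatenate the audio chunks."""
    return b"".join(tts(line.speaker, line.text) for line in lines)


def make_podcast(source_text: str, llm: LLM, tts: TTS) -> bytes:
    transcript = write_transcript(source_text, llm)
    return synthesize(dramatize(transcript, llm), tts)


# Stand-ins so the sketch runs without any model installed.
def fake_llm(prompt: str) -> str:
    return "Host: Welcome to the show. [pause]\nGuest: Hmm, interesting!"


def fake_tts(speaker: str, text: str) -> bytes:
    return f"<{speaker}>{text}".encode("utf-8")


if __name__ == "__main__":
    audio = make_podcast("Text extracted from an uploaded PDF.", fake_llm, fake_tts)
    print(len(audio), "bytes of placeholder audio")
```

Passing the models in as plain callables keeps the three stages independent, so the speech synthesis step, which the developers name as the current quality bottleneck, could be swapped out without touching the rest.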
However, according to user reviews, the sound quality still needs improvement. In NotebookLlama’s demos, the voices sound unnatural, and phrases sometimes break off in unexpected places. Meta* developers acknowledge the problem and say that quality could be improved with more powerful models.
“Speech synthesis limits the naturalness of sound,” the researchers write on the NotebookLlama project’s GitHub page. “Another way to improve the quality of podcasts is to use two agents who would discuss the chosen topic and write the podcast plan together. We currently use one model to create this plan.”
NotebookLlama is not the first attempt to recreate the podcast feature offered in NotebookLM. The results of these projects vary, and none of them has yet managed to fully eliminate “hallucinations”, a phenomenon peculiar to AI in which fictional details are added to the text.
*Recognized as an extremist organization and banned in the Russian Federation