Speak4Me is a versatile text-to-speech (TTS) platform that converts text into audio. It offers features such as voice selection, speed and pitch control, audio export options, and API integration. It’s useful for accessibility, education, productivity, and entertainment. For instance, it can support visually impaired students and help educators create audiobooks, businesses build chatbots, and podcasters convert text to audio. It’s a reliable and user-friendly tool for generating spoken content from text.

Chris: Hi there, please introduce yourself.

Riccardo: Hello, my name is Riccardo Oliva, and I’m one of the four co-founders of Bakery Scent. I have over 5 years of experience in mobile app development. Although I started as a designer, I now take care of everything related to the products we develop as a company.

Chris: What inspired the creation of Speak4Me, and what specific challenges or needs in the field of text-to-speech (TTS) did you aim to address with this platform?

Riccardo: At Bakery Scent we focus on developing software for education. Our goal is to build products that are accessible to anyone and that help our users study better and more efficiently.

That’s why we decided to work on Speak4Me. An ethical use of machine learning is an advantage when it comes to building creative products aimed at improving the experience of students and young professionals. Quite often, users are not even aware that products like Speak4Me exist. Working on a text-to-speech app has certainly been as interesting as it has been challenging. Developing and integrating in-app TTS models required a lot of research and testing, which has given us a good result. And now we’re working hard to improve it.

Chris: The ability to convert text into audio is the core of Speak4Me. Can you explain in more detail how the platform achieves this, and what sets it apart from other TTS solutions in terms of accuracy and naturalness?

Riccardo: We worked really hard to provide an almost real-time response when generating audio. We tried different models available on the market, adapting them to our needs. The second obstacle was the cost of generation. We couldn’t rely on cloud-based solutions because they’re too expensive when it comes to reading long texts.

After several attempts, we managed to find and train a model, which we then adapted, made lighter, and exported to ONNX so that we could finally load it into the app. This model, or rather these models (we have one for each voice), allow us to keep costs down to almost zero. They also run very fast in the app on any kind of iPhone and, above all, they work offline too. This sets us apart from all our competitors. Our solution is a good compromise between quality and cost, but we’re actively working to improve the in-app performance as well as the audio quality. It’s going to be a continuous project, and we’re just getting started!

Chris: Voice selection is an interesting feature. How do users access a variety of voices, accents, and synthetic voices within Speak4Me, and what considerations went into the selection of available options?

Riccardo: The models need to be trained on datasets. To put it simply, it’s like we had to teach these models how to speak. The first step in creating a voice is to teach it the language. We train each new model for around a week until the process is complete. Starting from that model, we then generate other voices through a process called fine-tuning. Each new voice needs around 3 to 4 days to learn how to change pronunciation and tone. We work on Speak4Me every day to generate new voices in English as well as other languages. We’re going to release new Italian voices soon, which we’re very proud of as Italian is our team’s native language. Moreover, users can choose the narrating voice they prefer. Once they have selected it, they can listen to any document in that voice through the app.

Chris: Users being able to adjust speed and pitch can greatly impact the listening experience. Could you provide some insights into the level of customization and flexibility users have in this regard?

Riccardo: Being able to adjust the speed is one of the main features. Users can increase it up to 2x in order to speed up their reading of school material, as well as PDFs, websites or even scanned physical books.
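
[Editor’s note] As a rough illustration of what a speed multiplier means for study time, here is a minimal Python sketch. The function name and the 1x lower bound are assumptions made for illustration; only the "up to 2x" ceiling comes from the interview, and this is not Speak4Me’s code.

```python
def listening_minutes(normal_minutes: float, speed: float) -> float:
    """Estimate listening time at a given playback speed.

    The speed is clamped to the range 1x-2x. The 2x ceiling is taken from
    the interview; the 1x floor is an assumption for this sketch.
    """
    clamped = max(1.0, min(speed, 2.0))
    return normal_minutes / clamped


print(listening_minutes(90, 1.5))  # 60.0
print(listening_minutes(90, 3.0))  # 45.0, because 3x is clamped to 2x
```

In other words, a 90-minute reading takes about an hour at 1.5x and 45 minutes at the 2x maximum.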

Chris: Education is a diverse field, and TTS can be a valuable tool. Can you provide examples of how Speak4Me has been used to create educational materials, and what feedback have you received from educators or students using the platform?

Riccardo: Our users use Speak4Me to study and listen to their files. One of the most liked features is certainly the Chat. We created “ChatWithMe”, a feature that allows users to ask their documents any question, as well as ask for summaries, and then listen to the answers. We’ve been testing it for a long time, and it seems it can double learning speed.

Here are some reviews from our users:
– “Love it, I can read publications while making breakfast and housework and sports”
– “Best purchase of the year so far! Really made a difference in my studying and my productivity levels – highly recommended!”.

In addition, the app helps users who struggle to speak. Here’s a review that really impressed us:
– “I can create my own verbal notes to help me accomplish the things that I need to do by verbal means. I forgot to mention on the last question post concussion syndrome that affects the ability to communicate verbally. For a long time I struggled to communicate. I knew what I wanted to say but it got lost on the way from my brain to my mouth. For a long time I couldn’t communicate verbally and nobody felt like reading everything I wrote. Apps like this have been a big help to me. Not being able to communicate led me to feeling suicidal so I would say that these types of apps are literally life savers! ”

Chris: In a rapidly evolving technology landscape, how does Speak4Me envision staying current and continuing to meet the changing needs and expectations of users who rely on TTS technology?

Riccardo: This is a very positive aspect for us. The ongoing development of new technologies allows us to keep improving the app and provide our users with ever-better quality. We hope one day to match podcast or audiobook apps in terms of reading experience.

Chris: Audio export options are important for users looking to create content with the generated audio. What formats are supported for audio export in Speak4Me, and how can users make the most of this feature?

Riccardo: Although it’s not our primary focus, we support audio export in WAV format. This allows our users to use the audio file for any type of content, from TikTok videos to WhatsApp pranks.
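
[Editor’s note] For readers who want to work with exported files in their own projects, here is a minimal Python sketch of what a WAV file contains. It writes a short test tone with the standard-library `wave` module and reads the duration back from the header. The sample rate, the mono 16-bit layout, and the function names are assumptions for illustration; the interview does not say which settings Speak4Me uses.

```python
import math
import struct
import wave

SAMPLE_RATE = 22050  # assumed value; not stated in the interview


def write_tone(path: str, seconds: float = 1.0, freq: float = 440.0) -> int:
    """Write a mono 16-bit PCM sine tone to `path` and return the frame count."""
    n_frames = int(SAMPLE_RATE * seconds)
    frames = b"".join(
        # Half-amplitude sine wave, packed as little-endian signed 16-bit samples.
        struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 2 bytes = 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)
    return n_frames


def wav_duration(path: str) -> float:
    """Return the duration of a WAV file in seconds, read from its header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Calling `write_tone("tone.wav")` and then `wav_duration("tone.wav")` should report a duration of one second. The same `wav_duration` helper could be pointed at any uncompressed PCM WAV file, for example to check the length of a clip before using it in a video.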

Chris: Looking ahead, are there any exciting updates or developments in the pipeline for Speak4Me that users can look forward to?

Riccardo: We have quite a few interesting features on our roadmap, but for now, we’re mostly focusing on improving Speak4Me’s voices. In addition to that, we’re exploring new features that could help our users learn foreign languages.

Chris: Thanks for being with me, any last words? Where can our readers follow you?

Riccardo: Thanks for interviewing me and for your interest in Speak4Me. We have a website that we regularly update (www.speak4me.io), and you can follow us on social media, specifically on TikTok, Instagram and YouTube. We’re just getting started with a bit of content creation together with Victoria, a UC Berkeley student.