Hi Guys!
Today, we'll discuss converting text into voice using a third-party API. This is useful in many cases where you want your application to communicate in a more realistic way.
There are many such providers offering almost realistic male & female voices. However, most of them are subscription-based, so you have to be careful about your budget & how you proceed.
For testing purposes, I’ll be using voice.org to simulate this.
Let's look at the architecture of this process –

As you can see, the user initiates the application & provides some input in the form of plain text. Once the data is given, the app sends it to the third-party API for processing. The third-party API verifies the authentication & then checks all the associated parameters before it starts generating the audio response. After that, it sends back the payload, which is received by the calling Python application. There, the payload is decoded into an audio file & finally played on the invoking computer.
This third-party API has many limitations. However, it gives you a platform to test your concept.
As of now, it supports the following languages – English, Chinese, Catalan, French, Finnish, Dutch, Danish, German, Italian, Japanese, Korean, Polish, Norwegian, Portuguese, Russian, Spanish & Swedish.
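The flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: fake_tts_api is a hypothetical local stand-in for the remote service (it returns placeholder bytes, not real audio), and VALID_KEY and text_to_audio_file are names invented for this sketch.

```python
import base64
import os
import tempfile

# Hypothetical key used only by this local stand-in.
VALID_KEY = "demo-key"


def fake_tts_api(params, headers):
    """Mimic the server side: authenticate, check parameters, return base64 text."""
    if headers.get("x-rapidapi-key") != VALID_KEY:
        raise PermissionError("authentication failed")
    if not params.get("src"):
        raise ValueError("no text supplied")
    # Placeholder bytes instead of real MP3 data.
    fake_audio = b"AUDIO:" + params["src"].encode("utf-8")
    return base64.b64encode(fake_audio).decode("ascii")


def text_to_audio_file(text, target_file, api=fake_tts_api, api_key=VALID_KEY):
    """Client side: send the text, decode the payload, write the audio file."""
    payload = api({"src": text, "hl": "en-us"}, {"x-rapidapi-key": api_key})
    audio_bytes = base64.b64decode(payload)
    with open(target_file, "wb") as fh:
        fh.write(audio_bytes)
    return target_file
```

In the real application, the api argument would be replaced by the HTTP call & the written file would be handed to the audio player.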
In our case, we’ll be checking with English.
To work with this, you need the following modules available in Python –
- playsound
- requests
- base64 (part of the Python standard library, so no separate installation is needed)
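Before running the scripts, it can help to confirm that the modules listed above are importable. The following is a minimal sketch using the standard library; missing_modules is a helper name made up for this example.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Modules this post relies on; base64 ships with Python itself.
required = ["playsound", "requests", "base64"]
not_found = missing_modules(required)
if not_found:
    print("Please install:", ", ".join(not_found))
```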
Let’s see the directory structure –

Again, we are not going to discuss any script that we've already covered in an earlier post.
Hence, we're skipping clsL.py here.
1. clsConfig.py (This script contains all the parameters of the server.)
##############################################
#### Written By: SATYAKI DE               ####
#### Written On: 12-Oct-2019              ####
####                                      ####
#### Objective: This script is a config   ####
#### file, contains all the keys for      ####
#### azure cosmos db. Application will    ####
#### process these information & perform  ####
#### various CRUD operation on Cosmos DB. ####
##############################################

import os
import platform as pl

class clsConfig(object):
    Curr_Path = os.path.dirname(os.path.realpath(__file__))

    os_det = pl.system()
    if os_det == "Windows":
        sep = '\\'
    else:
        sep = '/'

    config = {
        'APP_ID': 1,
        'url': "https://voicerss-text-to-speech.p.rapidapi.com/",
        'host': "voicerss-text-to-speech.p.rapidapi.com",
        'api_key': "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
        'targetFile': "Bot_decode.mp3",
        'pitch_speed': "-6",
        'bot_language': "en-us",
        'audio_type': "mp3",
        'audio_freq': "22khz_8bit_stereo",
        'query_string_api': "hhhhhhhhhhhhhhhhhhhhhhhhhhhh",
        'b64_encoding': True,
        'APP_DESC_1': 'Text to voice conversion.',
        'DEBUG_IND': 'N',
        'INIT_PATH': Curr_Path,
        'LOG_PATH': Curr_Path + sep + 'log' + sep
    }
For security reasons, sensitive information is masked with dummy values.
'api_key': "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
'query_string_api': "hhhhhhhhhhhhhhhhhhhhhhhhhhhh",
These two values are private to each subscriber. Hence, I've removed them & replaced them with dummy values.
You have to fill them in with your own subscription details.
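One common way to keep such private values out of the script is to read them from environment variables. Here is a minimal sketch of that idea; it is not what the scripts in this post do, and the variable names TTS_RAPIDAPI_KEY & TTS_QUERY_KEY, as well as the load_secrets helper, are assumptions made up for this example.

```python
import os


def load_secrets(env=os.environ):
    """Read the two private keys from the environment (variable names are hypothetical)."""
    names = ("TTS_RAPIDAPI_KEY", "TTS_QUERY_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "api_key": env["TTS_RAPIDAPI_KEY"],
        "query_string_api": env["TTS_QUERY_KEY"],
    }
```

The returned dictionary uses the same key names as the config above, so its values could be merged into clsConfig.config at start-up.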
2. clsText2Voice.py (This script converts the text data into an audio file using a GET request to the third-party API & then plays it with the playsound module.)
###############################################
#### Written By: SATYAKI DE                ####
#### Written On: 27-Oct-2019               ####
#### Modified On 27-Oct-2019               ####
####                                       ####
#### Objective: Main class converting      ####
#### text to voice using third-party API.  ####
###############################################

from playsound import playsound
import requests
import base64
from clsConfig import clsConfig as cf

class clsText2Voice:
    def __init__(self):
        self.url = cf.config['url']
        self.api_key = cf.config['api_key']
        self.targetFile = cf.config['targetFile']
        self.pitch_speed = cf.config['pitch_speed']
        self.bot_language = cf.config['bot_language']
        self.audio_type = cf.config['audio_type']
        self.audio_freq = cf.config['audio_freq']
        self.b64_encoding = cf.config['b64_encoding']
        self.query_string_api = cf.config['query_string_api']
        self.host = cf.config['host']

    def getAudio(self, srcString):
        try:
            url = self.url
            api_key = self.api_key
            tarFile = self.targetFile
            pitch_speed = self.pitch_speed
            bot_language = self.bot_language
            audio_type = self.audio_type
            audio_freq = self.audio_freq
            b64_encoding = self.b64_encoding
            query_string_api = self.query_string_api
            host = self.host

            querystring = {
                "r": pitch_speed,
                "c": audio_type,
                "f": audio_freq,
                "src": srcString,
                "hl": bot_language,
                "key": query_string_api,
                "b64": b64_encoding
            }

            headers = {
                'x-rapidapi-host': host,
                'x-rapidapi-key': api_key
            }

            response = requests.request("GET", url, headers=headers, params=querystring)

            # Converting to MP3
            targetFile = tarFile
            mp3File_64_decode = base64.decodebytes(bytes(response.text, encoding="utf-8"))
            mp3File_result = open(targetFile, 'wb')  # create a writable mp3File and write the decoding result
            mp3File_result.write(mp3File_64_decode)
            mp3File_result.close()

            playsound(targetFile)

            return 0
        except Exception as e:
            x = str(e)
            print('Error: ', x)

            return 1
A few crucial lines from the above script –
querystring = {
    "r": pitch_speed,
    "c": audio_type,
    "f": audio_freq,
    "src": srcString,
    "hl": bot_language,
    "key": query_string_api,
    "b64": b64_encoding
}
You can configure the voice of the audio by adjusting these settings. The text content arrives in srcString, so whatever the user types is captured here directly & sent along as one of the query-string parameters of the request.
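To see roughly what such a dictionary looks like once it is attached to the URL, here is a small sketch using the standard library's urlencode. The values are the dummy ones from the config, build_query is a name made up for this example & the exact encoding performed by the requests library may differ in small details.

```python
from urllib.parse import urlencode, parse_qs


def build_query(src_string, key="dummy-key"):
    """Encode the voice settings & the user's text as a URL query string."""
    params = {
        "r": "-6",                  # pitch_speed from the config
        "c": "mp3",                 # audio_type
        "f": "22khz_8bit_stereo",   # audio_freq
        "src": src_string,          # text typed by the user
        "hl": "en-us",              # bot_language
        "key": key,                 # placeholder, not a real subscription key
        "b64": True,                # urlencode writes this as the text "True"
    }
    return urlencode(params)


query = build_query("Hello world & more")
# Decoding the query string gives back the original text unchanged.
assert parse_qs(query)["src"] == ["Hello world & more"]
```

Spaces & special characters in the text are escaped during encoding, which is why the user can type free-form sentences.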
response = requests.request("GET", url, headers=headers, params=querystring)
In this case, you will receive the audio in the form of base64-encoded text. Hence, you need to convert it back to a sound file with the following lines –
# Converting to MP3
targetFile = tarFile
mp3File_64_decode = base64.decodebytes(bytes(response.text, encoding="utf-8"))
mp3File_result = open(targetFile, 'wb')  # create a writable mp3File and write the decoding result
mp3File_result.write(mp3File_64_decode)
mp3File_result.close()
As you can see, we've extracted response.text & then decoded it into a bytes object to form the mp3 sound file at the receiving end.
Once we have our mp3 file ready, the following line simply plays the audio recording.
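A slightly more defensive version of the decoding step could look like the sketch below. It rests on two assumptions that are not confirmed in this post: that the service reports problems as plain text starting with "ERROR", & that the payload may carry a data-URI prefix such as data:audio/mpeg;base64, before the encoded audio. decode_audio_payload is a name made up for this example.

```python
import base64
import binascii


def decode_audio_payload(text):
    """Turn the API's base64 text into raw audio bytes, or raise ValueError."""
    payload = text.strip()
    # Assumption: error responses are plain text beginning with "ERROR".
    if payload.upper().startswith("ERROR"):
        raise ValueError("API reported a problem: " + payload)
    # Assumption: an optional data-URI prefix may precede the base64 text.
    if payload.startswith("data:") and "," in payload:
        payload = payload.split(",", 1)[1]
    try:
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(payload, validate=True)
    except binascii.Error as exc:
        raise ValueError("Response is not valid base64") from exc
```

With this in place, a bad key or a malformed response surfaces as a clear error instead of an unplayable file.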
playsound(targetFile)
Thus, you can hear the actual voice.
3. callText2Voice.py (This is the main script that invokes the text-to-voice API & then plays back the audio once it gets the response from the third-party API.)
###############################################
#### Written By: SATYAKI DE                ####
#### Written On: 27-Oct-2019               ####
#### Modified On 27-Oct-2019               ####
####                                       ####
#### Objective: Main class converting      ####
#### text to voice using third-party API.  ####
###############################################

from clsConfig import clsConfig as cf
import clsL as cl
import logging
import datetime
import clsText2Voice as ct

# Disabling Warning
def warn(*args, **kwargs):
    pass

import warnings
warnings.warn = warn

var = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")

def main():
    try:
        ret_2 = ''
        debug_ind = 'Y'

        general_log_path = str(cf.config['LOG_PATH'])

        # Enabling Logging Info
        logging.basicConfig(filename=general_log_path + 'consolidatedTwitter.log', level=logging.INFO)

        # Initiating Log Class
        l = cl.clsL()

        # Moving previous day log files to archive directory
        log_dir = cf.config['LOG_PATH']

        tmpR0 = "*" * 157
        logging.info(tmpR0)
        tmpR9 = 'Start Time: ' + str(var)
        logging.info(tmpR9)
        logging.info(tmpR0)

        print("Log Directory::", log_dir)
        tmpR1 = 'Log Directory::' + log_dir
        logging.info(tmpR1)

        # Query using parameters
        rawQry = str(input('Enter your string:'))

        x1 = ct.clsText2Voice()
        ret_2 = x1.getAudio(rawQry)

        if ret_2 == 0:
            print("Successfully converted from text to voice!")
            logging.info("Successfully converted from text to voice!")
            print("*" * 157)
            logging.info(tmpR0)
        else:
            # getAudio returns 1 on failure, so report it as such
            print("Failed to convert from text to voice!")
            logging.info("Failed to convert from text to voice!")
            print("*" * 157)
            logging.info(tmpR0)

        print("*" * 157)
        logging.info(tmpR0)
        # Capture the actual end time instead of reusing the start timestamp
        tmpR10 = 'End Time: ' + datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        logging.info(tmpR10)
        logging.info(tmpR0)

    except ValueError:
        print("No relevant data to proceed!")
        logging.info("No relevant data to proceed!")

    except Exception as e:
        # Python 3 exceptions have no .message attribute, so use str(e)
        print("Top level Error: args:{0}, message{1}".format(e.args, str(e)))

if __name__ == "__main__":
    main()
Essential lines from the above script –
# Query using parameters
rawQry = str(input('Enter your string:'))

x1 = ct.clsText2Voice()
ret_2 = x1.getAudio(rawQry)
As you can see, the user passes the text content here, which is handed to our class & then played back as audio.
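The calling logic can also be tried out without touching the real API. The sketch below is one possible arrangement, not the script above: run_conversion is a made-up helper & convert stands for any function that, like getAudio, returns 0 on success & 1 on failure.

```python
def run_conversion(raw_text, convert):
    """Validate the user's text, call the converter & return a status message."""
    text = raw_text.strip()
    if not text:
        # Nothing to send, so skip the API call altogether.
        return "No relevant data to proceed!"
    ret = convert(text)
    if ret == 0:
        return "Successfully converted from text to voice!"
    return "Failed to convert from text to voice!"


# Usage with stand-in converters instead of the real class:
print(run_conversion("Welcome!", lambda text: 0))
print(run_conversion("   ", lambda text: 0))
```

In the real script, convert would be the getAudio method of a clsText2Voice instance.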
Let’s see how it runs –
Input Text: Welcome to Satyaki De’s blog. This site mainly deals with the Python, SQL from different DBs & many useful areas from the leading cloud providers.
And, here is how the run command looks under Windows OS –

And, please find the sample voice that it generates –
So, we've done it! 😀
Let us know your comments on this.
So, we’ll come out with another exciting post in the coming days!
N.B.: This is demonstrated for RnD/study purposes. All the data posted here are representational data & available over the internet.