Lessons learned from building a callbot

Amédée Potier
Sep 20, 2019

Last June we booked a three-day trip to attend the Chatbot Summit in Tel Aviv. We were used to the various AI events in Paris as exhibitors, but this was our first event outside Europe, and one of the rare ones where we were not exhibiting. We were pretty excited to attend and meet our peers, and it was great. It was the first time I really got to speak with other chatbot software CTOs (in particular, it was great meeting Alan Nichol from Rasa and Yi Zhang from Rul.ai, both CTOs and co-founders of two other interesting players) and openly discuss the technology, the various approaches to conversation management, and the market. As in any field, after a few evenings out, the entire chatbot landscape came down to a friendly mix of techy folks and their ‘sales and marketing’ mates, all trying to make a business out of it. We were all fighting the same battle, each trying to invent new and better ways to model and manage conversations, continuously improving things collaboratively, and each hoping to be the next big success.

Callbots for dummies

Anyway, one of the most interesting topics covered there was callbots. The technology pieces are now in place, ready to be leveraged by chatbot makers. The key building blocks are readily available (provided you have the right team to play with them):

  • Speech to Text is well handled by a number of GAFAM players (Microsoft, Google, Amazon) and by smaller vendors, which offer easy-to-use and fairly cheap APIs. Their transcription quality has now reached a level that rivals human transcribers!
  • Similarly, Text to Speech is easily available from the GAFAM. You can choose from a number of languages, accents and voice genders for voice generation.
  • A telephony gateway is required to connect the bot infrastructure with public telephony networks or internal business communication platforms. In Tel Aviv, we met with AudioCodes, a company with a strong background in the telephony world and a substantial business around the Microsoft Azure and O365 ecosystem. AudioCodes has come up with a solution they call Voice.AI gateway, which can connect chatbots to any telephony-based platform, creating fully voice-enabled engagement channels. We chatted with Ilan and, shortly after returning to France, started an evaluation.
  • Unless you want to build a basic IVR (Interactive Voice Response), a good chatbot technology is required. It needs to be flexible enough to accommodate the constraints of voice, with strong NLP (we’ll see later that a callbot needs much better NLP than a basic ‘button bot’!). Luckily, we already had this piece: we had spent two years building a solid NLP stack, deliberately not diving into callbots before mastering it.
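To make the division of labour between these blocks concrete, here is a minimal Python sketch of a single caller turn: audio arrives from the telephony gateway, is transcribed, handed to the chatbot, and the answer is synthesized back to audio. All names (`SpeechToText`, `Chatbot`, `TextToSpeech`, `handle_turn`, and the fake implementations) are hypothetical; they stand in for real cloud STT/TTS services and a gateway, and do not reflect any vendor's actual API or our own implementation.

```python
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class Chatbot(Protocol):
    def reply(self, session_id: str, text: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str, voice: str) -> bytes: ...


def handle_turn(audio: bytes, session_id: str, stt: SpeechToText,
                bot: Chatbot, tts: TextToSpeech, voice: str = "en-US") -> bytes:
    """Process one caller turn: audio from the gateway in, synthesized audio out."""
    text = stt.transcribe(audio)          # Speech to Text
    answer = bot.reply(session_id, text)  # NLP + conversation management
    return tts.synthesize(answer, voice)  # Text to Speech


# Hypothetical stand-ins for real cloud services, so the sketch runs offline.
class FakeSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")  # pretend the "audio" is already text


class EchoBot:
    def reply(self, session_id: str, text: str) -> str:
        return f"You said: {text}"


class FakeTTS:
    def synthesize(self, text: str, voice: str) -> bytes:
        return f"[{voice}] {text}".encode("utf-8")  # a tag instead of real audio


print(handle_turn(b"reset my password", "call-1", FakeSTT(), EchoBot(), FakeTTS()))
# b'[en-US] You said: reset my password'
```

Keeping the chatbot behind a channel-agnostic interface like this is one way to let the same conversation engine serve webchat and phone alike; only the STT/TTS ends change per channel.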

Implementation

My team has started to implement a callbot around our Kbot for IT Help Desk solution. The solution’s mission is to assist enterprise users with their technical issues, and it already integrates with webchat, Skype, Microsoft Teams, etc. We are now developing a callbot integration because we learned from our partners that very large support organizations still handle most of their contacts through traditional phone calls, and handling these efficiently is becoming critical for their survival. We also think the future market winners will be those agile enough to adapt quickly to rapidly changing conversational channels.

Now let’s illustrate this story. It is hard to find an interesting image of a callbot, so I’ll share a short video of what we built over a single sprint. That’s me as the actor (remember, I am the CTO of a startup, with lots of freedom to produce… as long as it is very cost-effective).

This was the first try. To prepare, the team and I did our best to adapt the conversation paths to make them fully conversational and ready for voice interaction: no buttons, no long text, and all content renderers (lists, choices, confirmations, images, etc.) properly handling the constraints of voice over the phone. As always, it is when such new integrations are built that a platform’s flexibility is really challenged, and platforms already designed to cope with numerous channels are in much better shape to handle it.
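As an illustration of what “no buttons” means for a renderer, here is a minimal Python sketch of how a button-style choice could be turned into a spoken prompt, and how the caller’s transcribed answer could be mapped back to an option. The function names and the cap of three spoken options are hypothetical assumptions, not how Kbot actually renders content; the naive substring matching is a deliberate simplification that shows exactly why a callbot needs real NLP behind it.

```python
from typing import Optional


def spoken_list(items: list[str]) -> str:
    """Join options the way a person would say them: 'a, b or c'."""
    if not items:
        raise ValueError("need at least one option")
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " or " + items[-1]


def render_choice_for_voice(prompt: str, options: list[str], max_options: int = 3) -> str:
    """Render a choice widget as speech: no buttons, and a short list (hypothetical cap)."""
    shown = options[:max_options]
    text = f"{prompt} You can say {spoken_list(shown)}."
    if len(options) > max_options:
        text += " Or say 'more' to hear other options."
    return text


def match_choice(transcript: str, options: list[str]) -> Optional[str]:
    """Naive matching of the caller's words to an option; real systems need NLP here."""
    said = transcript.lower()
    for option in options:
        if option.lower() in said:
            return option
    return None


options = ["reset my password", "unlock my account", "install software", "report an outage"]
print(render_choice_for_voice("What do you need help with?", options))
print(match_choice("um, I'd like to unlock my account please", options))
```

A caller who says “I’m locked out” would not match anything with this naive approach, which is where intent detection has to take over.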

Findings

One of the most interesting challenges for a callbot is to understand “when the user is done with their statement”. In human communication, body language plays a key part in the conversation flow. A phone conversation makes this harder, but we concentrate more and can still tell when the other person is done and a response is expected. Overall, humans are quite good at it, even though the latency of cell phones often introduces collisions and has made people used to them (there are many more collisions between callers over Skype or a satellite link than over a so-called direct line; well, those do not actually exist anymore: the good old PSTN lines are gone for good, no more direct pair of copper wire between two folks thousands of miles apart, that model didn’t scale!).

The typical way to decide when to fire the TTS (Text to Speech) is to configure the duration of the “silence” a user naturally leaves at the end of a statement. Once that silence has elapsed, the bot may answer. This waiting silence is still noticeable in a bot conversation (and a bit too obvious in the video above). I am eager to see whether the big AI labs will solve this through a deeper analysis of the user’s tone. Careful tuning of this “silence duration” nevertheless gives very acceptable results: the caller will obviously understand they are talking to a robot, and the point is not to fool anyone.
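The silence-duration rule can be sketched in a few lines of Python. This assumes, hypothetically, that the audio arrives as fixed-length frames already reduced to a loudness level (RMS) between 0 and 1; the names, the 20 ms frame size, the 0.01 threshold and the 700 ms default are illustrative assumptions, not values from our deployment or from any gateway’s API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class EndpointConfig:
    frame_ms: int = 20               # duration of one audio frame (assumed)
    silence_threshold: float = 0.01  # RMS below this counts as silence (assumed 0..1 scale)
    end_silence_ms: int = 700        # trailing silence that ends a statement (hypothetical)


def detect_end_of_utterance(levels: Iterable[float], cfg: EndpointConfig) -> Optional[int]:
    """Return the frame index at which the user is considered done, or None.

    Silence before the caller starts speaking is ignored; any speech frame
    resets the silence counter, so short mid-sentence pauses do not end the turn.
    """
    frames_needed = max(1, cfg.end_silence_ms // cfg.frame_ms)
    heard_speech = False
    silent_run = 0
    for i, level in enumerate(levels):
        if level >= cfg.silence_threshold:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= frames_needed:
                return i  # time to fire the TTS
    return None


# 100 ms of trailing silence = 5 frames of 20 ms.
cfg = EndpointConfig(end_silence_ms=100)
levels = [0.0] * 3 + [0.2] * 10 + [0.0] * 2 + [0.3] * 4 + [0.0] * 8
print(detect_end_of_utterance(levels, cfg))  # 23
```

The tuning trade-off is visible in `end_silence_ms`: a shorter value makes the bot snappier but risks cutting off hesitant callers, while a longer one produces exactly the awkward pause noticeable in the video.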

Conclusion of the day

The work continues: callbots are a rich topic. We are now working on much more advanced scenarios and integrations, designing the multilingual aspect, preparing the transfer to live agents, and building key conversational use cases that can deliver a strong ROI for our partners and customers.

The day really does not seem far off when I’ll be able to call my dev bot from my handset during my daily morning bicycle ride and ask it to boot my work VM, start my development bots, list my tasks for the day, refresh and rebuild my various git repositories, book some meetings with my engineers, add comments to tickets…

Each industry will define its key use cases. We’ll see more and more people in cars, on the street, at work, chatting on their phone or headset with their personal assistant, to get things done quickly and effectively.

We’ll keep you posted on our progress. Stay tuned, fellow chatbot-curious readers.

Amédée Potier

My core passion is building software. I am now CTO and co-founder of Konverso, a startup building virtual assistants powered by AI and NLP (www.konverso.ai)