What is a conversation?
This might seem like a silly question. Of course you know what a conversation is; if you're reading this, chances are you've had a conversation or two in your life. Still, you might be surprised at how difficult it is to summarize one concisely in general terms. Trying to do so will further our understanding of how voice assistants work.
The simplest description would sound something like this: a conversation is an exchange of dialogue between at least two parties, who take turns presenting information. The turns are exchanged based on certain cues, such as prompts, questions, or pauses in phrases that invite a response.
When you think about conversations, you probably think about talking with your friends, loved ones, or generally any person you're close to and have known for a while. Conversations in such a context flow effortlessly because you know how to speak to the other person and how the other person will speak to you. You can instinctively pick up on cues that indicate whose turn it is to speak.
Now think about talking to a complete stranger, perhaps at a job interview. You'd probably describe conversations like that as "stiff". The cues are very rigid and explicit. There's no bleeding of information from one phase to another. When you're asked a question, you give the relevant details. No more, and no less. Usually.
This goes double for voice assistants. When designing a conversation flow for a voice assistant, it can help to think of the assistant your users will be talking to as the stiffest, strictest job interviewer they'll ever meet. Topics need to be very clear and concise, and the user should be prompted to give only as much information as your skill needs to work.
The biggest things missing from voice assistant conversations are contextual cues and implicitness. There's really no easy way to figure out what the user wants to do on the fly. To remedy that, conversations are defined by very specific intents, and the variable data a user provides is defined as a set of entities. We’ll go over what those are in a later chapter.
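Ahead of that fuller treatment, here is a rough sketch of the idea in Python. It is not how Convoworks or any vendor actually matches utterances; the intent names, the patterns, and the parse_utterance helper are all made up for illustration. It only shows the shape of the result: one named intent, plus the variable parts of the sentence pulled out as entities.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ParsedRequest:
    intent: str
    entities: dict = field(default_factory=dict)


# Hypothetical intents. Each intent lists the phrasings it accepts;
# named groups in a pattern are the entities the user fills in.
INTENT_PATTERNS = {
    "OrderPizzaIntent": [
        re.compile(r"order a (?P<size>small|medium|large) (?P<topping>\w+) pizza"),
    ],
    "CancelIntent": [
        re.compile(r"(cancel|never mind|stop)"),
    ],
}


def parse_utterance(utterance):
    """Return the first intent whose pattern matches the whole utterance, or None."""
    text = utterance.lower().strip()
    for intent, patterns in INTENT_PATTERNS.items():
        for pattern in patterns:
            match = pattern.fullmatch(text)
            if match:
                # groupdict() holds only the named groups, i.e. the entities.
                return ParsedRequest(intent, match.groupdict())
    return None
```

Calling parse_utterance("Order a large pepperoni pizza") yields the intent OrderPizzaIntent with the entities size and topping filled in, while a sentence that fits no pattern yields None. That strictness is the "stiff job interviewer" from above: anything outside the defined phrasings is simply not understood.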
Outside of Convoworks
Now that you understand the theoretical side of conversational design, and how Convoworks expands on it, it's time to check out what happens outside of Convoworks. Don't worry, it's going to be really straightforward.
It all starts with the user and their device. This device can be a couple of different things: an Amazon Alexa device, or something like a Viber bot. Regardless, when the user says (or writes) something, the device takes that input and passes it along to the vendor's main service. The vendor in this context is Amazon, Viber, and so on. The vendor figures out exactly what the user is saying and fills in all the relevant details. Your Convoworks service is configured so that this data then gets passed to Convoworks, which runs the user's utterance through your skill. How the skill reacts is up to you, of course. Convoworks then sends the skill's response back to the vendor, which relays it to the device (or bot) so that the user can finally hear (or read) the result. Then the cycle resets, and it's up to the user to continue or close the interaction.
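The cycle described above can be sketched as a few plain Python functions. This is a toy model under stated assumptions, not the real Alexa, Viber, or Convoworks interfaces: vendor_understand, skill_handle, and one_turn are hypothetical names, and the dictionaries stand in for whatever request and response formats the real services exchange.

```python
def vendor_understand(raw_input):
    """Stand-in for the vendor's service: turn raw input into an intent plus entities."""
    text = raw_input.lower().strip()
    prefix = "my name is "
    if text.startswith(prefix):
        return {"intent": "GiveNameIntent", "entities": {"name": text[len(prefix):]}}
    if text in ("stop", "goodbye"):
        return {"intent": "StopIntent", "entities": {}}
    return {"intent": "FallbackIntent", "entities": {}}


def skill_handle(request):
    """Stand-in for your skill: decide how to react to the structured request."""
    if request["intent"] == "GiveNameIntent":
        name = request["entities"]["name"].title()
        return {"text": f"Nice to meet you, {name}.", "end_session": False}
    if request["intent"] == "StopIntent":
        return {"text": "Goodbye.", "end_session": True}
    return {"text": "Sorry, I did not get that.", "end_session": False}


def one_turn(raw_input):
    """One full cycle: device input in, response for the device out."""
    request = vendor_understand(raw_input)  # device -> vendor
    response = skill_handle(request)        # vendor -> service -> skill
    return response                         # skill -> service -> vendor -> device
```

Each call to one_turn is a single pass through the loop. The end_session flag is an assumed way of marking whether the interaction stays open for another turn or closes.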