Do we really need to “talk” to machines?

“We treat computers like indentured servitude right now, and we need to actually take them as pieces of society and treat them that way.” (Richard Sutton)

Two years ago I wrote a blog post titled “talking to software”. The idea of “conversational commerce” was getting increasingly popular, and more and more people were looking at it as a new global pattern in human-computer interaction.

Fast forward to today and the notion has gone mainstream. Skim through the multitude of prediction posts for the year that just started and it is difficult to miss a reference to voice computing as the next big thing.

There is something I find unsatisfactory about all this enthusiasm. It goes beyond the still primitive intelligence of bots and the resulting clunky conversations. More and more, I have been asking myself whether our infatuation with speech as the ultimate human-machine interaction model isn’t more a projection of our own intelligence and, in fact, a limitation that prevents us from exploring new possibilities.

Does talking to machines have to look like “talking”?

A primitive means of communication


Typewriters were (allegedly) made slower to avoid jamming

Human language is capable of carrying a large quantity of information, more than any form of communication observed in other animal species.

While this has allowed humans to dominate the planet, it doesn’t mean we are immune to the constraints imposed by the biology of our body and the physics of our environment [1].

In fact, comparative studies have shown that there is a roughly constant trade-off between the speed at which a language is spoken and the density of information it carries [2]. In other words, there is a finite amount of information speech can carry in a given time. Translating this finding into a familiar analogy, we can say that our language is similar to a QWERTY keyboard, a layout (allegedly) designed to slow us down and prevent “jamming”.

Exploring options


Dr. Louise Banks translates heptapod writing in the movie Arrival, adapted from “Story of Your Life” by Ted Chiang

For them, speech was a bottleneck because it required that one word follow another sequentially. With writing, on the other hand, every mark on a page was visible simultaneously. Why constrain writing with a glottographic straightjacket, demanding that it be just as sequential as speech? It would never occur to them. (Dr. Louise Banks)

The first realm of possibilities is in the way we communicate. Just like the heptapods in Arrival, computers have no reason to be bound by the same constraints that shape our language. The analogy between the alien way of writing and a computer program is not accidental. Both need to be thought out in their entirety before being communicated. Forcing us to think in advance creates a new set of constraints, in particular on the sender, but has the obvious benefit of improving clarity and reducing room for misunderstanding. Imagine being able to convey an entire complex thought, at once, to another person.

Moreover, recent developments in computer vision have widened the scope of possible forms of communication. From the “basic” use of QR codes, and now stickers, to the ability to recognise and process everyday objects, machines will soon see and interpret things around them well beyond specific commands. When our own vocabulary is being augmented by the use of memes, why shouldn’t computers be allowed to do the same?

There is then the subject of our communication: the “what” we talk about. Most, if not all, examples of voice-based interfaces confine communication to giving instructions or asking questions. But this is just a subset of the many things we tell (and learn from) machines.

One of the best examples of human-machine communication is reCAPTCHA. Through massive-scale online collaboration, humans help machines digitise (and then translate) human knowledge. The project exploits the complementary skills of humans and computers to achieve an outcome that neither could reach alone. Typing a street number or a blurry word, or translating an ambiguous sentence, is in many ways a more native form of human-machine communication than asking Alexa about the weather.


reCAPTCHA: a native form of human-machine communication

In a similar vein, Ines Montani has been advocating for a more thoughtful (and designed) approach to how we interact with computers in the context of AI. She focuses on three categories: data collection and training, demonstration and education, debugging and iteration.

Building training datasets, in particular, is becoming more and more important. However, we tend to look down on it as a low-level task for poorly paid, unmotivated workers to perform. We would rather impart commands to a machine (our servant) than spend proper resources to train it (our partner).

“Ultimately, what we’re trying to do is have a human teach things to the computer.” (Ines Montani)

If teaching a machine is our goal, is talking really the best way to do that? Ines instead gives the example of creating games as a form of human-machine communication, thus taking the idea of reCAPTCHA one step further.

Thinking more broadly about the ways and whys of communicating with machines should stimulate us to develop better languages, interfaces and syntaxes. It will push us to take into account what’s possible (and even desired) from a machine’s point of view, instead of focusing on “imparting orders”, which inevitably puts us, and our priorities, at the centre of the communication process.


There is an obvious advantage in talking to machines through our own human language: it is easy.

From the early days, advances in speech recognition have been driven by the desire to make the power of complex computers accessible to users without burdening them with the need to acquire new knowledge.

Convenience is a powerful force. But it is not convenience that brought us to where we are. I would rather see us push our boundaries and start exploring what talking to machines can really bring us.


[1] Modern linguistics has rejected the strict hypothesis that our way of thinking is shaped by how we speak. Nevertheless, speech is still the best way we know to transfer our thoughts (our “mentalese”) to other people. This means our thoughts need to bend themselves to the rules of acoustics, the physics of sound waves and the biology of our vocal cords.

[2] “It seems that humans may be naturally and universally self-regulating when it comes to communicating through speech. There is a balance that cannot be disturbed: fast syllables are not allowed to carry too much meaning, and syllables with lots of information must be spoken slowly.” A cross-language perspective on speech information rate

[3] This can also extend to entire platforms or networks. John Bortwick recently noted about political bots: “As people understand that accounts aren’t necessarily human, they will start to trust platforms and networks less.”

Thanks to Sebastian Stockmarr.
