I’ve been playing around with the idea of doing something with speech interaction in my PhD work for some time now, and I will probably spend some time in the coming period thinking about how to do that. As an exploratory, more playful stab at figuring out speech interaction, I’ve put together a short evaluation of my own experiences with Siri, the voice assistant on my iPad.
Let’s take a look at one of my interactions with Siri:
1 [Press home button]
2 [Siri bell sound, text: ‘What can I help you with?’]
3 Me: How are you today, Siri?
4 [Siri high bell sound]
5 Siri: I’m pretty good, Jan!
6 [Press home button again]
7 [Siri bell sound]
8 Me: Siri, I have a new appointment.
9 [Siri high bell sound]
10 Siri: When is your appointment?
11 [Siri bell sound]
12 Me: It’s tomorrow.
13 [Siri high bell sound]
14 Siri: OK, what time is your appointment?
15 [Siri bell sound]
16 Me: It’s at twelve.
17 [Siri high bell sound]
18 Siri: OK I set up your appointment for Friday. Shall I schedule it?
19 Me: No, don’t schedule it.
20 [Siri high bell sound]
21 Siri: To continue, you can Confirm, Cancel, Change the Time, or Change the Title.
22 [Siri bell sound]
23 Me: Just stop!
24 [Siri high bell sound]
25 Siri: I don’t understand ‘just stop’.
26 Siri: To continue, you can Confirm, Cancel, Change the Time, or Change the Title.
27 [Siri bell sound]
28 Me: Cancel.
29 [Siri high bell sound]
30 Siri: OK. You’re probably way too busy anyway, Jan.
In the short analysis below, I will explore my interaction with Siri. Specifically, I focus on the extent to which Siri presents itself (herself?) as an anthropomorphic assistant, obediently listening to the questions of its (her) users. The way Siri responds to questions, but also the way it handles conversation and communicative breakdowns, is important in establishing an image as either a machine or an anthropomorphic personal assistant.
I’m especially interested in the ‘interpersonal function’ of the language and interaction – looking into the particular role adopted by Siri, and how it sets up a relationship with the user.
Siri can make small talk, to some extent (see line 5). Although Siri is aimed primarily at responding to practical requests (opening apps, checking the weather, etc.), it also responds politely to phatic communication. This type of communication makes Siri appear friendly and more human: it adds to the anthropomorphic image Siri wants to project.
Interpersonal communication is not only about greetings and politeness; it is also relational, and incorporates conversational turn-taking. After a first polite reaction to the greeting, Siri abruptly cuts off the conversation (lines 4-5). In order to continue, Siri needs to be activated again (line 6). This cutoff is characteristic of the rest of the communication: Siri is in charge here. While the user initiates the communication, it is Siri, not the user, who takes the initiative in structuring the conversation afterwards. Siri decides whose turn it is to speak. When ‘activating’ Siri, you hear a characteristic bell sound. This type of bell sound is repeated each time Siri starts ‘listening’ for user input, or stops listening. It is used to signal when Siri expects user input (e.g., lines 11 and 15), that is, when a user is allowed or expected to speak. If users ignore Siri’s cues, Siri – literally – doesn’t listen anymore.
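To make the turn-taking pattern concrete, here is a minimal sketch of it as a tiny state machine. It is only my own illustration of the behaviour observed in the transcript, not a description of how Siri is actually implemented; the class `TurnTaker` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class TurnTaker:
    """Hypothetical model of an assistant that only hears speech inside a listening window."""

    listening: bool = False
    log: list = field(default_factory=list)

    def activate(self) -> None:
        # The assistant (or a button press) opens the listening window,
        # signalled by a bell sound.
        self.listening = True
        self.log.append("[bell]")

    def hear(self, utterance: str) -> bool:
        # Speech outside the listening window is simply dropped.
        if not self.listening:
            self.log.append(f"(ignored) {utterance}")
            return False
        # Speech inside the window is taken up, and the window closes again,
        # signalled by a higher bell sound.
        self.listening = False
        self.log.append("[high bell]")
        self.log.append(f"heard: {utterance}")
        return True


assistant = TurnTaker()
assistant.activate()
first = assistant.hear("How are you today, Siri?")
second = assistant.hear("Siri, I have a new appointment.")  # window already closed
```

In this sketch the second utterance is ignored until `activate()` is called again, which mirrors the need to press the home button a second time on line 6.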
Recovering gracefully from errors and misunderstandings is obviously important in interaction with a system: in a conversation, this means renegotiating the meaning of what has been said. However, the misunderstanding starting on lines 19-21 shows that it is practically impossible to negotiate meaning with Siri. Siri does offer another way out of the misunderstanding, but this way out puts Siri in an even more dominant position. Siri not only decides whose turn it is to speak, but also decides on the options available to the user: Confirm, Cancel, etc. (see lines 21 and 26). When I try to respond without using the predefined options (line 23), Siri stubbornly keeps repeating the same prompt, hijacking the entire conversation.
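The re-prompting behaviour can be sketched in a few lines of code. Again, this is a simplified illustration under my own assumptions, not Siri’s actual logic: the function `respond` and the exact matching rule (case-insensitive, trailing punctuation stripped) are hypothetical, and only the option names and the prompt text come from the transcript.

```python
# Options and prompt text as they appear in the transcript (lines 21 and 26).
OPTIONS = ("confirm", "cancel", "change the time", "change the title")
PROMPT = "To continue, you can Confirm, Cancel, Change the Time, or Change the Title."


def respond(utterance: str) -> tuple:
    """Hypothetical confirmation step: accept a predefined option or repeat the prompt."""
    # Assumed normalisation: ignore case and trailing punctuation.
    normalized = utterance.strip().rstrip(".!?").lower()
    if normalized in OPTIONS:
        return ("accepted", normalized)
    # Anything else is not negotiated; the same prompt is simply repeated.
    return ("reprompt", PROMPT)


replies = [respond(u) for u in ("No, don't schedule it.", "Just stop!", "Cancel.")]
```

In this sketch, the first two utterances both lead back to the same prompt, and only the literal option ‘Cancel’ moves the conversation forward, which is the pattern seen on lines 19-28.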
Siri is intended to be an obedient voice assistant: this is apparent both in its design (Siri – textually – asks ‘What can I help you with?’ as a first statement), and in the way it is marketed (‘Your wish is its command’). However, while Siri takes on a subservient role in its willingness to answer questions, the above analysis shows that Siri, at the same time, is very dominant in structuring the conversation. As an obedient servant that also forces its users into a strict dialogue structure, Siri presents itself in a rather ambiguous way.
Siri’s dominant nature, however, seems related to design solutions for technical problems. The bell sounds that structure the turn-taking give users feedback on when Siri is listening and when it isn’t, as Siri technically can’t be ‘listening’ all the time. After an error or a misunderstanding, Siri reverts to a very strict, inflexible way of communicating, designed to find the fastest way out of the misunderstanding. However, this strict way of communicating and the imposed turn-taking pattern make Siri more dominant, and detract from Siri’s image as an assistant. There’s a tension between what Siri is saying (offering help) and what Siri is doing (inflexibly guiding communication).
These are, of course, just some thoughts based on one ‘conversation’ with Siri. I wonder, though, how ‘actual’ users perceive this kind of communication…