Note: This essay was originally posted by Dave Vronay on Medium on the 2nd of June, 2025.
Recap
In Part 1 of this series, we talked about the types of human-computer interactions. We talked about tools — things that allow the user to make changes in the world — and agents — things that the user has a conversation with (and which might make changes as a result). We also discussed how some agents, called affectants, are capable of participating in authentic emotional interactions with humans, while others, known as actors, can participate structurally in conversation but cannot share emotions. Finally, we discussed why it is so important for the mental and emotional health of our users not to encourage them to respond emotionally to actors who cannot reciprocate.
In this second half, we will discuss how we can combine these interaction models to make experiences that are better than any of them on their own.
From Dialogs to Conversations
In traditional GUI, we use dialog boxes when we need to have complicated conversations with the user. For instance, in order to print a document, we need to know a bunch of stuff from the user — which printer, how many copies, whether they want it single- or double-sided, and so on. And then, since printing could fail, we need to let the user know whether it succeeded or not.
We traditionally do this with a print dialog box, with direct manipulation controls that allow the user to see and change all of these settings.
A standard “print document” dialog box.
This could be done just as easily by asking an AI agent to print a document. Telling the agent to “print four copies of this double-sided” is sufficient to get the job done.
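To make that concrete, here is a minimal sketch (in TypeScript) of the structured print request an agent might extract from that sentence before calling the same printing code the dialog box uses. The PrintJob shape and its field names are hypothetical, not taken from any real printing API.

```typescript
// Hypothetical shape of a print request; the field names are illustrative,
// not any particular platform's printing API.
interface PrintJob {
  copies: number;
  duplex: boolean;            // print on both sides of the page
  color: "color" | "bw";
  printer?: string;           // omitted -> use the default printer
}

// What an agent might extract from "print four copies of this double-sided".
const job: PrintJob = {
  copies: 4,
  duplex: true,
  color: "bw",                // not specified, so the agent falls back to a default
};
```

Whether the values come from a dialog box or from a sentence, the same structured request reaches the printer.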
And while the print conversation is simple enough that AI doesn’t seem to add anything, not all conversations are so simple.
Consider the case of a drawing program, where the user wants to arrange a collection of objects. The user might want to align or distribute them vertically or horizontally, on their tops, bottoms, or centers, etc. We can add these direct manipulation controls to a dialog box:
But what if the user wants to control the padding between the objects? Or distribute them radially? Or radially, but without rotating them? Each new capability requires adding more and more UI to the dialog box. Eventually it becomes overwhelming and unwieldy.
By using an LLM agent, the user can have a complex conversation like “Arrange these objects in a circle like the numbers of a clock. Color them based on their position on the circle like the colors from a color wheel with red at the 12 o’clock position”. Even though the underlying functionality might be the same as with a dialog box — API calls for moving, rotating, coloring objects — the user can exploit the power of language to accomplish goals that are arbitrarily complex and obscure. We don’t need to make it “one-size-fits-most”.
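As a sketch of what that might look like under the hood, here is some TypeScript that composes exactly those primitives. The Shape interface and its method names (moveTo, rotate, setFill) are hypothetical stand-ins for a drawing app's real API.

```typescript
// Hypothetical drawing primitives the agent can call.
interface Shape {
  moveTo(x: number, y: number): void;
  rotate(degrees: number): void;   // available, but not needed for this request
  setFill(color: string): void;
}

// "Arrange these objects in a circle like the numbers of a clock. Color them
// based on their position, like a color wheel with red at the 12 o'clock position."
function arrangeLikeClock(shapes: Shape[], cx: number, cy: number, radius: number): void {
  shapes.forEach((shape, i) => {
    // Start at 12 o'clock (straight up) and step clockwise around the circle.
    const angleDeg = (i / shapes.length) * 360 - 90;
    const angleRad = (angleDeg * Math.PI) / 180;
    shape.moveTo(cx + radius * Math.cos(angleRad), cy + radius * Math.sin(angleRad));
    // Hue 0 (red) at 12 o'clock, sweeping through the color wheel as we go around.
    shape.setFill(`hsl(${(i / shapes.length) * 360}, 80%, 50%)`);
  });
}
```

The agent is not doing anything the dialog box could not do; it is composing the same primitives in a combination no one thought to put a button on.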
Conversational UI is also very powerful when the user is not an expert on how a system works. Imagine I am in a Zoom meeting with you and I want you to hear the funny growling noises my dog is making, but you can't hear them because Zoom is suppressing background noise. There is a setting I can change to turn that off, but I might not know that the setting exists or where it is. Even if I find it, I might not understand that the setting will do what I want — especially since Zoom calls this “original sound for musicians”. The failure rate for these kinds of tasks in conventional UI is pretty high.
But using conversational UI, I can enter my intent — “make it so the other people can hear my dog in this call” — and an agent can not only tell me the correct setting to use and where to find it, it could even directly change it on my behalf.
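A heavily simplified sketch of that flow, assuming two hypothetical helpers (findSetting and applySetting) rather than Zoom's actual APIs:

```typescript
// A sketch of an agent acting on "make it so the other people can hear my dog
// in this call". findSetting and applySetting are hypothetical helpers, not
// Zoom's actual API; the point is the shape of the flow, not the calls.
interface Setting {
  id: string;
  label: string;
  path: string[];   // where it lives in the settings UI, e.g. ["Audio", "Audio profile"]
}

declare function findSetting(intent: string): Promise<Setting | undefined>;
declare function applySetting(id: string, value: { enabled: boolean }): Promise<void>;

async function handleIntent(intent: string): Promise<string> {
  // Semantic search over the settings catalog, so "hear my dog" can match a
  // setting that never mentions dogs, like "Original sound for musicians".
  const setting = await findSetting(intent);
  if (!setting) {
    return "I couldn't find a setting that does that.";
  }
  await applySetting(setting.id, { enabled: true });
  return `Done. I turned on "${setting.label}" (under ${setting.path.join(" > ")}).`;
}
```

In practice the agent would confirm before changing anything on the user's behalf, a point we return to in the collaboration checklist below.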
Discoverability vs. Flexibility
So does that mean we can replace all of our traditional dialog-based UI with LLM chat? Not really. There is a fundamental trade-off between discoverability and flexibility.
Conversation — whether spoken or written — relies on something we call a shared domain: the overlapping knowledge, vocabulary, and conceptual understanding between participants. When you talk to a friend about movies, you share a domain that includes genres, actors, directors, and common phrases like “plot twist” or “character development.” The larger this shared domain, the more efficient and nuanced your conversation can be.
This same principle applies to human-AI interactions. The effectiveness of conversational UI depends heavily on how much domain knowledge the user and the AI share. Let’s look at three scenarios:
Large shared domain: In an electronic medical record system, both the LLM and the doctor share extensive medical knowledge — terminology, procedures, diagnostic criteria. The doctor can make complex requests like “Show me all patients under 40 with elevated A1C but no diabetes diagnosis in the past year” and expect accurate results. The shared vocabulary and concepts make conversation highly efficient.
Chat interfaces excel in use cases where there is a complex, shared domain between the LLM and the user.
Partial shared domain: Consider a photo editing app. A professional photographer shares more domain knowledge with the AI (“adjust the histogram,” “increase color grading in the shadows”) than a casual user who might not know these terms exist. The professional can leverage conversational UI effectively, while the casual user might not even know what to ask for.
Minimal shared domain: For a user who’s never used encryption before, telling an AI to “secure my email” is ambiguous. Should it use S/MIME or PGP? What level of encryption? The user doesn’t share enough domain knowledge to have an effective conversation about their needs.
This is why dialog boxes and graphical interfaces remain valuable — they create shared domain by making options visible. When users see choices like “Print double-sided” or “Color/Black & white,” they instantly understand possibilities they might never have thought to request.
Discovery
If you suspect that your user does not have a shared domain, you need to provide some kind of discovery mechanism. Dialog boxes do this automatically. They explicitly tell the user what is possible. For instance, if a user did not know that it is possible to print in color or print double-sided, they might never ask a chatbot to do that, but they would see those options in the dialog box.
We sometimes try to do this with chat interfaces as well. A voice agent might say “What can I help you with today? You can say things like ‘check my balance’ or ‘stop a payment’”. With text chat, we often do this by showing prompts:
Prompts like these are more effective when there is more context. In an open chat with an LLM, a user could ask almost anything. But if we are in a more confined situation — say, we have a text turn selected in an online chat — we can offer more targeted (and helpful) actions, like “Translate to English” or “Summarize thread”.
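One way to think about this is that the suggested prompts are a function of the current context. Here is a minimal sketch, with an illustrative context type rather than any particular product's model:

```typescript
// A sketch of context-dependent prompt suggestions. The context type and the
// suggestion strings are illustrative, not from any particular product.
type ChatContext =
  | { kind: "open" }                                 // a blank chat, nothing selected
  | { kind: "messageSelected"; language: string };   // a single text turn is selected

function suggestedPrompts(context: ChatContext): string[] {
  if (context.kind === "open") {
    // With no context, suggestions can only hint at broad capabilities.
    return ["Summarize my unread messages", "Draft a reply", "What can you do?"];
  }
  // With a concrete object selected, suggestions can be targeted and useful.
  const prompts = ["Summarize thread", "Extract action items"];
  if (context.language !== "en") {
    prompts.unshift("Translate to English");
  }
  return prompts;
}
```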
Limits of Discoverability
While graphical UI in general is more discoverable than conversational UI, it is by no means guaranteed. Even well-designed dialog boxes and graphical interfaces can fail to communicate effectively with users. Here are three common ways discoverability breaks down:
First, if the user is not versed in the domain, the UI choices might still be opaque to them. This is where your content designer should align the concept space of the user with that of the application, or provide tooltips or other helpful explanations of technical terms.
Zoom’s Audio profile settings show explanatory text, tooltips, and even hyperlinks to additional documentation.
Second, if there are many options, the user might be overloaded. In general, any dialog should limit the number of options exposed. You should design your information architecture so that lesser-used or more obscure settings are behind some additional UI action. For example, in the print dialog box shown earlier, the lesser-used options like paper size, margins, and scale are collapsed by default under a “more settings” accordion.
Ideally, the label for that action would indicate which type of user it is intended for. For example, if your UI had a large number of audio equalization and tuning settings intended for musicians, you could put those options behind a button called “Musician settings”. Then you could design the UI on that page specifically for a musician's shared domain.
Finally, one must remember that learning about a function or activity is different from performing that function. Designers often conflate the two. Not only do we expect the user to “read” the UI extensively in order to use it correctly, we also do not provide any way to learn WITHOUT accessing the UI. Consider a UI like the following, shown when the user is trying to send an email:
If the user is not aware of encryption, they are unlikely to be successful in this UI. The text in the UI makes this worse. How secure is “less secure”? If “none” is not recommended, why is it even an option? Exactly which features may not work?
While there is a link there to “learn more about encryption”, the user may not have time for that now. And yet — later, when they have time, they may have no idea how to get back to that link. For complex software, especially enterprise software, it is better to address these with some kind of structured learning or discovery center, such as Salesforce’s Trailhead:
The Best of Both Worlds: From Pre-Made to Just-in-Time
Sometimes having a conversation is slower than using direct manipulation UI. If I am not sure how much space I want between objects, it is easier to tap an up arrow next to a measurement than it is to keep telling an agent “a little more… a little more… no, back a bit…”, etc.
In these cases, a hybrid solution might be best. The user can initiate an action conversationally, and the AI agent can generate a refinement UI just in time, just for that single request. For example, if I say “arrange these objects in a circle”, it could arrange them, but it could also generate a little UI that allows me to adjust the radius, rotation, and center position.
This approach allows for bespoke solutions that combine the creativity and linguistic power of the LLMs with the efficiency of present-at-hand solutions. LLMs are already great at understanding how to build UI solutions out of design system components, and also have good knowledge of what parameters a user is likely to want to adjust. Here is a small example from a whiteboard app with a stock LLM (no additional training), with the LLM’s thinking shown:
A simple action that requires no additional UI.
A request that could benefit from additional UI questions.
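Structurally, the agent's answer in that second case might look something like the sketch below: the actions it has already applied, plus an optional, declarative description of refinement controls for the host app to render. The response shape and component names are illustrative, not a real protocol.

```typescript
// Illustrative response shape for a just-in-time refinement UI. The component
// names stand in for whatever the app's design system actually provides.
type RefinementControl =
  | { component: "Slider"; label: string; param: string; min: number; max: number; value: number }
  | { component: "PointPicker"; label: string; param: string; value: { x: number; y: number } };

interface AgentResponse {
  actions: unknown[];                  // the API calls already applied (move, color, ...)
  refinementUI?: RefinementControl[];  // optional controls, generated for this one request
}

// What "arrange these objects in a circle" might come back with:
const response: AgentResponse = {
  actions: [/* moveTo/setFill calls, as in the earlier sketch */],
  refinementUI: [
    { component: "Slider", label: "Radius", param: "radius", min: 50, max: 600, value: 200 },
    { component: "Slider", label: "Rotation", param: "rotation", min: 0, max: 360, value: 0 },
    { component: "PointPicker", label: "Center", param: "center", value: { x: 400, y: 300 } },
  ],
};
```

When the user drags a slider, the app re-runs the arrangement with the new parameter value; no further conversation is required.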
This pattern gives the user the ability to use language to handle complexity, and direct manipulation to handle adjustment and refinement.
Consider scheduling a meeting. If I tell an AI assistant to “schedule a weekly check-in with the design team,” it could certainly create a meeting and invite the right people, and even find compatible time slots. But what time? How long? Which specific team members are required vs. optional? Rather than engage in a tedious back-and-forth (“How about Tuesdays at 2pm?” “No, make it 3pm” “Actually, make it 30 minutes instead of an hour”), the AI could schedule an initial meeting and simultaneously generate a simple refinement interface. This might include quick toggles for duration (15/30/45/60 min), a day/time picker that shows team availability, checkboxes for optional attendees, and a dropdown for recurrence patterns (weekly, bi-weekly, monthly). The user gets the convenience of expressing their intent naturally — “schedule a weekly check-in” — while retaining the efficiency of direct manipulation for the inevitable adjustments. This approach respects both the AI’s strength in understanding complex intent and the user’s need for quick, precise control over the details.
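Under the same assumptions, the refinement spec for the scheduling example might look like this (field and component names are again illustrative):

```typescript
// Illustrative refinement controls an agent might emit alongside the meeting it
// just created. None of these names come from a real calendar API.
const meetingRefinement = {
  meetingTitle: "Design team weekly check-in",
  controls: [
    { component: "ToggleGroup", param: "durationMinutes", options: [15, 30, 45, 60], value: 30 },
    { component: "TimeSlotPicker", param: "start", showAvailabilityFor: "design team" },
    { component: "CheckboxList", param: "optionalAttendees", options: [/* pulled from the team roster */] },
    { component: "Dropdown", param: "recurrence", options: ["weekly", "bi-weekly", "monthly"], value: "weekly" },
  ],
};
```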
Partnership: Beyond Right-to-Left and Left-to-Right
We have seen examples of the AI agent producing content from a chat conversation — such as writing an email — as well as content being fed into the agent for additional actions — such as arranging shapes. Because many of these UIs show the agent on one side and the content on the other, we often call these unified solutions left-to-right and right-to-left flows. Optimizing these back-and-forth flows is still nascent, and this is one of the areas where more design innovation is needed. Most of the current solutions are still clunky and inefficient.
It is best to think of an AI agent as the user's collaborator, and if we do so, we can apply long-tested rules of computer-supported cooperative work (CSCW) to the design of an effective interface. Here is a checklist of some principles to follow when designing such interactions, informed by the work of researchers like Jonathan Grudin and others back in the early days of HCI.
Establish and maintain common ground — help the user understand what the AI knows, assumes, or intends. Clearly indicating the context, showing the AI's thinking, and allowing the AI to express confidence ratings all help with this. The more the user understands the agent, the greater the trust.
Support awareness, intelligibility, and legibility — CSCW has long emphasized the importance of letting the user know what other users are doing. This is especially important with autonomous agents that act without direct user control. Being able to see where the agent is working, get activity traces, and read explanations of its actions helps with this.
Manage articulation, initiative and turn-taking — the system should have clear rules on who is doing what, and what the AI is and is not allowed to do without confirmation or approval. Being too passive makes the AI a useless assistant, while being too aggressive — even if accurate — can cause stress. It should also be aware of what the human is doing so that it does not interfere. Having patterns like “I write, you polish” or “I write the schema, you implement the resolvers” can make this easier.
Support repair, recovery, and refinement — one of the best features of LLMs is that they are great with refinement. If they don't do exactly what you wanted the first time, you can continue the conversation to get closer. When the AI generates a summary, you could tell it to make it longer or shorter, more or less technical, turn it into bullet points, extract action items, etc. And yet, many times we leave this capability out when building left-to-right or right-to-left experiences. Likewise, being able to repair what an agent did incorrectly or recover previous versions encourages greater utilization of the agent and increases trust.
Respect human needs and goals — much has been written in the CSCW literature about the importance of remembering that people are social and emotional creatures, and how this impacts any collaboration. While AI agents are not affectants with emotions of their own (not yet, anyway), we still need to consider the effect their actions might have on their human collaborator. Understanding the goals of your user, their hopes, fears, concerns, etc., can help ensure you do not produce an ineffective agent. While this is not fundamentally different from the need for empathy in any design task, there are additional complexities in collaborations between agents with vastly different capabilities, goals, and understandings.
Conclusion
Understanding the distinction between tools and agents, actors and affectants allows us to design interactions that align with how humans naturally engage with the world. When we misapply these models — treating AI as a human, forcing people into mechanical interactions, or disrupting the intuitive flow of a tool — we create friction, frustration, and mistrust.
As designers, it is our responsibility to ensure that AI serves as an enabler rather than an obstacle, supporting communication rather than interfering with it. The challenge ahead is not just about making AI more powerful, but about making it more appropriate — deploying it where it enhances user experience while recognizing its limitations.
These principles become even more critical as we expand into voice interfaces and ensure accessible experiences for all users. Voice interactions introduce unique challenges around discoverability and error correction while being even more prone to emotional investment. And accessibility considerations may fundamentally alter when conversational UI helps versus hinders different users. The hybrid approaches discussed here — combining AI’s linguistic flexibility with the clarity of direct manipulation — may prove especially valuable in creating truly inclusive interfaces. In future articles, we’ll explore how these interaction models translate across different modalities and user needs.