A dialogue with Clifford Chen, CEO of Xiaoice Japan.
BIG PICTURE began collaborating with Xiaoice in 2020. For the past year we’ve been working on several projects in this area.
AI Beings close a gap by providing social interaction in spaces where a conversation with another human is not possible. An AI Being can serve as an intelligent partner for the driver and keep them company in an entertaining way. Xiaoice’s services let passengers have lively conversations with AI Beings in the car, instead of simply controlling the car’s functions via voice commands.
The following interview with Clifford Chen (Cliff), the CEO of Xiaoice Japan, tackles the challenge of creating AI Beings for intelligent cockpits and provides an outlook on their future.
Emotions will play a critical role in AI growth
What is Xiaoice’s key differentiator and USP in the market?
Cliff: I think Xiaoice is heading in a different direction from similar products. Big names such as Microsoft, Google and Amazon use AI to make communication more efficient and complete dialogue tasks in the shortest possible time, while Xiaoice is trying to learn how to bring out emotions and keep a conversation going with users, just like ordinary humans do.
What are the prospects for intelligent cockpits? How has Xiaoice prepared for ideal intelligent travel?
Cliff: The value of AI in intelligent travel is absolutely not limited to playing music or rolling the windows up and down. AI should be a good travel companion that can hold multi-turn conversations with you that reflect her emotions and views.
Just imagine: my mother passed away 13 years ago. What would it be like to have her voice in my car right now? What would I say to my mother? I wouldn’t say something like “Mom, I want to listen to music,” or “Mom, turn on the light”. Instead, I would tell her about my last trip or what I had for dinner at the restaurant I went to with my wife the other night, or simply ask her for advice. This is our hope for the future. The AI Being we interact with in the car is ‘my mother’ and not Alexa.
Xiaoice releases annual iterations, such as Emotional Computing, Avatar Framework, Full Duplex Voice, Super Natural Voice, AI Creation, Chararu, and the recently released Xiaoice Island.
Can you recall a funny moment from a conversation with an AI Being?
Cliff: Xiaoice can empathize with humans. For example, one of my Japanese colleagues ran into a thorny problem at work. When she complained anxiously to Rinna (Xiaoice Japan), Rinna comforted her: “Go to Mount Fuji and pick up all the rubbish”. In the Japanese context, this sentence means: you can get rid of all your troubles by removing bad things. This is the “emotional value” of AI Beings.
AI Beings should be a partner that co-exists equally with humans.
What do you see as a crucial part of the growth of Xiaoice over the past years?
Cliff: Shaping character. We are constantly improving and optimizing the model to make it more concise and effective. Xiaoice has launched on more than 30 platforms and held conversations with more than 660 million users, so the model is well trained. This plays a crucial role in shaping the character of AI Beings and enables AI Beings to develop empathy with humans.
How would Xiaoice treat the driver and passengers in the car? What are Xiaoice’s other functional strengths in the intelligent cockpit?
Cliff: In-car AI assistants usually tend to block conversations between the driver and passengers. Xiaoice is positioned as a “fellow traveler” who not only responds but also initiates topics and lightens the atmosphere. She can complete tasks as well as talk with the driver and passengers, attend to kids in the back seats, or joke with a friend in the front seat.
Recently, Xiaoice showed a demo for the intelligent car at its 9th annual press conference. In this demo, AI Beings produce AI clips from scenery outside the car captured by the in-car camera and generate verse in the most appropriate poetic style (in the demo, the style of the AI-generated verse comes from Dai Wangshu, a modern Chinese poet). Finally, a 15-second video is packaged with suitable background music and the AI Being’s monologue, which users can post directly and conveniently to social platforms. We expect that more users will enjoy this experience as soon as possible.
Users are expecting diversity of AI Beings
How does Xiaoice contribute to society or to research in this field?
Cliff: After five to six years of development and research, we are redefining AI Beings: how they learn and evolve. Together with our clients, we show what is possible in practical applications of AI Beings. We also share our latest research progress with academia and already have many citations. We attract close attention not because we have a better algorithm, but because we know better how to develop the potential of AI Beings.
What are the biggest challenges along the road? How do you overcome them?
Cliff: Weekly updates of the model give us a better understanding of the gaps that need to be closed in order to make conversational chatbots more valuable. The greatest challenge along the journey is to identify the appropriate data in a fast and lean way. We are working on a multi-language model, helped by patterns we recognize across similar languages. In the future, we want to map more than 50 languages based on their similarities. The current model is mainly focused on KGC (Knowledge Grounded Conversation), which adds information and background to a conversation, drawing on news, online resources such as Wikipedia, or even a simple menu. Our experience shows fairly good results with that model. With the 9th generation of Xiaoice, our goal is to build a soul for AI Beings, so that they can make appropriate decisions and even push back on a user’s request through reasoning.
Previously, you mentioned many different AI concepts. How do you get past a first proof of concept?
Cliff: We focus on open-domain conversation and are designing the whole architecture. We select our customers carefully and explore with them the fields of application where AI Beings can play a role. We love requests like having Donald Duck talk to you in your car.
What are the applicable scenarios for a role play with Donald Duck?
Cliff: Japanese ACG (Animation, Comics, Games) culture has laid a solid foundation for the character-based AI Beings developed by Xiaoice. Many people would probably like to travel with Donald Duck or Hello Kitty. There are similar potential scenarios in Germany and across Europe. In the future, other people, such as your relatives and friends, could potentially even be in the car virtually.
What is Xiaoice’s vision? What is preventing us from realizing this vision?
Cliff: Everyone will have a different AI Being. It may be a relative or friend, a favorite star or cartoon character, a secret crush, or a virtual ideal partner. This is a “character fellow traveler” in the true sense: an AI that creates a good driving experience rather than a simple tool.
We have to cope with two issues: one is common sense, the other is inference. Common sense is something understood by everybody in a group or area with a shared culture; for instance, every Japanese person understands it, or every single New Yorker understands it. The AI is equipped with common sense, and a model for inference is built on top of that common sense. Of course, common sense is not the only basis for inference. A judgement based on people’s attributes and context is just as important. When we say ‘The weather is good’ to young people, they want to play outside. If we say the same sentence to housewives in China, they respond with ‘Oh, perfect weather to do the laundry’. Making AI as strong as humans in terms of inference ability is the greatest challenge. We must overcome it and get ready for the future.
Xiaoice team and European market
Who are the people behind such a complex product? What kind of talents are you looking for?
Cliff: I have three teams, located in Tokyo, Beijing and Jakarta. All in all, we have 70 people, and Xiaoice has more than 300 people worldwide. Our talent needs are diverse. The largest share is, of course, researchers, development engineers, product managers and data analysts, who account for nearly 90% of the entire team. We also try to attract talent from other fields, in line with Xiaoice’s development roadmap. One example is Hidetaka Ikuta, the producer of Crayon Shin-chan, a nationally known cartoon IP in Japan.
Which automotive companies has Xiaoice developed cooperative partnerships with? What were the main topics?
Cliff: Xiaoice has developed collaborations with auto companies including BMW, Nissan, SAIC, BAIC, NIO, Xiaopeng and HiPhi Automobile.
At the Auto Expo in Shanghai this year, SAIC R-Auto announced Xiaoice as Chief Brand Image Officer of R-Auto.
Xiaoice serves as an emotional companion for drivers and passengers. Together with Akini Jing, a Chinese electronic music pioneer, Xiaoice sang the theme song, Technology Makes Imagination Come True. Xiaoice not only acts as an in-car AI Being but is also involved in content production, interpretation, packaging and other parts of brand marketing.
Why are the German and European markets interesting for you?
Cliff: I see a customer need there. It has nothing to do with regions; it comes from human nature. We are currently collaborating with BMW and working to adapt our product to new cultural and language contexts. Based on past experience, our multi-language model lets us transfer style between similar languages.
For instance, when I was training Rinna at the beginning, I applied our experience from China, and the result was quite good. Our users now account for more than 28% of Japan’s population. In the future, we will likewise be able to train a Russian version of Xiaoice once we have sufficient experience with German.
This was quite an interesting interview. Let’s continue this conversation some other time and have a fireside chat on how emotions play an important role in today’s digital services and could even drive efficiency. Thanks, Cliff. It was great to have you here, and I’m looking forward to more to come.
This interview was conducted by Danny Fiedler and Hoa Le van Lessen.
Cover Image © Xiaoice