Challenges in connecting language and vision for multimodal dialogue systems

Time: Fri 2021-11-19 15.15

Location: Fantum and Zoom

Lecturer: Bram Willemsen

Abstract:
To have a conversation that involves references to objects and entities
in a shared environment, be it simulated or physical, requires not only
the ability to produce referring expressions but also the ability to
comprehend them. Understanding whether or not an utterance contains
referring language is a start, but what tends to be essential for
effective communication is knowing exactly which words refer to what things.

In this seminar, I will talk about language grounding, and more
specifically the challenges involved in producing and understanding
grounded language for conversational systems. I will discuss in more
detail some of the problems we have come to address over the last two
years and the progress made thus far, including our use of large,
pre-trained multimodal embedding models for downstream tasks, and
the difficulties we faced in collecting visually grounded
dialogue data via crowdsourcing.


Other ways to join virtually:
Meeting ID: 637 9043 5562

One tap mobile
+46850520017,,63790435562# Sweden
+46850539728,,63790435562# Sweden

Dial by your location
  +46 8 5052 0017 Sweden
  +46 8 5053 9728 Sweden
  +46 8 4468 2488 Sweden
  +46 8 5016 3827 Sweden
  +46 8 5050 0828 Sweden
  +46 8 5050 0829 Sweden
Find your local number: https://kth-se.zoom.us/zoomconference

Join by SIP
63790435562@zoom.nordu.net

From a KTH Cisco Video System you can just input 63790435562 and then
press the CALL button

Join by H.323
109.105.112.236
109.105.112.235

Belongs to: Speech, Music and Hearing
Last changed: Nov 15, 2021