The multimodal dialogue understanding and response prediction task can be divided into two phases: multimodal context understanding and response prediction. Specifically, the former includes dialogue ...