Title: Visual Story Ordering with a Bidirectional Writer
Authors: 黃瀚萱
Huang, Hen-Hsen
Lin, Wei-Rou
Chen, Hsin-Hsi
Contributor: Department of Computer Science
Keywords: Multimodal modeling; temporal information ordering; sentence ordering; visual-semantic representation
Date: 2020-06
Upload time: 4-Jun-2021 14:45:27 (UTC+8)
Abstract: This paper introduces visual story ordering, a challenging task in which the images and text of a visual story are ordered jointly. We propose a neural network model based on the reader-processor-writer architecture with a self-attention mechanism, and we further propose a novel bidirectional decoder with bidirectional beam search. Experimental results show the effectiveness of the approach. The information gained from multimodal learning is presented and discussed. We also find that the proposed embedding narrows the distance between images and their corresponding story sentences, even though we do not align the two modalities explicitly. Because it addresses a general issue in generative models, the proposed bidirectional inference mechanism is applicable to a variety of applications.
Relation: Proceedings of the 2020 International Conference on Multimedia Retrieval (ICMR ’20), Association for Computing Machinery, pp. 326-330
Type: conference
DOI: https://doi.org/10.1145/3372278.3390735
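The abstract does not spell out how bidirectional beam search works. The sketch below is a hypothetical illustration of the general idea, not the authors' implementation: it assumes a simple first-order scorer (hypothetical `start`/`trans` log-score tables) in place of the paper's self-attention writer, with each item standing for one image or sentence of the story. It runs beam search left to right with a forward model and right to left with a backward model, pools the finished candidate orders, and keeps the one with the highest combined forward-plus-backward score.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def beam_search(start: Sequence[float], trans: Matrix,
                beam_width: int) -> List[Tuple[float, List[int]]]:
    """Beam search over permutations of n items, placing one item per step.

    start[j]    -- (hypothetical) log-score of item j opening the sequence
    trans[i][j] -- (hypothetical) log-score of item j directly following item i
    Returns the surviving beam as (score, order) pairs, best first.
    """
    n = len(start)
    beam = sorted(((start[j], [j]) for j in range(n)),
                  key=lambda c: c[0], reverse=True)[:beam_width]
    for _ in range(n - 1):
        # Extend every partial order with each item it has not used yet.
        candidates = [(score + trans[order[-1]][j], order + [j])
                      for score, order in beam
                      for j in range(n) if j not in order]
        beam = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return beam


def sequence_score(order: Sequence[int], start: Sequence[float],
                   trans: Matrix) -> float:
    """Total log-score of a complete order under one (start, trans) model."""
    return start[order[0]] + sum(trans[a][b] for a, b in zip(order, order[1:]))


def bidirectional_order(fwd_start: Sequence[float], fwd_trans: Matrix,
                        bwd_start: Sequence[float], bwd_trans: Matrix,
                        beam_width: int = 3) -> List[int]:
    """Pool candidates from a forward and a backward search, then rescore.

    The backward model reads the story right to left:
    bwd_start[j] scores item j as the *last* item, and
    bwd_trans[i][j] scores item j directly *preceding* item i.
    """
    pool = {tuple(order)
            for _, order in beam_search(fwd_start, fwd_trans, beam_width)}
    # Backward search yields right-to-left orders; reverse them before pooling.
    pool |= {tuple(reversed(order))
             for _, order in beam_search(bwd_start, bwd_trans, beam_width)}

    def joint(order: Tuple[int, ...]) -> float:
        # Forward score plus backward score (backward is scored on the reversed order).
        return (sequence_score(order, fwd_start, fwd_trans)
                + sequence_score(order[::-1], bwd_start, bwd_trans))

    return list(max(pool, key=joint))


if __name__ == "__main__":
    NEG = -3.0  # hypothetical penalty for an unlikely transition
    # Toy scores favoring the order 2 -> 0 -> 1 in both directions.
    fwd_start = [-2.0, -2.0, 0.0]
    fwd_trans = [[NEG, 0.0, NEG], [NEG, NEG, NEG], [0.0, NEG, NEG]]
    bwd_start = [-2.0, 0.0, -2.0]
    bwd_trans = [[NEG, NEG, 0.0], [0.0, NEG, NEG], [NEG, NEG, NEG]]
    print(bidirectional_order(fwd_start, fwd_trans, bwd_start, bwd_trans,
                              beam_width=2))  # prints [2, 0, 1]
```

Pooling both searches before rescoring lets a candidate that one direction pruned early still win if the other direction finds it; a real system would replace the toy tables with scores from learned forward and backward decoders.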
dc.contributor Department of Computer Science
dc.creator (Author) 黃瀚萱
dc.creator (Author) Huang, Hen-Hsen
dc.creator (Author) Lin, Wei-Rou
dc.creator (Author) Chen, Hsin-Hsi
dc.date (Date) 2020-06
dc.date.accessioned 4-Jun-2021 14:45:27 (UTC+8)
dc.date.available 4-Jun-2021 14:45:27 (UTC+8)
dc.date.issued (Upload time) 4-Jun-2021 14:45:27 (UTC+8)
dc.identifier.uri (URI) http://nccur.lib.nccu.edu.tw/handle/140.119/135531
dc.format.extent 1744655 bytes
dc.format.mimetype application/pdf
dc.relation (Relation) Proceedings of the 2020 International Conference on Multimedia Retrieval (ICMR ’20), Association for Computing Machinery, pp. 326-330
dc.subject (Keywords) Multimodal modeling; temporal information ordering; sentence ordering; visual-semantic representation
dc.title (Title) Visual Story Ordering with a Bidirectional Writer
dc.type (Type) conference
dc.identifier.doi (DOI) 10.1145/3372278.3390735
dc.doi.uri (DOI) https://doi.org/10.1145/3372278.3390735