Program

Note: All sessions are in the UTC+8 time zone.
The password for all Zoom meeting rooms is: mmasia2023
Dec. 6
08:30 ~ 09:00 Opening
09:00 ~ 10:00 Keynote 1
Prof. Kyoung Mu Lee

Zoom meeting room 1
10:00 ~ 10:30 Coffee break
10:30 ~ 12:00 Oral 1
Special Session of Multimedia on Cooking and Eating Activities
Zoom meeting room 1
Oral 2
Image/Video Processing and Synthesis
Zoom meeting room 2
Oral 3
Multimedia Signal Processing and Analysis
Zoom meeting room 3
12:00 ~ 14:00 Lunch
14:00 ~ 15:30 Oral 4
Object Detection and Pose Estimation
Zoom meeting room 1
Oral 5
Image Classification and Object Detection
Zoom meeting room 2
Oral 6
Image, Video, and Point Cloud Segmentation
Zoom meeting room 3
15:30 ~ 16:00 Coffee break Poster 1
& DEMO
& Sports Technology Exhibition (運動科技展示會)
Gather town
16:00 ~ 17:30 Oral 7
Multimedia Retrieval
Zoom meeting room 1
Oral 8
Multi-modal Data Analysis
Zoom meeting room 2
Oral 9
Image/Video Communication
Zoom meeting room 3
18:00 ~ Welcome Reception
Dec. 7
09:00 ~ 10:00 Keynote 2
Prof. Chang Wen Chen

Zoom meeting room 1
10:00 ~ 10:30 Coffee break
10:30 ~ 12:00 Oral 10
Best Paper Session

Zoom meeting room 1
12:00 ~ 14:00 Lunch
14:00 ~ 15:30 Oral 11
Tracking and Re-identification
Zoom meeting room 1
Oral 12
Human Pose and Motion Analysis
Zoom meeting room 2
Grand Challenge
Zoom meeting room 3
15:30 ~ 16:00 Coffee break Poster 2
& DEMO
Gather town
16:00 ~ 17:30 Oral 13
Action and Scenes
Zoom meeting room 1
Oral 14
Protection and Communications
Zoom meeting room 2
Oral 15
Multimedia Applications
Zoom meeting room 3
18:00 ~ Banquet
Dec. 8
09:00 ~ 10:00 Workshop 2

Workshop on Intelligent Sports Technologies (WIST)
Zoom meeting room 2
Workshop 3

Workshop on Low-quality Visual Data for Computer Vision and Media
Zoom meeting room 3
10:00 ~ 10:30 Coffee break
10:30 ~ 12:00 Workshop 2

Workshop on Intelligent Sports Technologies (WIST)
Zoom meeting room 2
Workshop 3

Workshop on Low-quality Visual Data for Computer Vision and Media
Zoom meeting room 3
12:00 ~ 14:00 Lunch
14:00 ~ 15:30 Tutorial 1
Geometric deep learning and its applications for Multimedia
Zoom meeting room 1
Tutorial 2
Streaming Media: Algorithms, Protocols and Systems
Zoom meeting room 2
Workshop 1

M3Oriental: Workshop of Multimodal, Multilingual and Multitask Modeling Technologies for Oriental Languages
Zoom meeting room 3
Local Tour
15:30 ~ 16:00 Coffee break
16:00 ~ 17:30 Tutorial 1
Geometric deep learning and its applications for Multimedia
Zoom meeting room 1
Tutorial 2
Streaming Media: Algorithms, Protocols and Systems
Zoom meeting room 2
Workshop 1

M3Oriental: Workshop of Multimodal, Multilingual and Multitask Modeling Technologies for Oriental Languages
Zoom meeting room 3
Local Tour


December 6
08:30 ~ 09:00 Opening
09:00 ~ 10:00 Keynote 1: Prof. Kyoung Mu Lee Zoom meeting room 1 (Chair: Prof. Wen-Huang Cheng (National Taiwan University))
10:00 ~ 10:30 Coffee break
10:30 ~ 12:00 Oral 1 Special Session of Multimedia on Cooking and Eating Activities Zoom meeting room 1 (Session Chair: Prof. Ichiro Ide (Nagoya University))
107 Cross-modal Image-Recipe Retrieval via Multimodal Fusion
Lijie Li (Harbin Engineering University); Caiyue Hu (Harbin Engineering University); Zhang Haitao (Harbin Engineering University)*; Akshita Maradapu Vera Venkata Sai (Towson University)
192 Learning a Contextualized Multimodal Embedding for Zero-shot Cooking Video Caption Generation
Lin Wang (USTC)*; Hongyi Zhang (Huawei); Xing-Fu Wang (USTC); Yan Xiong (USTC)
208 Mask-based Food Image Synthesis with Cross-Modal Recipe Embeddings
Chen Zhongtao (The University of Electro-Communications); Yuma Honbu (The University of Electro-Communications, Tokyo); Keiji Yanai (The University of Electro-Communications, Tokyo)*
245 RecipeMeta: Metapath-enhanced Recipe Recommendation on Heterogeneous Recipe Network
Jialiang Shi (Nagoya University)*; Takahiro Komamizu (Nagoya University); Keisuke Doman (Chukyo University); Haruya Kyutoku (Aichi University of Technology); Ichiro Ide (Nagoya University)
313 Open-Vocabulary Segmentation Approach for Transformer-Based Food Nutrient Estimation
Satayu Parinayok (The University of Tokyo)*; Yoko Yamakata (University of Tokyo, Japan); Kiyoharu Aizawa (The University of Tokyo)
10:30 ~ 12:00 Oral 2 Image/Video Processing and Synthesis Zoom meeting room 2 (Session Chair: Prof. Yu-Shuen Wang (National Yang Ming Chiao Tung University))
60 Guided Spatio-Temporal Learning Method for 4K Video Super-Resolution
Jie Liu (National University of Defense Technology)*; Qin Jiang (National University of Defense Technology); Qinglin Wang (National University of Defense Technology)
159 DiffuseGAE: Controllable and High-fidelity Image Manipulation from Disentangled Representation
Yipeng Leng (National Innovation Institute of Defense Technology)*; Qiangjuan Huang (Defense Innovation Institute); Zhiyuan Wang (AIRC); Yangyang Liu (Southeast University); Haoyu Zhang (Artificial Intelligence Research Center, Defense Innovation Institute)
201 From Global to Local: An Adaptive Environmental Illumination Estimation for Non-uniform Scattering
Huaizhuo Liu (Beihang University)*; Hai-Miao Hu (Beihang University)
212 AniCropify: image matting for anime-style illustration
Yuki Matsuura (Kansai University); Takahiro Hayashi (Kansai University)*
278 Dual-domain Feature Learning and Cross Dimension Interaction Attention for Nighttime Image Dehazing
Yun Liang (South China Agricultural University)*; Shijie Peng (South China Agricultural University); Xinjie Xiao (South China Agricultural University); Lianghui Li (South China Agricultural University)
10:30 ~ 12:00 Oral 3 Multimedia Signal Processing and Analysis Zoom meeting room 3 (Session Chair: Prof. Tse-Yu Pan (National Taiwan University of Science and Technology))
160 A Lightweight and Efficient Model for Audio Anti-Spoofing
Qiaowei Ma (South China University of Technology)*; Jinghui Zhong (South China University of Technology); Yitao Yang (South China University of Technology); Weiheng Liu (South China University of Technology); Ying Gao (South China University of Technology); Wing Ng (South China University of Technology)
169 Speech Spoofing Detection Based on Graph Attention Networks with Spectral and Temporal Information
Peng Zhang (Shandong Computer Science Center (National Supercomputer Center in Jinan))*; Yida Chen (Qilu University of Technology (Shandong Academy of Sciences)); Meijuan Li (Qilu University of Technology (Shandong Academy of Sciences)); Hui Zhao (Qilu University of Technology (Shandong Academy of Sciences)); Jianqiang Zhang (Shandong Computer Science Center (National Supercomputer Center in Jinan)); Fuqiang Wang (Shandong Computer Science Center (National Supercomputer Center in Jinan)); Xiaoming Wu (Shandong Computer Science Center)
247 FTUnet: Feature Transferred U-Net For Single HDR Image Reconstruction
Shifeng Xie (Xidian University); Yi Liu (Xidian University)*; Wenjing Shuai (Xidian University)
280 Improve Singing Quality Prediction Using Self-supervised Transfer Learning and Human Perception Feedback
Ping-Chen Chan (National Tsing Hua University); Po-Wei Chen (National Tsing Hua University)*; Von-Wun Soo (National Tsing Hua University)
318 Independent and Collaborative Demosaicking Neural Networks
Yan Niu (Jilin University)*; Lixue Zhang (Jilin University); Chenlai Li (Shenzhen Polytechnic University)
12:00 ~ 14:00 Lunch
14:00 ~ 15:30 Oral 4 Object Detection and Pose Estimation Zoom meeting room 1 (Session Chair: Prof. Jun-Cheng Chen (Academia Sinica))
39 History-Detr: Optimize Query Initialization Strategy by Using Historical Information and Kinematics
WeiJie Luo (Shanghai Jiao Tong University)*; ZiHao Liu (Shanghai Jiao Tong University); Guohao Dai (Shanghai Jiao Tong University); Ningyi Xu (Shanghai Jiao Tong University)
50 A Multi-scale and Dense Object Detector for Tibetan Thangka Images
Gaohuan Dong (Wuhan University of Technology); Qing Xie (Wuhan University of Technology); Jiachen Li (Wuhan University of Technology); Yanchun Ma (Wuhan University of Technology)*; Yuhan Liu (Huazhong University of Science and Technology); Yongjian Liu (Wuhan University of Technology)
75 Semantic-Aware Dynamic Feature Selection and Fusion for Object Detection in UAV Videos
Jianping Zhong (Harbin Institute of Technology)*; Zhaobo Qi (Harbin Institute of Technology); Weigang Zhang (Harbin Institute of Technology, Weihai); Qingming Huang (University of Chinese Academy of Sciences)
133 Domain-Adaptive Mean Teacher for Category-Level Object Pose Estimation
I-Ju Hsieh (National Taiwan University); Yo-Chung Lau (National Taiwan University)*; Peng-Yuan Kao (National Taiwan University); Shih-Ping Hung (National Taiwan University); Yi-Ping Hung (National Taiwan University)
173 A Decoupled Cross-layer Fusion Network with Bidirectional Guidance for Detecting Small Logos
Songhui Zhao (Shandong Normal University); Sujuan Hou (Shandong Normal University)*; Baisong Zhang (Yunnan University)
14:00 ~ 15:30 Oral 5 Image Classification and Object Detection Zoom meeting room 2 (Session Chair: Prof. Hong-Han Shuai (National Yang Ming Chiao Tung University))
114 Class-aware Convolution and Attentive Aggregation for Image Classification
Zitan Chen (Shandong University); Zhuang Qi (Shandong University); Xiangxian Li (Shandong University); Yuqing Wang (Shandong University); Lei Meng (Shandong University)*; Xiangxu Meng (Shandong University)
136 Feature Adaptation with CLIP for Few-shot Classification
Guangxing Wu (Sun Yat-sen University); Junxi Chen (Sun Yat-sen University); Wentao Zhang (Sun Yat-sen University); Ruixuan Wang (Sun Yat-sen University)*
254 Towards Representation Alignment and Uniformity in Long-tailed Classification
Yi Zheng (Guangxi University)*; Zuqiang Meng (Guangxi University)
272 ADNet: An Asymmetric Dual-Stream Network for RGB-T Salient Object Detection
Yaqun Fang (Nanjing University); Ruichao Hou (Nanjing University); Jia Bei (Nanjing University)*; Tongwei Ren (Nanjing University); Gangshan Wu (Nanjing University)
324 Monocular 3D Pose Estimation of Very Small Airplane in the Air
Sung Kwon On (Republic of Korea Air Force Academy); Songhyon Kim (Republic of Korea Air Force Academy); Kwangjin Yang (Republic of Korea Air Force Academy); Younggun Lee (Republic of Korea Air Force Academy)*
14:00 ~ 15:30 Oral 6 Image, Video, and Point Cloud Segmentation Zoom meeting room 3 (Session Chair: Prof. Min-Chun Hu (National Tsing Hua University))
125 NuclSeg: nuclei segmentation using semi-supervised stain deconvolution
Haixin Wang (Hosei University); Jian Yang (Hosei University); Ryohei Katayama (Cancer Chemotherapy Center at Japanese Foundation for Cancer Research); Michiya Matsusaki (Department of Applied Chemistry, Osaka University); Tomoyuki Miyao (Data Science Center, Nara Institute of Science and Technology); Jinjia Zhou (Hosei University)*
166 Reimagining 3D Visual Grounding: Instance Segmentation and Transformers for Fragmented Point Cloud Scenarios
Zehan Tan (Fudan University)*; Weidong Yang (Fudan University); Zhiwei Wang (Gree Electric Appliances, Inc. of Zhuhai)
257 SFNet: Saliency fast Fourier convolutional Network for medical image segmentation
Shangwang Liu (Henan Normal University)*; Danyang Liu (Henan Normal University); Yinghai Lin (Henan Normal University); Ziqi Wei (Henan Normal University)
282 SASSM: Semantic Awareness and Self-Support Matching for Semi-Supervised Video Object Segmentation
Yun Liang (South China Agricultural University)*; Ming Junhui (South China Agricultural University); Jintu Zheng (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences)
293 Multi-Scale Superpoint Network for 3D Point Cloud Semantic Segmentation
FT Zheng (Nanjing University of Science and Technology)*; Le Hui (Northwestern Polytechnical University); Jin Xie (Nanjing University of Science and Technology); Haofeng Zhang (Nanjing University of Science and Technology)
15:30 ~ 16:00 Coffee break
15:30 ~ 17:30 Poster 1 & DEMO & Sports Technology Exhibition (運動科技展示會) Gather town (Session Chair: Prof. Chih-Chung Hsu (National Cheng Kung University))
177 Robust Tracking via Unifying Pretrain-Finetuning and Visual Prompt Tuning
Guangtong Zhang (Guangxi Normal University); Qihua Liang (Guangxi Normal University)*; Ning Li (Guangxi Normal University); Zhiyi Mo (Guangxi Normal University; Wuzhou University); Bineng Zhong (Guangxi Normal University)
237 Hierarchical Multi-Scale Adaptive Conv-LSTM Network for Human Action Recognition Based on Wearable Sensors
Weiliang Xie (Hohai University)*; Yanfang Wang (Hohai University); Chang Li (Hohai University); Yanwei Liu (Institute of Information Engineering, Chinese Academy of Sciences); Qian Huang (Hohai University)
267 Geometric Style Transfer for Face Portraits
Miaomiao Dai (Shanghai Jiao Tong University)*; Hao Yin (Shanghai Jiao Tong University); Ran Yi (Shanghai Jiao Tong University); Lizhuang Ma (Shanghai Jiao Tong University)
273 RGB-D Tracking via Hierarchical Modality Aggregation and Distribution Network
Boyue Xu (Nanjing University); Yi Xu (Nanjing University); Ruichao Hou (Nanjing University); Jia Bei (Nanjing University)*; Tongwei Ren (Nanjing University); Gangshan Wu (Nanjing University)
288 MontageNet: Annotated Dataset of Furniture Components in Real-World Images
Iuan Kai Fang (National Tsing Hua University)*; Bo Hao Zhang (National Tsing Hua University); Te Lun Liu (National Tsing Hua University); Hao Tan (National Tsing Hua University); Wei Syun Chen (National Tsing Hua University); Che-Rung Lee (National Tsing Hua University)
260 Facial Parameter Splicing: A Novel Approach to Efficient Talking Face Generation
Xianhao Chen (Beijing University of Posts and Telecommunications)*; Kuan Chen (Beijing University of Posts and Telecommunications); Yuzhe Mao (Beijing University of Posts and Telecommunications); Linna Zhou (BUPT); Weike You (BUPT)
262 Few-Shot Learning for Word Recognition in Handwritten Seventeenth-Century Spanish American Notary Records
Nouf Alrasheed (University of Missouri-Kansas City); Shraboni Sarker (University of Missouri-Columbia); Viviana Grieco (University of Missouri-Kansas City); Praveen Rao (University of Missouri-Columbia)*
290 Towards Digital Twin of Crops for Growth Modelling using Virtual Reality
Karanvir Singh (Indian Institute of Technology Ropar)*; Mukesh Saini (IIT Ropar)
294 Exploring User-oriented Social Recommendation System through Granting Users Control over a Social Group
Sangwon Lee (Korea University)*; Jeonguk Hong (Korea University); Gyewon Jeon (Korea University)
104 Music-Graph2Vec: An Efficient Method for Embedding Pitch Segment
Taiwei Wu (SZTU)*; Jianhao Zhang (SZTU); Lian Duan (SZTU); Yuanzhe Cai (Shenzhen Technology University)
127 Adaptive Sampling for Computer Vision-Oriented Compressive Sensing
Luyang Liu (Osaka University)*; Hiroki Nishikawa (Osaka University); Jinjia Zhou (Hosei University); Ittetsu Taniguchi (Osaka University); Takao Onoye (Osaka University)
130 EmAGAN: Embedded Blocks Search and Mask Attention GAN for Makeup Transfer
Li Yan (Henan Normal University)*; Wang Shibin (Henan Normal University)
180 FinGuard: A Multimodal AIGC Guardrail in Financial Scenarios
Wenlong Du (Ant Group); Qingquan Li (Ant Group)*; Jian Zhou (Ant Group); Xu Ding (Ant Group); Xuewei Wang (Ant Group); Zhongjun Zhou (Ant Group); Jin Liu (Ant Group)
274 Easy Travelogue: A Travelogue Editor with Automatic Image Recommendation and Insertion
Fan Yu (Nanjing University); Huanyu Xing (Nanjing University); Jia Bei (Nanjing University)*; Tongwei Ren (Nanjing University)
132 Developing a VR-based contextualized language learning system to Enhance Junior High School Students’ Pragmatic Competence
Kuo-Yu Liu (Providence University)*; Yuanshan Chen (National Chin-Yi University of Technology); Ming-Fang Lin (Shih Chien University); Li-Jung Daphne Huang (Providence University); Cheah Ping Xiang (Providence University)
146 Contextual Associated Triplet Queries for Panoptic Scene Graph Generation
Jingbin Xu (The University of Electro-Communications)*; Junwen Chen (The University of Electro-Communications); Keiji Yanai (The University of Electro-Communications, Tokyo)

Sports Technology Exhibition (運動科技展示會):
145 OmniScorer: Real-Time Shot Spot Analysis for Court View Basketball Videos
Yen-Pin Cheng (National Tsing Hua University)*; Tsung-Hsun Tsai (National Tsing Hua University); Tai-Chen Tsai (National Tsing Hua University); Yi-Hsuan Chiu (National Tsing Hua University); Hung-Kuo Chu (National Tsing Hua University); Min-Chun Hu (National Tsing Hua University)
326 A Trajectory-based Statistics and Tactics Analysis System for Table Tennis
Guan-Yu Wu (National Cheng Kung University); Chun-Ho Hung (National Cheng Kung University); Hsuan-Wei Chen (National Cheng Kung University); Wei-Ta Chu (National Cheng Kung University)*
AI Batting Buddy: A Computational and Kinematic Approach for Enhancing Batting Performance and Analysis in Baseball
Kuo-Yu Liu (Providence University); Ting-Yu Guo (Providence University); Ta-Shan Pan (Providence University); Ping-Yi Tung (Providence University); Yi-Rou Lin (Providence University); Shin-Jing Li (Providence University)
DepBoxia: Depth Perception Training in Boxing, an Immersive Approach
Hung-Kuo Chu (National Tsing Hua University)
16:00 ~ 17:30 Oral 7 Multimedia Retrieval Zoom meeting room 1 (Session Chair: Prof. Yoko Yamakata (The University of Tokyo))
121 Relevance and Irrelevance Considered Subspace Mapping Neural Networks for Remote Sensing Text-Image Retrieval
Xiu Li (Ocean University of China)*; Chengyu Zheng (Ocean University of China); Jie Nie (Ocean University of China); Ruoyu Zhang (Ocean University of China); Xinyue Liang (Ocean University of China); Zhiqiang Wei (Ocean University of China)
128 Adapting Hierarchical Transformer for Scene-Level Sketch-Based Image Retrieval
Jie Yang (Wuhan University); Aihua Ke (Wuhan University); Bo Cai (Wuhan University)*
143 Cross-modal Consistency Learning with Fine-grained Fusion Network for Multimodal Fake News Detection
Jun Li (University of Electronic Science and Technology of China); Yi Bin (National University of Singapore); Jie Zou (University of Electronic Science and Technology of China); Jiwei Wei (University of Electronic Science and Technology of China); Guoqing Wang (University of Electronic Science and Technology of China); Yang Yang (University of Electronic Science and Technology of China)*
222 Targeted Transferable Attack against Deep Hashing Retrieval
Fei Zhu (Institute of Information Engineering, Chinese Academy of Sciences)*; Wanqian Zhang (Institute of Information Engineering, Chinese Academy of Sciences); Dayan Wu (Institute of Information Engineering, Chinese Academy of Sciences); Lin Wang (Institute of Information Engineering, Chinese Academy of Sciences); Bo Li (Institute of Information Engineering, Chinese Academy of Sciences); Weiping Wang (Institute of Information Engineering, CAS, China)
233 Multi-view–enhanced modal fusion hashing for Unsupervised cross-modal retrieval
Longfei Ma (Chongqing Normal University); Honggang Zhao (Chongqing Normal University); Zheng Jiang (Chongqing Normal University); Mingyong Li (Chongqing Normal University)*
16:00 ~ 17:30 Oral 8 Multi-modal Data Analysis Zoom meeting room 2 (Session Chair: Prof. Ryosuke Yamanashi (Kansai University))
51 A Cross-modal and Redundancy-reduced Network for Weakly-Supervised Audio-Visual Violence Detection
Yidan Fan (Tianjin University)*; Yongxin Yu (Tianjin University); Wenhuan Lu (Tianjin University); Yahong Han (Tianjin University)
52 From Pixels to Explanations: Uncovering the Reasoning Process in Visual Question Answering
Siqi Zhang (Tongji University); Jing Liu (National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences); Zhihua Wei (Tongji University)*
72 Exploring Feature Fusion from A Contrastive Multi-Modality Learner for Liver Cancer Diagnosis
Yang Fan Chiang (National Cheng Kung University); Pei-Xuan Li (National Cheng Kung University); Ding-You Wu (National Cheng Kung University); Hsun-Ping Hsieh (National Cheng Kung University)*; Ching-Chung Ko (Chi-Mei Medical Center)
150 I2SRM: Intra- and Inter-Sample Relationship Modeling for Multimodal Information Extraction
Yusheng Huang (Shanghai Jiao Tong University)*; Zhouhan Lin (Shanghai Jiao Tong University)
181 Efficient Hand Gesture Recognition using Multi-Task Multi-Modal Learning and Self-Distillation
Jie-Ying Li (National Tsing Hua University)*; Herman Prawiro (National Tsing Hua University); Chia-Chen Chiang (National Tsing Hua University); Hsin-Yu Chang (National Tsing Hua University); Tse-Yu Pan (National Taiwan University of Science and Technology); Chih-Tsun Huang (National Tsing Hua University); Min-Chun Hu (National Tsing Hua University)
16:00 ~ 17:30 Oral 9 Image/Video Communication Zoom meeting room 3 (Session Chair: Prof. Phoebe Chen (La Trobe University))
33 Lambda-Domain Rate Control for Neural Image Compression
Naifu Xue (Communication University of China)*; Yuan Zhang (Communication University of China)
126 Block based Adaptive Compressive Sensing with Sampling Rate Control
Kosuke Iwama (Hosei University); Ryugo Morita (Hosei University); Jinjia Zhou (Hosei University)*
240 Power Efficient Mobile VTuber Live Streaming
Zichen Zhu (Rutgers University); Stefano Petrangeli (Adobe); Viswanathan (Vishy) Swaminathan (Adobe); Sheng Wei (Rutgers University)*
271 Optical Flow based Feature Prediction and Decomposed Context for Video Compression
Huashan Sun (Hohai University); Qian Huang (Hohai University)*; Yiming Wang (Hohai University); Xiaotong Guo (Hohai University); Ruoyu Hao (Hohai University)
281 End-to-End Variable-Rate Image Compression with Bi-Resolution Spatial-Channel Context Aggregation
Xiaotong Guo (Hohai University); Qian Huang (Hohai University)*; Yiming Wang (Hohai University); Huashan Sun (Hohai University)
18:00 ~ Welcome Reception

December 7
09:00 ~ 10:00 Keynote 2: Prof. Chang Wen Chen Zoom meeting room 1 (Chair: Prof. Wei-Ta Chu (National Cheng Kung University))
10:00 ~ 10:30 Coffee break
10:30 ~ 12:00 Oral 10 Best Paper Session Zoom meeting room 1 (Session Chair: Prof. Wei-Ta Chu (National Cheng Kung University))
54 Global-Local GraphFormer: Towards Better Understanding of User Intentions in Sequential Recommendation
Hong Chen (Tsinghua University)*; Bin Huang (Tsinghua University); Xin Wang (Tsinghua University); Yuwei Zhou (Tsinghua University); Wenwu Zhu (Tsinghua University)
152 Cross-Modal Retrieval for Motion and Text via DropTriple Loss
Sheng Yan (Chongqing University of Technology); Yang Liu (Chongqing University of Technology); Haoqiang H Wang (School of Artificial Intelligence, Chongqing University of Technology); Xin Du (Chongqing University of Technology); Mengyuan Liu (Peking University, Shenzhen Graduate School)*; Hong Liu (Peking University Shenzhen Graduate School)
228 Feature Enhancement and Foreground-Background Separation for Weakly Supervised Temporal Action Localization
Peng Liu (Qingdao University of Science and Technology); Chuanxu Wang (Qingdao University of Science and Technology)*; Jianwei Qin (Qingdao University of Science and Technology); Guocheng Lin (Qingdao University of Science and Technology)
258 Multi-head Siamese Prototype Learning against both Data and Label Corruption
Peng-Fei Zhang (University of Queensland)*; Zi Helen Huang (University of Queensland)
12:00 ~ 14:00 Lunch
14:00 ~ 15:30 Oral 11 Tracking and Re-identification Zoom meeting room 1 (Session Chair: Prof. James Jenn-Jier Lien (National Cheng Kung University))
30 TrackNetV3: Enhancing ShuttleCock Tracking with Augmentations and Trajectory Rectification
Yu-Jou Chen (National Yang Ming Chiao Tung University); Yu-Shuen Wang (National Yang Ming Chiao Tung University)*
154 Occlusion-Aware Manga Character Re-identification with Self-Paced Contrastive Learning
Ci-Yin Zhang (National Cheng Kung University); Wei-Ta Chu (National Cheng Kung University)*
164 SOFTCUTMIX: Data Augmentation and Algorithmic Enhancements for Cross-Modality Person Re-Identification
Yuxiang Wan (Guangdong University of Technology)*; Banghai Wang (Guangdong University of Technology); Lunke Fei (Guangdong University of Technology)
204 Key Parts Spatio-Temporal Learning for Video Person Re-identification
Wei Guo (Institute of Information Engineering, Chinese Academy of Sciences)*; Hao Wang (Henan Normal University)
285 GTTrack: Gaussian Transformer Tracker for Visual Tracking
Yun Liang (South China Agricultural University)*; Fumian Long (South China Agricultural University); Qiaoqiao Li (South China Agricultural University); Dong Wang (South China Agricultural University)
14:00 ~ 15:30 Oral 12 Human Pose and Motion Analysis Zoom meeting room 2 (Session Chair: Prof. Mukesh Saini (Indian Institute of Technology Ropar))
71 Joint Coordinate Regression and Association For Multi-Person Pose Estimation, A Pure Neural Network Approach
Dongyang Yu (None)*; Yunshi Xie (UIUC); Wangpeng An (TikTok Inc.); Zhang Li (Beijing Forestry University); Yufeng Yao (Beijing Forestry University)
74 Learning Snippet-to-Motion Progression for Skeleton-based Human Motion Prediction
Xinshun Wang (Sun Yat-sen University); Qiongjie Cui (Nanjing University of Science and Technology); Chen Chen (University of Central Florida); Shen Zhao (Sun Yat-Sen University); Mengyuan Liu (Peking University, Shenzhen Graduate School)*
90 Graph-Guided MLP-Mixer for Skeleton-Based Human Motion Prediction
Xinshun Wang (Sun Yat-sen University); Qiongjie Cui (Nanjing University of Science and Technology); Chen Chen (University of Central Florida); Shen Zhao (Sun Yat-Sen University); Mengyuan Liu (Peking University, Shenzhen Graduate School)*
193 MA-Net: Multi-Attention Network for Skeleton-Based Action Recognition
Jingwen Cui (Hohai University); Qian Huang (Hohai University)*; Chang Li (Hohai University); Yunfei Zhang (Hohai University)
251 Research on Multi-Person Pose Estimation Based on YOLO and Decoupled Multi-Level Feature Layers Fusion
Bin Zheng (Changsha University of Science & Technology); He Zhang (Changsha University of Science & Technology)*; Lu Jin (Changsha University of Science & Technology)
14:00 ~ 15:30 Grand Challenge Zoom meeting room 3 (Session Chair: Prof. Chia-Chi Tsai (National Cheng Kung University))
350 One-Epoch Training for Object Detection in Fisheye Images
Yu-Hsi Chen (National Yang Ming Chiao Tung University)*
351 Adapting Object Detection to Fisheye Cameras: A Knowledge Distillation with Semi-Pseudo-Label Approach
Chih-Chung Hsu (National Cheng Kung University)*; Wen-Hai Tseng (Institute of Data Science, National Cheng Kung University); Ming-Hsuan Wu (Institute of Data Science, National Cheng Kung University); Chia-Ming Lee (National Cheng Kung University); Wei-Hao Huang (National Cheng Kung University)
352 Object Detection via Fisheye Camera
Yi-Zheg Hsieh (NTUST); Hau-Ching Chen (National Taiwan University of Science and Technology); I-Hung Yeh (National Taiwan University of Science and Technology)*
353 Summary of the 2023 PAIR-LITEON Competition: Embedded AI Object Detection Model Design Contest on Fish-eye Around-view Cameras
Yu-Shu Ni (National Yang Ming Chiao Tung University)*; Chia-Chi Tsai (National Cheng Kung University); Jyun-Syu Lin (National Yang Ming Chiao Tung University); Hsien-Po Meng (National Yang Ming Chiao Tung University); Po-Chi Hu (PAIR Labs); Jiun-Shiung Chen (LITEON Technology Corp.); Kun-Hung Lin (LITEON Technology Corp.); Chih-Yuan Chuang (LITEON Technology Corp.); Jiun-In Guo (National Chiao Tung University)
15:30 ~ 16:00 Coffee break
15:30 ~ 17:30 Poster 2 & DEMO Gather town (Session Chair: Prof. Chih-Chung Hsu (National Cheng Kung University))
91 Self-supervised anomaly detection of medical images based on dual-module discrepancy
Yuqing Song (Qilu University of Technology (Shandong Academy of Sciences)); Jinyong Cheng (Qilu University of Technology (Shandong Academy of Sciences))*
241 An Evaluation of Decentralized Group Formation Techniques for Flying Light Specks
Hamed Alimohammadzadeh (University of Southern California)*; Heather Culbertson; Shahram Ghandeharizadeh (USC)
264 Generic Attention-model Explainability by Weighted Relevance Accumulation
Yiming Huang (Beijing University of Technology); Aozhe Jia (Beijing University of Technology); Xiaodan Zhang (Beijing University of Technology)*; Jiawei Zhang (Sensetime Research)
311 Confidence-guided Boundary Adaption Network for Multimodal Fake News Detection
Lin Jiajie (Guangdong University of Technology)*; Zhuopan Yang (Guangdong University of Technology); Zhenguo Yang (Guangdong University of Technology); Xiaoping Li (Southeast University); Fu Lee Wang (Hong Kong Metropolitan University); Wenyin Liu (Guangdong University of Technology)
38 Learning Surface-awareness Network for X-Ray Prohibited Item Detection
Ying Shen (Southwest Jiaotong University); Wei Li (Southwest Jiaotong University)*; Zhaoquan Yuan (School of Computing and Artificial Intelligence, Southwest Jiaotong University); Xiao Wu (Southwest Jiaotong University)
131 An Efficient CNN-based Prediction for Reversible Data Hiding
Mingjin Wu (Jinan University); Shijun Xiang (Jinan University)*
184 Reducing Objective Difficulty Without Influencing Subjective Difficulty in a Video Game
Shunta Sakaue (Kochi University of Technology); Taiju Kimura (Kochi University of Technology); Hiroki Nishino (Kochi University of Technology)*
203 Multi-region CNN-Transformer for Micro-gesture Recognition in Face and Upper Body
Keita Suzuki (Nippon Telegraph and Telephone Corporation)*; Satoshi Suzuki (Nippon Telegraph and Telephone Corporation); Ryo Masumura (Nippon Telegraph and Telephone Corporation); Atsushi Ando (Nippon Telegraph and Telephone Corporation); Naoki Makishima (Nippon Telegraph and Telephone Corporation)
206 VQ-VDM: Video Diffusion Models with 3D VQGAN
Ryota Kaji (The University of Electro-Communications); Keiji Yanai (The University of Electro-Communications, Tokyo)*
261 Automatic Dataset Creation from User-generated Recipes for Ingredient-centric Food Image Analysis
Liangyu Wang (The University of Tokyo)*; Yoko Yamakata (University of Tokyo, Japan); Kiyoharu Aizawa (The University of Tokyo)
156 TelEmoScatter: Enabling Remote Interaction and Emotional Connections in Virtual and Physical Music Performance
Chen-Wei Fu (National Taiwan University of Science and Technology); Wei-Lun Huang (National Taiwan University of Science and Technology); Pin-Xuan Liu (National Tsing Hua University); Yu-Hsuan Chen (National Taiwan University of Science and Technology); Ming-Cong Su (National Taiwan University of Science and Technology)*; Andrew Chen (National Taiwan Normal University); Ping-Hsuan Han (National Taipei University of Technology); Tse-Yu Pan (National Taiwan University of Science and Technology)
221 Semantic-Aware Real-time Digital Avatar Animation from Monocular Motion Capture
Ruizhi Chen (Baidu Inc.)*; Zhiqiang Feng (Baidu Inc.); Fengguo Li (Baidu); Ying Xu (Tsinghua University); Haojie Liu (Baidu); Borong Liang (Baidu-Vis); Shihao Zou (University of Alberta); Xinxin Zuo (University of Alberta); Hang Zhou (Baidu Inc.); Haocheng Feng (Baidu Inc.); Errui Ding (Baidu Inc.); Jingtuo Liu (Baidu); Li Cheng (ECE Dept., University of Alberta); Jingdong Wang (Baidu)
325 Directional Sound Source Representation Using Paired Microphone Array with Different Characteristics Suitable for Volumetric Video Capture
Shota Okubo (KDDI Research, Inc.)*; Tomoaki Konno (KDDI Research, Inc.); Toshiharu Horiuchi (KDDI Research, Inc.); Tatsuya Kobayashi (KDDI Research, Inc.)
327 A consulting system for guiding various image recognitions
Ryo Kawai (NEC Corporation)*; Noboru Yoshida (NEC Corporation); Jianquan Liu (NEC Corporation)
329 VLM-BCD: Unsupervised Building Change Detection
Yiyun Zhang (The University of Queensland)*; Zijian Wang (University of Queensland)
268 Reprogramming Self-supervised Learning-based Speech Representations for Speaker Anonymization
Xiaojiao Chen (Xinjiang University); Sheng Li (National Institute of Information & Communications Technology (NICT))*; Jiyi Li (University of Yamanashi); Hao Huang (Xinjiang University); Yang Cao (Hokkaido University); Liang He (Tsinghua University)
269 GhostVec: A New Threat to Speaker Privacy of End-to-End Speech Recognition System
Xiaojiao Chen (Xinjiang University); Sheng Li (National Institute of Information & Communications Technology (NICT))*; Jiyi Li (University of Yamanashi); Yang Cao (Hokkaido University); Hao Huang (Xinjiang University); Liang He (Tsinghua University)
16:00 ~ 17:30 Oral 13 Action and Scenes Zoom meeting room 1 (Session Chair: Prof. Yi-Yu Hsu (National Cheng Kung University))
61 NeRF-IS: Explicit Neural Radiance Fields in Semantic Space
Jiansong Sha (Artificial Intelligence Research Center, Defense Innovation Institute)*; Haoyu Zhang (Artificial Intelligence Research Center, Defense Innovation Institute); Yuchen Pan (Artificial Intelligence Research Center, Defense Innovation Institute); Guang Kou (Artificial Intelligence Research Center, Defense Innovation Institute); Xiaodong Yi (Artificial Intelligence Research Center, Defense Innovation Institute)
68 NeRF-SDP: Efficient Generalizable Neural Radiance Field with Scene Depth Perception
Qiuwen Wang (Shanghai Jiao Tong University); Shuai Guo (Shanghai Jiao Tong University); Haoning Wu (Shanghai Jiao Tong University); Rong Xie (Shanghai Jiao Tong University); Li Song (Shanghai Jiao Tong University)*; Wenjun Zhang (Shanghai Jiao Tong University)
199 A Spatial-Spectral Decoupling Fusion Framework for Visible and Near-Infrared Images
Zhenglin Tang (Beihang University)*; Hai-Miao Hu (Beihang University)
316 Improving Class Representation for Zero-Shot Action Recognition
Lijuan Zhou (Zhengzhou University)*; Jianing Mao (Zhengzhou University)
319 Learning a Robust Model with Pseudo Boundaries for Noisy Temporal Action Localization
Xinyi Yuan (University of Science and Technology of China)*; Liansheng Zhuang (University of Science and Technology of China)
16:00 ~ 17:30 Oral 14 Protection and Communications Zoom meeting room 2 (Session Chair: Prof. Ching-Chun Huang (National Yang Ming Chiao Tung University))
31 Personalized Federated Learning via Backbone Self-Distillation
Pengju Wang (Chinese Academy of Sciences); Bochao Liu (Chinese Academy of Sciences); Dan Zeng (Shanghai University); Chenggang Yan (Hangzhou Dianzi University); Shiming Ge (Chinese Academy of Sciences)*
170 Achieving Privacy-Preserving Multi-View Consistency with Advanced 3D-Aware Face De-identification
Jingyi Cao (Shanghai Jiao Tong University)*; Bo Liu (University of Technology Sydney (UTS)); Yunqian Wen (Shanghai Jiao Tong University); Rong Xie (Shanghai Jiao Tong University); Li Song (Shanghai Jiao Tong University)
226 Prior Knowledge Guided Network for Video Anomaly Detection
Zhewen Deng (Northeastern University)*; Dongyue Chen (Northeastern University); Shizhuo Deng (Northeastern University)
239 Multi-Task Self-Blended Images for Face Forgery Detection
Po-Han Huang (National Taiwan University of Science and Technology); Yue-Hua Han (National Taiwan University of Science and Technology); Ernie Chu (Academia Sinica); Jun-Cheng Chen (Academia Sinica)*; Kai-Lung Hua (NTUST)
243 Toward Optimal Real-time Dynamic Point Cloud Streaming over Bandwidth-constrained Networks
Quang Long Nguyen (School of Electrical and Electronic Engineering, Hanoi University of Science and Technology)*; Duc Nguyen (Tohoku Institute of Technology); Thu Huong Truong (Department of Communication Engineering, School of Electronics and Telecommunications, Hanoi University of Science and Technology)
16:00 ~ 17:30 Oral 15 Multimedia Applications Zoom meeting room 3 (Session Chair: Prof. Keiji Yanai (The University of Electro-Communications, Tokyo))
70 Adaptive Fusion for Visual Question Answering: Integrating Multi-Label Classification and Similarity Matching
Zhengtao Z Yu (Fuyang Normal University); Jia Zhao (Fuyang Normal University)*; Huiling Wang (Fuyang Normal University); Chenliang Guo (Fuyang Normal University); Tong Zhou (Fuyang Normal University); Chongxiang Sun (Fuyang Normal University)
307 Vision-Language Navigation for Quadcopters with Conditional Transformer and Prompt-based Text Rephraser
Zhe Chen (Hangzhou Dianzi University); Jiyi Li (University of Yamanashi)*; Fumiyo Fukumoto (University of Yamanashi); Peng Liu (Hangzhou Dianzi University); Yoshimi Suzuki (University of Yamanashi)
183 Image Cropping under Design Constraints
Takumi Nishiyasu (The University of Tokyo)*; Wataru Shimoda (CyberAgent, Inc.); Yoichi Sato (University of Tokyo)
291 RanLayNet: A Dataset for Document Layout Detection used for Domain Adaptation and Generalization
Avinash Anand (IIIT Delhi)*; Raj Jaiswal (IIIT Delhi); Mohit Gupta (IIIT Delhi); Siddhesh S Bangar (Vidyalankar Institute of Technology); Pijush Bhuyan (IIIT Delhi); Naman Lal (MIDAS Lab, IIIT Delhi); Rajeev Singh (Vidyalankar Institute of Technology); Ritika Jha (School of Engineering, Jawaharlal Nehru University, New Delhi); Rajiv Ratn Shah (IIIT Delhi); Shin'ichi Satoh (National Institute of Informatics)
172 Moving Inside the Box: Interacting with Interpretation of Historical Artefacts Through Tangible Augmented Reality
Suzanne Kobeisse (University of Gloucestershire)*; Lars Erik Holmquist (Nottingham Trent University)
18:00 ~ Banquet

December 8
09:00 ~ 10:00 Workshop 2 Workshop on Intelligent Sports Technologies (WIST) Zoom meeting room 2
09:00 ~ 10:00 Workshop 3 Workshop on Low-quality Visual Data for Computer Vision and Media Zoom meeting room 3
10:00 ~ 10:30 Coffee break
10:30 ~ 12:00 Workshop 2 Workshop on Intelligent Sports Technologies (WIST) Zoom meeting room 2
10:30 ~ 12:00 Workshop 3 Workshop on Low-quality Visual Data for Computer Vision and Media Zoom meeting room 3
12:00 ~ 14:00 Lunch
14:00 ~ 15:30 Tutorial 1 Geometric deep learning and its applications for Multimedia Zoom meeting room 1
14:00 ~ 15:30 Tutorial 2 Streaming Media: Algorithms, Protocols and Systems Zoom meeting room 2
14:00 ~ 15:30 Workshop 1 M3Oriental: Workshop of Multimodal, Multilingual and Multitask Modeling Technologies for Oriental Languages Zoom meeting room 3
15:30 ~ 16:00 Coffee break
16:00 ~ 17:30 Tutorial 1 Geometric deep learning and its applications for Multimedia Zoom meeting room 1
16:00 ~ 17:30 Tutorial 2 Streaming Media: Algorithms, Protocols and Systems Zoom meeting room 2
16:00 ~ 17:30 Workshop 1 M3Oriental: Workshop of Multimodal, Multilingual and Multitask Modeling Technologies for Oriental Languages Zoom meeting room 3