Conference Paper

Academic Year 113
Semester 2
Date of Publication 2025-06-25
Title Research on Performance Improvement of Vision Transformer Model Based on BEiT
Title (Other Language)
Authors Zhe-Wei Liu, Chii-Jen Chen*
Affiliation
Publisher
Conference Name The International Conference on Recent Advancements in Computing in AI, IoT and Computer Engineering Technology (CICET 2025)
Conference Location New Taipei, Taiwan
Abstract Vision Transformer (ViT) has demonstrated exceptional performance in image classification tasks on large-scale datasets. However, its application to domain-specific or small-scale datasets remains a challenge. This research explores an alternative approach to image patch generation, replacing the fixed-size patch mechanism in ViT with semantic-aware segmentation using the Segment Anything Model (SAM). We focus on applying this technique to datasets in domains such as marine biology, animals, and plants, where semantic consistency plays a more critical role. The segmented patches are compared with the conventional 16×16 patches used in ViT to evaluate their potential to enhance semantic representation. Preliminary results suggest that SAM-based patches can introduce better-localized and more meaningful features, providing a foundation for performance improvement in downstream tasks.
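A minimal Python sketch of the patching idea described in the abstract. The abstract does not specify how SAM segments are converted into ViT tokens, so the crop-and-resize scheme, the default SamAutomaticMaskGenerator settings, the checkpoint path, and the sam_patch_tokens helper below are all illustrative assumptions, not the authors' implementation:

import cv2
import numpy as np
import torch
import torch.nn as nn
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

# Load SAM and its automatic mask generator (checkpoint path is a placeholder).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

PATCH = 16        # side length of the conventional ViT patch used for comparison
EMBED_DIM = 768   # ViT-Base token width

# Same input shape as ViT's linear patch embedding (16*16*3 = 768 values).
proj = nn.Linear(PATCH * PATCH * 3, EMBED_DIM)

def sam_patch_tokens(image: np.ndarray) -> torch.Tensor:
    """Turn each SAM segment into one ViT-style token (hypothetical scheme)."""
    tokens = []
    for record in mask_generator.generate(image):  # dicts with 'segmentation', 'bbox'
        x, y, w, h = (int(v) for v in record["bbox"])  # bbox is in XYWH format
        if w == 0 or h == 0:
            continue
        crop = image[y:y + h, x:x + w].copy()
        crop[~record["segmentation"][y:y + h, x:x + w]] = 0  # zero out background
        crop = cv2.resize(crop, (PATCH, PATCH))  # normalize segment to 16x16
        tokens.append(crop.reshape(-1).astype(np.float32) / 255.0)
    if not tokens:
        return torch.empty(0, EMBED_DIM)
    stacked = torch.from_numpy(np.stack(tokens))  # (num_segments, 768)
    return proj(stacked)                          # (num_segments, EMBED_DIM)

# Usage: tokens = sam_patch_tokens(cv2.imread("fish.jpg")[:, :, ::-1].copy())

Each segment is normalized to the same 16×16 footprint as a conventional ViT patch so that both tokenizations can feed an identical embedding layer; this is one plausible way to make the comparison described in the abstract concrete.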
Keywords Vision Transformer, BEiT, Semantic Segmentation, Small Datasets, Self-Supervised Learning
Language en_US
Indexed In
Conference Type International
On-Campus Venue Tamsui Campus
Conference Dates 2025-06-25 to 2025-06-27
Corresponding Author Chii-Jen Chen
Country TWN
Open Call for Papers
Publication Format
Source