Academic Year | 113
Semester | 2
Date of Publication | 2025-06-25
Title | Research on Performance Improvement of Vision Transformer Model Based on BEiT
Title (Other Language) |
Authors | Zhe-Wei Liu, Chii-Jen Chen*
Affiliation |
Publisher |
Conference Name | The International Conference on Recent Advancements in Computing in AI, IoT and Computer Engineering Technology (CICET 2025)
Conference Location | New Taipei, Taiwan
Abstract | Vision Transformer (ViT) has demonstrated exceptional performance in image classification tasks across large-scale datasets. However, its application to domain-specific or small-scale datasets remains challenging. This research explores an alternative approach to image patch generation, replacing the fixed-size patch mechanism in ViT with semantic-aware segmentation using the Segment Anything Model (SAM). We focus on applying this technique to datasets in domains such as marine biology, animals, and plants, where semantic consistency plays a more critical role. The segmented patches are compared to the conventional 16×16 patches used in ViT to evaluate their potential to enhance semantic representation. Preliminary results suggest that SAM-based patches can yield better-localized and more meaningful features, providing a foundation for performance improvement in downstream tasks.
Keywords | Vision Transformer, BEiT, Semantic Segmentation, Small Datasets, Self-Supervised Learning
Language | en_US
Indexed In |
Conference Type | International
On-Campus Venue | Tamsui Campus
Conference Dates | 2025-06-25 ~ 2025-06-27
Corresponding Author | Chii-Jen Chen
Country | TWN
Open Call for Papers |
Publication Format |
Source |
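
The abstract above contrasts ViT's fixed 16×16 patch grid with semantic-aware patches derived from SAM masks. Below is a minimal Python sketch of that contrast, assuming Meta's `segment_anything` package and its automatic mask generator; the checkpoint path is a placeholder, and mean-pooling the pixels inside each mask into a region token is an illustrative assumption, not the paper's actual method.

```python
# Sketch: ViT's fixed 16x16 patching vs. SAM-based semantic patches.
# SAM calls follow Meta's segment_anything API; the mask-pooling step
# is only illustrative of how one token per semantic region might form.
import numpy as np
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

def fixed_patches(image: np.ndarray, size: int = 16) -> np.ndarray:
    """Standard ViT patching: tile an HxWxC image into size x size patches."""
    h, w, c = image.shape
    h, w = h - h % size, w - w % size  # drop ragged edges
    return (image[:h, :w]
            .reshape(h // size, size, w // size, size, c)
            .transpose(0, 2, 1, 3, 4)
            .reshape(-1, size, size, c))  # (num_patches, size, size, C)

def sam_region_tokens(image: np.ndarray, checkpoint: str) -> np.ndarray:
    """Semantic-aware alternative: one token per SAM mask, summarized
    here (as an assumption) by mean-pooling the pixels inside each mask."""
    sam = sam_model_registry["vit_b"](checkpoint=checkpoint)  # placeholder path
    masks = SamAutomaticMaskGenerator(sam).generate(image)    # RGB uint8 HxWx3
    tokens = [image[m["segmentation"]].mean(axis=0) for m in masks]
    return np.stack(tokens)  # (num_regions, C); count adapts to image content
```

Unlike the fixed grid, the number of SAM-derived tokens varies with the image's semantic structure, which is the property the abstract argues benefits small, domain-specific datasets.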