Optimized Autoencoder-Driven Semantic Feature Enhancement for Zero-Shot Image Classification

Shaista Khanam; Poonam Sonar

doi:10.46604/aiti.2026.16231

Authors

Shaista Khanam, Rajiv Gandhi Institute of Technology, Maharashtra, India
Poonam Sonar, Rajiv Gandhi Institute of Technology, Maharashtra, India

Keywords:

autoencoder, language models, semantic feature optimization, memory consumption, time complexity

Abstract

Zero-shot learning (ZSL) identifies unseen categories using semantic knowledge transferred from seen classes, so its effectiveness depends on the quality of both the visual and the semantic representations. This study develops an optimized autoencoder-driven semantic feature extraction (OADSFE) framework built on a hybrid feature approach (HFA). The HFA combines deep spatial representations with multi-scale texture information to characterize visual data. Semantic features are derived from four language models (fastText, GloVe, BERT, and MPNet), each evaluated independently. An autoencoder-based post-embedding optimization module compresses the high-dimensional semantic embeddings into a compact latent space while preserving discriminative information, which reduces memory usage and testing time. Evaluation on the AWA2, SUN, and CUB benchmark datasets shows that the proposed framework achieves up to a 16.29% reduction in testing time and an 89.42% reduction in memory usage while maintaining classification performance across multiple embedding configurations. The consistency of these results across datasets and semantic embedding strategies indicates that the framework is suited to scalable ZSL applications.
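
The idea of compressing semantic embeddings into a compact latent space can be illustrated with a minimal sketch. It is not the paper's OADSFE module: it substitutes a linear autoencoder solved in closed form via SVD (equivalent to PCA) for a trained network, and it uses synthetic data in place of real class embeddings. The function names, the 50-class setup, and the latent size of 16 are all hypothetical choices for illustration; 768 is used as the input size because it is the hidden size of BERT-base.

```python
import numpy as np

def fit_linear_autoencoder(embeddings, latent_dim):
    """Closed-form linear autoencoder (an assumption for this sketch):
    the optimal linear encoder/decoder pair spans the top principal
    directions of the centered data."""
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:latent_dim]  # shape: (latent_dim, input_dim)
    return mean, components

def encode(embeddings, mean, components):
    """Project embeddings into the compact latent space."""
    return (embeddings - mean) @ components.T

def decode(latent, mean, components):
    """Map latent codes back to the original embedding space."""
    return latent @ components + mean

def memory_reduction(input_dim, latent_dim):
    """Fraction of storage saved per embedding at equal numeric precision."""
    return 1.0 - latent_dim / input_dim

rng = np.random.default_rng(0)
# Synthetic stand-in for class embeddings: 50 classes, 768 dimensions,
# generated with rank 16 so a 16-dimensional code can represent them exactly.
class_embeddings = rng.normal(size=(50, 16)) @ rng.normal(size=(16, 768))

mean, components = fit_linear_autoencoder(class_embeddings, latent_dim=16)
latent = encode(class_embeddings, mean, components)
reconstructed = decode(latent, mean, components)

reconstruction_error = float(np.abs(reconstructed - class_embeddings).max())
saving = memory_reduction(768, 16)
print(f"latent shape: {latent.shape}, memory saved: {saving:.2%}")
```

In this sketch the memory saving depends only on the ratio of latent to input dimension. Real embeddings are not exactly low rank, so a trained nonlinear autoencoder would trade some reconstruction error for compression, and whether classification performance is maintained has to be checked empirically, as the evaluation described above does.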
