A fine-tuned model combining Llama-3.2-1B with LLM2CLIP for enhanced cross-modal tasks and instruction-based applications.
LLM2CLIP-Llama-3.2-1B-Instruct-CC-Finetuned combines the Llama-3.2-1B language model with the LLM2CLIP framework, fine-tuned on instruction-following datasets and Conceptual Captions (CC). This integration improves the model's ability to understand and generate contextually appropriate responses in cross-modal tasks such as image captioning and text-based image retrieval, and the instruction fine-tuning helps the model interpret instructional prompts reliably across a range of applications.
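In CLIP-style retrieval of the kind described above, text-based image search reduces to ranking precomputed image embeddings by cosine similarity with a text embedding. The sketch below illustrates only that ranking step, using dummy NumPy vectors in place of real LLM2CLIP embeddings; the function name and the toy 4-dimensional vectors are illustrative assumptions, not part of the model's API.

```python
import numpy as np

def rank_images_by_text(text_emb: np.ndarray, image_embs: np.ndarray) -> np.ndarray:
    """Return image indices sorted by descending cosine similarity to the text embedding."""
    # Normalize so the dot product equals cosine similarity.
    t = text_emb / np.linalg.norm(text_emb)
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = imgs @ t
    return np.argsort(-sims)

# Dummy 4-dim embeddings standing in for LLM2CLIP outputs (illustrative only).
text = np.array([1.0, 0.0, 0.0, 0.0])
images = np.array([
    [0.1, 0.9, 0.0, 0.0],  # weak match
    [0.9, 0.1, 0.0, 0.0],  # strong match
    [0.5, 0.5, 0.0, 0.0],  # moderate match
])
print(rank_images_by_text(text, images))  # strongest match (index 1) first
```

With real embeddings from the fine-tuned model, the same ranking logic applies; only the vectors change.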
Apache 2.0
Weiquan Huang, Aoqi Wu, Yifan Yang, Xufang Luo, Yuqing Yang, Liang Hu, Qi Dai, Xiyang Dai, Dongdong Chen, Chong Luo, Lili Qiu
vision foundation model, feature backbone
Other
Open
Sector Agnostic
20/08/25 05:44:22
0
Apache 2.0