A fine-tuned model combining Llama-3-8B with LLM2CLIP for improved performance on cross-modal and instruction-based tasks.
LLM2CLIP-Llama-3-8B-Instruct-CC-Finetuned integrates the Llama-3-8B language model with the LLM2CLIP framework, fine-tuned on instruction-based datasets and Conceptual Captions (CC). This combination strengthens the model's ability to understand and produce contextually relevant outputs for cross-modal tasks such as image captioning and text-based image retrieval. The fine-tuning process trains the model to interpret and follow instructional prompts, making it suitable for a range of applications.
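As a rough illustration of the text-based image retrieval use case mentioned above, the sketch below shows how CLIP-style text and image embeddings can be compared and ranked. The embeddings here are random placeholders standing in for the features an LLM2CLIP checkpoint would produce, and the embedding dimensionality is an assumption; only the retrieval logic (L2-normalization, cosine similarity, ranking) is demonstrated.

```python
# Minimal sketch of text-based image retrieval with CLIP-style embeddings.
# The feature tensors below are random placeholders standing in for the text
# and image features an LLM2CLIP checkpoint would produce; only the retrieval
# logic (L2-normalize, cosine similarity, rank) is illustrated.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
embed_dim = 1280  # placeholder dimensionality, not necessarily the model's actual size

# Stand-ins for features from the model's image encoder (5 candidate images)
# and text encoder (1 query caption).
image_features = torch.randn(5, embed_dim)
text_features = torch.randn(1, embed_dim)

# L2-normalize so the dot product equals cosine similarity.
image_features = F.normalize(image_features, dim=-1)
text_features = F.normalize(text_features, dim=-1)

# Similarity of the query caption to each candidate image, then rank.
similarity = text_features @ image_features.T          # shape: (1, 5)
ranking = similarity.argsort(dim=-1, descending=True)

print("similarity scores:", similarity.squeeze(0).tolist())
print("images ranked best-to-worst:", ranking.squeeze(0).tolist())
```

In practice the placeholder tensors would be replaced by the outputs of the fine-tuned model's encoders; the ranking step itself is unchanged.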
Apache 2.0
Weiquan Huang, Aoqi Wu, Yifan Yang, Xufang Luo, Yuqing Yang, Liang Hu, Qi Dai, Xiyang Dai, Dongdong Chen, Chong Luo, Lili Qiu
vision foundation model, feature backbone
Other
Open
Sector Agnostic
20/08/25 05:44:54