BharatGen introduces an early SFT (supervised fine-tuning) checkpoint of Param 1, a bilingual language model trained from scratch on English and Hindi. With 2.9 billion parameters, this checkpoint builds on the pretraining phase and serves as a foundation for downstream tasks, safety testing, and customization.
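Since the page lists the Transformers library, the checkpoint can presumably be loaded with the standard `AutoModel` API. The sketch below is illustrative only: the repository id is hypothetical, and the actual path should be taken from the AIKosh listing.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repository id, for illustration only; use the
# actual model path from the AIKosh page.
model_id = "bharatgenai/Param-1-SFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# bf16 matches the bf16-mixed precision used during training.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "भारत की राजधानी क्या है?"  # Hindi: "What is the capital of India?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```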
Pre-Training Details:
* Dataset: 7.5 trillion tokens
* Data Quality: Highly curated, with standard filtering and multiple processing steps
* Scheduler: Cosine annealing
* Learning Rate: 3e-4 to 3e-6
* Training Hardware: 512 H100 GPUs
* Framework: NVIDIA NeMo
* Precision: bf16-mixed
* Base Pre-Trained Checkpoint (Param 1): https://aikosh.indiaai.gov.in/home/models/details/bharatgen_param_1_indic_scale_bilingual_foundation_model.html

SFT Training Details:
* Dataset: 0.8 million samples
* Epochs: 3
* Scheduler: Cosine annealing
* Learning Rate: 5e-6 to 5e-8
* Training Hardware: 32 H200 GPUs
* Framework: NVIDIA NeMo
* Precision: bf16-mixed

A sketch of the learning-rate schedule follows the list above.
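Both phases anneal the learning rate along a cosine curve between the stated bounds. A minimal sketch of the SFT-phase schedule using PyTorch's built-in scheduler is shown below; the step count and optimizer are assumptions for illustration, since the card does not state warmup or total steps.

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(8, 8)  # stand-in module, for illustration only

# SFT phase: cosine decay from 5e-6 down to 5e-8.
# T_max=1000 is an assumed step count, not from the card.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-6)
scheduler = CosineAnnealingLR(optimizer, T_max=1000, eta_min=5e-8)

for step in range(1000):
    optimizer.step()   # a real training step would compute a loss here
    scheduler.step()   # anneal the learning rate toward eta_min
```

The pretraining phase follows the same shape with lr=3e-4 and eta_min=3e-6.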
License: Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)