BharatGen presents Param-2-17B-MoE-A2.4B, a large-scale Mixture-of-Experts (MoE) language model designed to deliver high model capacity while retaining the inference efficiency of a much smaller dense model. It uses a Hybrid MoE architecture with 17B total parameters, of which only 2.4B are activated per token.
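To make the sparse-activation idea concrete, the sketch below shows a generic top-k routed MoE layer in PyTorch: a router scores each token against the experts, only the highest-scoring experts run for that token, and the remaining expert parameters stay idle. This is an illustrative sketch, not BharatGen's actual implementation; the hidden sizes and top-k value are assumptions chosen only to mirror the "many experts, few active per token" pattern described on this card.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Illustrative top-k routed MoE layer (not the Param-2 implementation)."""
    def __init__(self, d_model=512, d_ff=1024, num_experts=64, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # token-to-expert gate
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                       # x: (tokens, d_model)
        logits = self.router(x)                 # (tokens, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # renormalize over the chosen experts
        out = torch.zeros_like(x)
        # Only the selected experts run for each token, so the compute (and the
        # "active" parameter count) is a small fraction of the layer's total size.
        for k in range(self.top_k):
            for e in idx[:, k].unique():
                mask = idx[:, k] == e
                out[mask] += weights[mask, k].unsqueeze(-1) * self.experts[int(e)](x[mask])
        return out

tokens = torch.randn(8, 512)                    # 8 tokens, d_model = 512
print(SparseMoELayer()(tokens).shape)           # torch.Size([8, 512])
```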
* 17B-parameter Mixture-of-Experts (MoE) language model
* Multilingual: English, Hindi, and 21 additional Indian languages
* Trained on ~22 trillion tokens across two pretraining phases
* Uses 64 specialized experts, dynamically activated per token
* Supports long-context understanding (up to 4096 tokens)
* Efficient inference: only 2.4B active parameters per token
* Advanced capabilities: thinking and reasoning, tool calling, mathematics, code generation
* Designed for diverse downstream applications and further fine-tuning (see the usage sketch below)
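Since the card lists Transformers and PyTorch, a typical way to try the model would be the standard Hugging Face loading path shown below. This is a hedged sketch: the repository ID, dtype choice, and the need for trust_remote_code are assumptions and should be checked against the actual model listing.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bharatgenai/Param-2-17B-MoE-A2.4B"  # assumed repository ID; verify before use

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # keeps the ~25 GB checkpoint loadable on large GPUs
    device_map="auto",
    trust_remote_code=True,       # may be required if the Hybrid MoE architecture is custom
)

prompt = "भारत की राजधानी क्या है?"  # "What is the capital of India?" (Hindi)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because only 2.4B parameters are active per token, decoding cost should be closer to that of a small dense model, although the full 17B parameters still need to fit in memory.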
License: Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
bharatgenai · Transformers · PyTorch · Open · Other
Date: 13/03/26 11:03:42 · Size: 25.33 GB