Bharat MiniGPT 350M is a custom GPT-style causal language model built from scratch in PyTorch by Harshvardhan Mishra and later made HuggingFace-compatible. This release is an experimental base model pretrained on 3B tokens; it is not instruction-tuned yet. The architecture uses modern LLM components, including RoPE, RMSNorm, SwiGLU, and SDPA attention, and is intended for research, experimentation, and future fine-tuning.
Bharat MiniGPT 350M is a custom GPT-style causal language model trained from scratch by Harshvardhan Mishra using modern LLM architecture components such as RoPE, RMSNorm, SwiGLU, and SDPA attention. It is not a fine-tuned GPT-2 or LLaMA variant: the architecture and training pipeline were implemented manually in PyTorch and later integrated into the HuggingFace ecosystem. The current release is an experimental base model pretrained on 3B tokens and is not instruction-tuned yet. An improved version, trained on more tokens and with fine-tuning support, is planned for a future update.

Model Details:
* Parameters: ~350 Million
* Architecture: Decoder-only Transformer
* Layers: 24 Transformer Blocks
* Attention Heads: 16
* Embedding Size: 1024
* Context Length: 768 Tokens
* Vocabulary Size: 50,257
* Position Encoding: RoPE
* Normalization: RMSNorm
* Feed Forward: SwiGLU
* Attention: SDPA / Flash Attention Compatible
* Precision: FP16

Training

Training Data:
* HuggingFaceFW/fineweb (sample-10BT): 40%
* HuggingFaceFW/fineweb-edu (sample-10BT): 30%
* Wikimedia Wikipedia (20231101.en): 30%

Training Setup:
* Optimizer: AdamW
* Learning Rate: 3e-4
* Min LR: 3e-5
* Warmup Steps: 51,200
* LR Scheduler: Cosine Decay
* Gradient Accumulation: 128
* Mixed Precision: FP16
* Gradient Clipping: 1.0

Features:
* Custom GPT architecture
* RoPE positional embeddings
* RMSNorm normalization
* SwiGLU feed-forward layers
* Flash Attention compatible SDPA
* HuggingFace generate() support
* KV-cache compatible
* Weight tying support
* Gradient checkpointing during training

The model was evaluated with the EleutherAI LM Evaluation Harness on benchmark tasks such as ARC Easy, HellaSwag, and PIQA.

Explore More: [Bharat MiniGPT 350M Project Page](https://iotbyhvm.ooo/bharat-minigpt-350m-a-custom-gpt-style-llm-built-from-scratch-in-india/)

Disclaimer: Bharat MiniGPT 350M is an experimental pretrained base model developed for research and educational purposes. The model is not instruction-tuned yet and may generate inaccurate, biased, or incomplete responses.
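As a rough sanity check on the "~350 Million" figure in Model Details, the sketch below estimates the parameter count from the listed layer count, embedding size, and vocabulary size. Several things here are assumptions and do not come from the model card: the SwiGLU hidden width (taken as roughly 8/3 of the embedding size, a common choice), the absence of bias terms, two RMSNorm layers per block plus a final one, and tied input/output embeddings. The function name `count_parameters` is purely illustrative.

```python
def count_parameters(n_layers=24, d_model=1024, vocab_size=50257,
                     ffn_hidden=None, tie_embeddings=True):
    """Rough parameter count for a bias-free decoder-only Transformer
    with SwiGLU feed-forward layers and RMSNorm (illustrative only)."""
    if ffn_hidden is None:
        # ASSUMPTION: hidden width of about (8/3) * d_model; the model card
        # does not state the actual feed-forward width.
        ffn_hidden = int(8 * d_model / 3)

    attention = 4 * d_model * d_model   # Q, K, V and output projections
    swiglu = 3 * d_model * ffn_hidden   # gate, up and down projections
    norms = 2 * d_model                 # ASSUMPTION: two RMSNorm weights per block
    per_block = attention + swiglu + norms

    embedding = vocab_size * d_model
    # With weight tying, the output head reuses the embedding matrix.
    lm_head = 0 if tie_embeddings else vocab_size * d_model
    final_norm = d_model                # ASSUMPTION: one RMSNorm before the head

    # RoPE adds no learned parameters, so it does not appear here.
    return n_layers * per_block + embedding + lm_head + final_norm


total = count_parameters()
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
```

Under these assumptions the estimate lands close to 350M, which is consistent with the stated size. A different feed-forward width or untied embeddings would shift the total, so the result is a plausibility check and not a statement of the exact configuration.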
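The learning-rate settings under Training Setup (peak 3e-4, minimum 3e-5, 51,200 warmup steps, cosine decay) can be illustrated with a small schedule function. This is a minimal sketch and not the author's training code: the linear shape of the warmup, the meaning of a "step" (micro-batch versus optimizer step), and the total step count are not stated in the model card, so `total_steps` below is a hypothetical value chosen only for the demonstration.

```python
import math

PEAK_LR = 3e-4          # "Learning Rate" from the model card
MIN_LR = 3e-5           # "Min LR" from the model card
WARMUP_STEPS = 51_200   # "Warmup Steps" from the model card


def lr_at(step, total_steps, peak_lr=PEAK_LR, min_lr=MIN_LR,
          warmup_steps=WARMUP_STEPS):
    """Learning rate at a given step: warmup followed by cosine decay."""
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must exceed warmup_steps")
    if step < warmup_steps:
        # ASSUMPTION: linear warmup; the card only gives the warmup length.
        return peak_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        return min_lr
    # Fraction of the decay phase completed, in [0, 1).
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + cosine * (peak_lr - min_lr)


# HYPOTHETICAL total step count, used only to exercise the function.
total_steps = 500_000
for step in (0, WARMUP_STEPS, 275_600, total_steps):
    print(step, f"{lr_at(step, total_steps):.2e}")
```

The schedule reaches the peak rate at the end of warmup, passes through the midpoint between peak and minimum halfway through the decay phase, and settles at the minimum rate at the final step.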
Apache 2.0
Harshvardhan Mishra
Transformers
PyTorch
Open
Science, Technology and Research
21/05/26 07:44:53
© 2026 - Copyright AIKosh. All rights reserved.