Government of India

Bharat MiniGPT 350M

Bharat MiniGPT 350M is a custom GPT-style causal language model built from scratch by Harshvardhan Mishra in PyTorch and later made compatible with HuggingFace. This 3B-token experiment is currently a pretrained base model and is not instruction-tuned yet. The architecture uses modern LLM components, including RoPE, RMSNorm, SwiGLU, and SDPA (scaled dot-product attention), and is designed for research, experimentation, and future fine-tuning.
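The four components named above can be combined into a single decoder block roughly as follows. This is a minimal PyTorch sketch, not the author's implementation: it uses the dimensions from the Model Details list (1024-dim embeddings, 16 heads, 768-token context), while the pre-norm layout, the RMSNorm epsilon, the RoPE base of 10000, the class names, and the SwiGLU hidden size of 2816 (about 8/3 × 1024, rounded up to a multiple of 256) are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Scale by the reciprocal root-mean-square of the features (no mean-centering, no bias)."""

    def __init__(self, dim: int, eps: float = 1e-6):  # eps value is an assumption
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps) * self.weight


def rope_cache(seq_len: int, head_dim: int, base: float = 10000.0):
    """Precompute cos/sin tables of shape (seq_len, head_dim // 2) for RoPE (base is assumed)."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)
    return angles.cos(), angles.sin()


def apply_rope(x, cos, sin):
    """Rotate each (even, odd) channel pair of x, shaped (B, H, T, head_dim), by a position-dependent angle."""
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    rotated = torch.stack((x_even * cos - x_odd * sin, x_even * sin + x_odd * cos), dim=-1)
    return rotated.flatten(-2)  # re-interleave the pairs back into head_dim channels


class SwiGLU(nn.Module):
    """Gated feed-forward layer: down(SiLU(gate(x)) * up(x))."""

    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))


class DecoderBlock(nn.Module):
    """Pre-norm block (assumed layout): x + Attn(RMSNorm(x)), then x + SwiGLU(RMSNorm(x))."""

    def __init__(self, dim: int = 1024, n_heads: int = 16, ffn_hidden: int = 2816):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.attn_norm, self.ffn_norm = RMSNorm(dim), RMSNorm(dim)
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)
        self.ffn = SwiGLU(dim, ffn_hidden)

    def forward(self, x, cos, sin):
        B, T, C = x.shape
        q, k, v = self.qkv(self.attn_norm(x)).split(C, dim=-1)
        # (B, T, C) -> (B, n_heads, T, head_dim)
        q, k, v = (t.view(B, T, self.n_heads, self.head_dim).transpose(1, 2) for t in (q, k, v))
        q, k = apply_rope(q, cos[:T], sin[:T]), apply_rope(k, cos[:T], sin[:T])
        # PyTorch's SDPA picks a fused kernel (e.g. FlashAttention) when the inputs allow it.
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.proj(y.transpose(1, 2).reshape(B, T, C))
        return x + self.ffn(self.ffn_norm(x))


if __name__ == "__main__":
    block = DecoderBlock()
    cos, sin = rope_cache(seq_len=768, head_dim=64)
    print(block(torch.randn(2, 10, 1024), cos, sin).shape)  # torch.Size([2, 10, 1024])
```

With the assumed hidden size, 24 such blocks plus a tied 50,257 × 1,024 embedding come to roughly 360M parameters, in the same range as the card's ~350M figure; a different hidden size would shift that total.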

About Model

Bharat MiniGPT 350M is a custom GPT-style causal language model trained from scratch by Harshvardhan Mishra using modern LLM architecture components such as RoPE, RMSNorm, SwiGLU, and SDPA attention. It is not a fine-tuned GPT-2 or LLaMA variant: the architecture and training pipeline were implemented manually in PyTorch and later integrated into the HuggingFace ecosystem.

The current release is an experimental base model pretrained on 3B tokens and is not instruction-tuned yet. A better version, trained on more tokens and with fine-tuning support, is planned for future updates.

Model Details:
* Parameters: ~350 million
* Architecture: Decoder-only Transformer
* Layers: 24 Transformer blocks
* Attention Heads: 16
* Embedding Size: 1024
* Context Length: 768 tokens
* Vocabulary Size: 50,257
* Position Encoding: RoPE
* Normalization: RMSNorm
* Feed Forward: SwiGLU
* Attention: SDPA (Flash Attention compatible)
* Precision: FP16 training

Training Data:
* HuggingFaceFW/fineweb (sample-10BT): 40%
* HuggingFaceFW/fineweb-edu (sample-10BT): 30%
* Wikimedia Wikipedia (20231101.en): 30%

Training Setup:
* Optimizer: AdamW
* Learning Rate: 3e-4
* Min LR: 3e-5
* Warmup Steps: 51,200
* LR Scheduler: Cosine decay
* Gradient Accumulation: 128
* Mixed Precision: FP16
* Gradient Clipping: 1.0

Features:
* Custom GPT architecture
* RoPE positional embeddings
* RMSNorm normalization
* SwiGLU feed-forward layers
* Flash Attention-compatible SDPA
* HuggingFace generate() support
* KV-cache compatible
* Weight tying support
* Gradient checkpointing during training

The model was evaluated with the EleutherAI LM Evaluation Harness on benchmark tasks such as ARC Easy, HellaSwag, and PIQA.

Explore More: [Bharat MiniGPT 350M Project Page](https://iotbyhvm.ooo/bharat-minigpt-350m-a-custom-gpt-style-llm-built-from-scratch-in-india/)

Disclaimer: Bharat MiniGPT 350M is an experimental pretrained base model developed for research and educational purposes. It is not instruction-tuned yet and may generate inaccurate, biased, or incomplete responses.
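One simple way to realize the 40/30/30 mixture from the Training Data list is to choose the source of each next document at random with those weights. This sketch assumes the percentages are per-document sampling probabilities (they could just as well be token shares), and `sample_sources` is a hypothetical helper; the card does not describe how the mixture was actually built.

```python
import random
from collections import Counter

# Mixture weights from the "Training Data" list of the card.
MIXTURE = {
    "HuggingFaceFW/fineweb (sample-10BT)": 0.40,
    "HuggingFaceFW/fineweb-edu (sample-10BT)": 0.30,
    "Wikimedia Wikipedia (20231101.en)": 0.30,
}


def sample_sources(n: int, mixture=MIXTURE, seed: int = 0) -> list[str]:
    """Pick the source of each of the next n documents in proportion to the mixture weights."""
    rng = random.Random(seed)  # seeded so the interleaving is reproducible
    names, weights = list(mixture), list(mixture.values())
    return rng.choices(names, weights=weights, k=n)


if __name__ == "__main__":
    counts = Counter(sample_sources(100_000))
    for name, count in counts.most_common():
        print(f"{name}: {count / 100_000:.1%}")
```

Sampling per document keeps the sources interleaved throughout training rather than feeding each dataset in one contiguous block.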
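The Training Setup values map onto a loop with warmup to a 3e-4 peak, cosine decay to 3e-5, AdamW, 128-way gradient accumulation, and gradient clipping at 1.0. This is an illustrative sketch under several assumptions: the warmup is assumed to be linear, `total_steps` is a hypothetical argument because the card gives no total step count, the card does not say whether the 51,200 warmup steps count optimizer updates or micro-batches, and FP16 autocast with loss scaling is left out for brevity.

```python
import math
import torch
import torch.nn as nn

# Values from the "Training Setup" list of the card.
MAX_LR, MIN_LR = 3e-4, 3e-5
WARMUP_STEPS = 51_200
ACCUM_STEPS = 128  # micro-batches per optimizer update
CLIP_NORM = 1.0


def lr_at(step, total_steps, warmup_steps=WARMUP_STEPS, max_lr=MAX_LR, min_lr=MIN_LR):
    """Linear warmup (assumed) to max_lr, then cosine decay to min_lr at total_steps."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        return min_lr
    progress = (step - warmup_steps) / (total_steps - warmup_steps)  # goes 0 -> 1
    return min_lr + 0.5 * (1.0 + math.cos(math.pi * progress)) * (max_lr - min_lr)


def train_step(model, optimizer, micro_batches, loss_fn, step, total_steps):
    """One optimizer update accumulated over all given micro-batches (no FP16 handling)."""
    for group in optimizer.param_groups:
        group["lr"] = lr_at(step, total_steps)
    optimizer.zero_grad(set_to_none=True)
    n = len(micro_batches)
    for inputs, targets in micro_batches:
        # Dividing by n makes the accumulated gradient the mean over micro-batches.
        loss = loss_fn(model(inputs), targets) / n
        loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), CLIP_NORM)
    optimizer.step()
```

In a run matching the card, each call would receive `ACCUM_STEPS` (128) micro-batches and an optimizer such as `torch.optim.AdamW(model.parameters(), lr=MAX_LR)`.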
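Weight tying usually means the output projection reuses the token-embedding matrix, so the 50,257 × 1,024 table is stored only once. Assuming that standard scheme (the module names `embed` and `lm_head` are hypothetical), the sharing looks like this in PyTorch:

```python
import torch.nn as nn

VOCAB_SIZE, D_MODEL = 50_257, 1_024  # from the Model Details list

embed = nn.Embedding(VOCAB_SIZE, D_MODEL)            # weight shape: (50257, 1024)
lm_head = nn.Linear(D_MODEL, VOCAB_SIZE, bias=False)  # weight shape: (50257, 1024)
lm_head.weight = embed.weight  # tie: both modules now hold the same Parameter object

# Count each distinct Parameter once, keyed by identity.
shared = {id(p): p.numel() for m in (embed, lm_head) for p in m.parameters()}
print(sum(shared.values()))  # 51463168
```

Without tying, the output head would add a second matrix of 51,463,168 parameters.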


Metadata

Apache 2.0

Harshvardhan Mishra

Transformers

PyTorch

Open

HVM SMART SOLUTIONS

Science, Technology and Research

21/05/26 07:44:53

0

Activity Overview

  • Downloads: 0
  • Redirect: 0
  • File Size: 0
  • Views: 10

Tags

  • PyTorch
  • Transformers
  • llm
  • custom-architecture
  • gpt
  • causal-lm
  • rope
  • bharat-minigpt
  • swiglu
  • rmsnorm

License Control

Apache 2.0
