Government Of India
Sarathi AgriData

The Agri-Advisory Synthetic Dataset (Hindi) is a large-scale instruction-tuning dataset for Indian agriculture, containing 2,20,222 biologically validated advisory examples. Generated through a multi-stage pipeline separating scientific feasibility from language generation, each record includes a structured agricultural scenario, a Chain-of-Thought reasoning trace, and a final Hindi advisory. The dataset supports chatbots, IVR, and radio-style delivery.

About Dataset

The Agri-Advisory Synthetic Dataset (Hindi) is a large-scale, high-quality dataset with 2,20,222 agricultural advisory examples designed for training LLMs in the Indian farming context. Each record includes a scientifically validated crop scenario, the model’s reasoning (Chain-of-Thought), and a clear, actionable Hindi advisory. The dataset is generated using a multi-stage pipeline that filters out biologically invalid scenarios before text generation, ensuring reliability and safety. It supports multiple advisory styles (chat, radio, IVR) through persona-based instructions and enforces strict guardrails for organic and conventional practices. Available in JSONL and Parquet, it is ideal for instruction tuning, reasoning-focused training, and real-world agricultural AI applications.
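As an illustration of how the JSONL release might be turned into instruction-tuning pairs, here is a minimal Python sketch. The listing does not publish the record schema, so the field names `scenario`, `reasoning`, and `advisory` are assumptions, as are the helper names; the sample record is a made-up placeholder, not data from the dataset.

```python
import io
import json

# Hypothetical field names: the real schema is not given in this listing.
SCENARIO_KEY = "scenario"
REASONING_KEY = "reasoning"
ADVISORY_KEY = "advisory"


def iter_records(lines):
    """Yield one dict per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def to_training_pair(record, include_reasoning=True):
    """Build a (prompt, target) pair from one record.

    The prompt is the structured scenario serialized as JSON; the target is
    the advisory, optionally preceded by the reasoning trace.
    """
    prompt = json.dumps(record[SCENARIO_KEY], ensure_ascii=False, sort_keys=True)
    target = record[ADVISORY_KEY]
    if include_reasoning:
        target = record[REASONING_KEY] + "\n\n" + target
    return prompt, target


# Placeholder record standing in for one line of the JSONL file.
sample_record = {
    SCENARIO_KEY: {"crop": "wheat", "stage": "sowing"},
    REASONING_KEY: "placeholder reasoning",
    ADVISORY_KEY: "placeholder advisory",
}
sample_file = io.StringIO(json.dumps(sample_record, ensure_ascii=False) + "\n")

pairs = [to_training_pair(record) for record in iter_records(sample_file)]
```

For a real run, the `io.StringIO` stand-in would be replaced by an open file handle on the downloaded JSONL file. Setting `include_reasoning=False` gives advisory-only targets, which may suit plain instruction tuning better than reasoning-focused training.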

Purpose of Dataset

Utility, Use Cases, and Policy Relevance

The Agri-Advisory Synthetic Dataset (Hindi) has strong practical utility across research, deployment, and governance of agricultural AI systems in India.

Utility & Use Cases

  • LLM training for agri-advisory systems: Enables instruction tuning and reasoning-aware training of models that provide crop-, region-, and stage-specific advice in Hindi, improving accuracy and farmer trust.
  • Farmer-facing applications: Suitable for chatbots, WhatsApp advisories, IVR systems, and community radio broadcasts due to persona-driven outputs and TTS-friendly formats.
  • Decision support for extension services: Can assist Krishi Vigyan Kendras (KVKs), agri-extension officers, and NGOs with consistent, localized advisory content.
  • Reasoning & safety research: The inclusion of Chain-of-Thought allows evaluation of model reasoning quality, robustness, and hallucination reduction in high-stakes domains.
  • Low-resource language enablement: Strengthens Hindi agricultural NLP resources, addressing a major gap in non-English, domain-specific datasets.

Policy Relevance

  • Alignment with digital agriculture initiatives: Supports national programs such as the Digital Agriculture Mission, AgriStack, and AI-based farmer advisory platforms.
  • Safe & responsible AI: Pre-validation of biological feasibility and enforced guardrails (organic vs. chemical, PPE warnings) align with emerging AI safety and responsible deployment guidelines.
  • Scalable public advisory infrastructure: Enables cost-effective, scalable dissemination of scientifically grounded advisories to small and marginal farmers.
  • Evidence-based policymaking: Can be used to simulate and test advisory policies, stress scenarios, and climate-impact responses before real-world rollout.

Overall, the dataset bridges scientific agricultural knowledge with trustworthy AI generation, making it valuable for both operational agri-tech systems and policy-driven digital agriculture initiatives in India.

Activity Overview

  • Downloads 0
  • Redirects 0
  • Views 14
  • File Size 0

Tags

  • Dataset
  • Agricultural Irrigation
  • Agricultural Infrastructure
  • Agricultural Landholding
  • Agricultural Production
  • Agriculture
  • Crop Variety Data
  • Crop Variety Adoption
  • Agricultural Planning
  • Agri-Data

License Control

Attribution 4.0 International (CC BY 4.0)