AI & LLM Engineering Internship

India
Computer / Information Technology
Remote

Internship

Posted 11 hours ago

What You'll Own:

  • You are responsible for making every model call in the platform as fast and as cost-efficient as possible. You benchmark inference providers (Groq, Cerebras, Fireworks), configure speculative decoding, and ensure the ModelRouter always picks the fastest available free-tier endpoint.

Responsibilities:

  • Benchmark Groq, Cerebras, Together AI, Fireworks, and OpenRouter across latency, throughput, and accuracy on agentic tasks

  • Configure Ollama 0.3.x with speculative decoding for local fallback models

  • Build and maintain the ModelRouter class with automatic rate-limit detection and provider rotation

  • Profile memory and token usage per worker agent and reduce average cost per task by 40%

  • Write inference optimization documentation for the team
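To give a feel for the ModelRouter responsibility above, here is a minimal sketch of what such a router might look like, assuming providers are plain callables ordered fastest-first and rate limits surface as exceptions. The names (`RateLimitError`, `ModelRouter`, the provider callables) are illustrative assumptions, not an existing API:

```python
class RateLimitError(Exception):
    """Raised when a provider signals an HTTP 429-style rate limit (hypothetical)."""


class ModelRouter:
    """Try providers in preference order, skipping any that hit a rate limit."""

    def __init__(self, providers):
        # providers: list of (name, callable) pairs, fastest/cheapest first
        self.providers = list(providers)
        self.cooldown = set()  # names of providers currently rate-limited

    def call(self, prompt):
        for name, fn in self.providers:
            if name in self.cooldown:
                continue  # provider was rate-limited earlier; rotate past it
            try:
                return name, fn(prompt)
            except RateLimitError:
                self.cooldown.add(name)  # remember the limit, try the next provider
        raise RuntimeError("all providers are rate-limited")
```

A real version would also expire cooldowns after a retry window and track per-provider latency, but the rotate-on-429 loop is the core of automatic provider failover.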

Requirements:

  • Python (intermediate) — can read and write async code

  • Familiarity with REST APIs and JSON

  • Basic understanding of LLMs (knows what tokens, temperature, and context window mean)

  • Bonus: Prior exposure to Ollama, LM Studio, or any local model runner

You'll Learn: vLLM, Cerebras WSE-3 API, OpenRouter free routing, speculative decoding, provider failover architecture
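As a taste of the provider benchmarking mentioned above, one way the latency measurement might be sketched (the `benchmark` helper and its percentile reporting are illustrative assumptions, not any provider's SDK):

```python
import statistics
import time


def benchmark(call, prompts, runs=3):
    """Time repeated calls and report p50/p95 wall-clock latency in milliseconds.

    `call` stands in for a provider request function; it is a hypothetical
    placeholder, not a real SDK method.
    """
    latencies = []
    for prompt in prompts:
        for _ in range(runs):
            start = time.perf_counter()
            call(prompt)
            latencies.append(time.perf_counter() - start)
    ordered = sorted(latencies)
    return {
        "p50_ms": statistics.median(ordered) * 1000,
        "p95_ms": ordered[int(0.95 * (len(ordered) - 1))] * 1000,
    }
```

Comparing p50 against p95 per provider is what makes tail-latency differences (which matter most for agentic loops) visible, rather than a single average.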

Problem Solving
Attention to Detail
Teamwork and Collaboration
Professional Development
Flexible Hours
Remote Work
Training Provided
Work-Life Balance
Graduate-Friendly

[email protected]

750 - 1.8K per month

Assessment:

A. Coding + Problem Solving

  • Python (mandatory)

  • Basic DSA (arrays, strings, hashing)

  • Possibly:

    • Data preprocessing task

About the company

Information Technology
India