
Overview

CrewAI integrates with multiple LLM providers through LiteLLM, giving you the flexibility to choose the right model for your specific use case. This guide will help you understand how to configure and use different LLM providers in your CrewAI projects.

What are LLMs?

Large Language Models (LLMs) are the core intelligence behind CrewAI agents. They enable agents to understand context, make decisions, and generate human-like responses. Here’s what you need to know:

LLM Basics

Large Language Models are AI systems trained on vast amounts of text data. They power the intelligence of your CrewAI agents, enabling them to understand and generate human-like text.

Context Window

The context window determines how much text an LLM can process at once. Larger windows (e.g., 128K tokens) allow for more context but may be more expensive and slower.
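To gauge how much of a window a given input consumes, here is a hedged sketch using the tiktoken tokenizer (an assumption: tiktoken is OpenAI-specific, and other providers count tokens differently):
Code
import tiktoken  # assumption: pip install tiktoken

# Older tiktoken versions may not know this model name;
# tiktoken.get_encoding("o200k_base") is an alternative.
enc = tiktoken.encoding_for_model("gpt-4o")

text = "An agent's prompt, tool output, and history all share the context window."
print(len(enc.encode(text)), "tokens")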

Temperature

Temperature (0.0 to 1.0) controls response randomness. Lower values (e.g., 0.2) produce more focused, deterministic outputs, while higher values (e.g., 0.8) increase creativity and variability.
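In CrewAI this is set per LLM instance, so you can tune different instances for different jobs:
Code
from crewai import LLM

factual_llm = LLM(model="gpt-4o", temperature=0.2)   # focused, deterministic answers
creative_llm = LLM(model="gpt-4o", temperature=0.8)  # more varied, creative output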

Provider Selection

Each LLM provider (e.g., OpenAI, Anthropic, Google) offers different models with varying capabilities, pricing, and features. Choose based on your needs for accuracy, speed, and cost.

Setting up your LLM

There are several places in CrewAI where you can specify the model to use:
  1. Environment Variables
  2. YAML Configuration
  3. Direct Code
Once you specify the model you are using, you will need to provide the configuration (such as an API key) for each of the model providers you use. See the provider configuration examples section below for your provider.
The simplest way to get started is with environment variables: set the model in your environment directly, through an .env file, or in your app code. If you used crewai create to bootstrap your project, it will be set already.
.env
MODEL=model-id  # e.g. gpt-4o, gemini-2.0-flash, claude-3-sonnet-...

# Be sure to set your API keys here too. See the Provider
# section below.
Never commit API keys to version control. Use environment files (.env) or your system’s secret management.
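As a minimal sketch of how the environment-variable approach ties into code, assuming python-dotenv is installed (projects bootstrapped with crewai create typically handle this for you):
Code
import os

from dotenv import load_dotenv  # assumption: python-dotenv is installed
from crewai import LLM

load_dotenv()  # pull MODEL and API keys from .env into the environment

# Build an LLM from the MODEL variable, with a fallback for illustration
llm = LLM(model=os.getenv("MODEL", "gpt-4o"))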

Provider Configuration Examples

CrewAI supports a multitude of LLM providers, each offering unique features, authentication methods, and model capabilities. In this section, you’ll find detailed examples that help you select, configure, and optimize the LLM that best fits your project’s needs.

OpenAI

Set the following environment variables in your .env file:
Code
# Required
OPENAI_API_KEY=sk-...

# Optional
OPENAI_API_BASE=<custom-base-url>
OPENAI_ORGANIZATION=<your-org-id>
Example usage in your CrewAI project:
Code
from crewai import LLM

llm = LLM(
    model="openai/gpt-4", # call model by provider/model_name
    temperature=0.8,
    max_tokens=150,
    top_p=0.9,
    frequency_penalty=0.1,
    presence_penalty=0.1,
    stop=["END"],
    seed=42
)
OpenAI is one of the leading providers of LLMs with a wide range of models and features.
| Model | Context Window | Best For |
|---|---|---|
| GPT-4 | 8,192 tokens | High-accuracy tasks, complex reasoning |
| GPT-4 Turbo | 128,000 tokens | Long-form content, document analysis |
| GPT-4o & GPT-4o-mini | 128,000 tokens | Cost-effective large context processing |
| o3-mini | 200,000 tokens | Fast reasoning, complex reasoning |
| o1-mini | 128,000 tokens | Fast reasoning, complex reasoning |
| o1-preview | 128,000 tokens | Fast reasoning, complex reasoning |
| o1 | 200,000 tokens | Fast reasoning, complex reasoning |

Meta Llama API

Meta’s Llama API provides access to Meta’s family of large language models. Set the following environment variables in your .env file:
Code
# Meta Llama API Key Configuration
LLAMA_API_KEY=LLM|your_api_key_here
Example usage in your CrewAI project:
Code
from crewai import LLM

# Initialize Meta Llama LLM
llm = LLM(
    model="meta_llama/Llama-4-Scout-17B-16E-Instruct-FP8",
    temperature=0.8,
    stop=["END"],
    seed=42
)
All models listed here https://llama.developer.meta.com/docs/models/ are supported.
| Model ID | Input context length | Output context length | Input Modalities | Output Modalities |
|---|---|---|---|---|
| meta_llama/Llama-4-Scout-17B-16E-Instruct-FP8 | 128k | 4028 | Text, Image | Text |
| meta_llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | 128k | 4028 | Text, Image | Text |
| meta_llama/Llama-3.3-70B-Instruct | 128k | 4028 | Text | Text |
| meta_llama/Llama-3.3-8B-Instruct | 128k | 4028 | Text | Text |

Anthropic

Set the following environment variables in your .env file:
Code
# Required
ANTHROPIC_API_KEY=sk-ant-...

# Optional
ANTHROPIC_API_BASE=<custom-base-url>
Example usage in your CrewAI project:
Code
from crewai import LLM

llm = LLM(
    model="anthropic/claude-3-sonnet-20240229-v1:0",
    temperature=0.7
)

Google (Gemini API)

Set your API key in your .env file. If you need a key, or need to find an existing key, check AI Studio.
.env
# https://ai.google.dev/gemini-api/docs/api-key
GEMINI_API_KEY=<your-api-key>
Example usage in your CrewAI project:
Code
from crewai import LLM

llm = LLM(
    model="gemini/gemini-2.0-flash",
    temperature=0.7,
)

Gemini models

Google offers a range of powerful models optimized for different use cases.
| Model | Context Window | Best For |
|---|---|---|
| gemini-2.5-flash-preview-04-17 | 1M tokens | Adaptive thinking, cost efficiency |
| gemini-2.5-pro-preview-05-06 | 1M tokens | Enhanced thinking and reasoning, multimodal understanding, advanced coding, and more |
| gemini-2.0-flash | 1M tokens | Next generation features, speed, thinking, and realtime streaming |
| gemini-2.0-flash-lite | 1M tokens | Cost efficiency and low latency |
| gemini-1.5-flash | 1M tokens | Balanced multimodal model, good for most tasks |
| gemini-1.5-flash-8B | 1M tokens | Fastest, most cost-efficient, good for high-frequency tasks |
| gemini-1.5-pro | 2M tokens | Best performing, wide variety of reasoning tasks including logical reasoning, coding, and creative collaboration |
The full list of models is available in the Gemini model docs.

Gemma

The Gemini API also allows you to use your API key to access Gemma models hosted on Google infrastructure.
| Model | Context Window |
|---|---|
| gemma-3-1b-it | 32k tokens |
| gemma-3-4b-it | 32k tokens |
| gemma-3-12b-it | 32k tokens |
| gemma-3-27b-it | 128k tokens |
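As a hedged usage sketch, assuming LiteLLM routes Gemma model IDs through the same gemini/ prefix used for Gemini models (verify the exact ID against the Gemini docs before relying on it):
Code
from crewai import LLM

llm = LLM(
    model="gemini/gemma-3-27b-it",  # assumed ID format; confirm before use
    temperature=0.7,
)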

Google (Vertex AI)

Get credentials from your Google Cloud Console, save them to a JSON file, and load them with the following code:
Code
import json

file_path = 'path/to/vertex_ai_service_account.json'

# Load the JSON file
with open(file_path, 'r') as file:
    vertex_credentials = json.load(file)

# Convert the credentials to a JSON string
vertex_credentials_json = json.dumps(vertex_credentials)
Example usage in your CrewAI project:
Code
from crewai import LLM

llm = LLM(
    model="gemini-1.5-pro-latest", # or vertex_ai/gemini-1.5-pro-latest
    temperature=0.7,
    vertex_credentials=vertex_credentials_json
)
Vertex AI supports the same Gemini models listed in the Gemini models table above.

Azure

Set the following environment variables in your .env file:
Code
# Required
AZURE_API_KEY=<your-api-key>
AZURE_API_BASE=<your-resource-url>
AZURE_API_VERSION=<api-version>

# Optional
AZURE_AD_TOKEN=<your-azure-ad-token>
AZURE_API_TYPE=<your-azure-api-type>
Example usage in your CrewAI project:
Code
from crewai import LLM

llm = LLM(
    model="azure/gpt-4",
    api_version="2023-05-15"
)

Amazon Bedrock

Set the following environment variables in your .env file:
Code
AWS_ACCESS_KEY_ID=<your-access-key>
AWS_SECRET_ACCESS_KEY=<your-secret-key>
AWS_DEFAULT_REGION=<your-region>
Example usage in your CrewAI project:
Code
from crewai import LLM

llm = LLM(
    model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0"
)
Before using Amazon Bedrock, make sure you have boto3 installed in your environment. Amazon Bedrock is a managed service that provides access to multiple foundation models from top AI companies through a unified API, enabling secure and responsible AI application development.
| Model | Context Window | Best For |
|---|---|---|
| Amazon Nova Pro | Up to 300k tokens | High-performance model balancing accuracy, speed, and cost-effectiveness across diverse tasks. |
| Amazon Nova Micro | Up to 128k tokens | High-performance, cost-effective text-only model optimized for lowest latency responses. |
| Amazon Nova Lite | Up to 300k tokens | High-performance, affordable multimodal processing for images, video, and text with real-time capabilities. |
| Claude 3.7 Sonnet | Up to 128k tokens | High-performance, best for complex reasoning, coding & AI agents |
| Claude 3.5 Sonnet v2 | Up to 200k tokens | State-of-the-art model specialized in software engineering, agentic capabilities, and computer interaction at optimized cost. |
| Claude 3.5 Sonnet | Up to 200k tokens | High-performance model delivering superior intelligence and reasoning across diverse tasks with optimal speed-cost balance. |
| Claude 3.5 Haiku | Up to 200k tokens | Fast, compact multimodal model optimized for quick responses and seamless human-like interactions |
| Claude 3 Sonnet | Up to 200k tokens | Multimodal model balancing intelligence and speed for high-volume deployments. |
| Claude 3 Haiku | Up to 200k tokens | Compact, high-speed multimodal model optimized for quick responses and natural conversational interactions |
| Claude 3 Opus | Up to 200k tokens | Most advanced multimodal model excelling at complex tasks with human-like reasoning and superior contextual understanding. |
| Claude 2.1 | Up to 200k tokens | Enhanced version with expanded context window, improved reliability, and reduced hallucinations for long-form and RAG applications |
| Claude | Up to 100k tokens | Versatile model excelling in sophisticated dialogue, creative content, and precise instruction following. |
| Claude Instant | Up to 100k tokens | Fast, cost-effective model for everyday tasks like dialogue, analysis, summarization, and document Q&A |
| Llama 3.1 405B Instruct | Up to 128k tokens | Advanced LLM for synthetic data generation, distillation, and inference for chatbots, coding, and domain-specific tasks. |
| Llama 3.1 70B Instruct | Up to 128k tokens | Powers complex conversations with superior contextual understanding, reasoning and text generation. |
| Llama 3.1 8B Instruct | Up to 128k tokens | Advanced state-of-the-art model with language understanding, superior reasoning, and text generation. |
| Llama 3 70B Instruct | Up to 8k tokens | Powers complex conversations with superior contextual understanding, reasoning and text generation. |
| Llama 3 8B Instruct | Up to 8k tokens | Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation. |
| Titan Text G1 - Lite | Up to 4k tokens | Lightweight, cost-effective model optimized for English tasks and fine-tuning with focus on summarization and content generation. |
| Titan Text G1 - Express | Up to 8k tokens | Versatile model for general language tasks, chat, and RAG applications with support for English and 100+ languages. |
| Cohere Command | Up to 4k tokens | Model specialized in following user commands and delivering practical enterprise solutions. |
| Jurassic-2 Mid | Up to 8,191 tokens | Cost-effective model balancing quality and affordability for diverse language tasks like Q&A, summarization, and content generation. |
| Jurassic-2 Ultra | Up to 8,191 tokens | Model for advanced text generation and comprehension, excelling in complex tasks like analysis and content creation. |
| Jamba-Instruct | Up to 256k tokens | Model with extended context window optimized for cost-effective text generation, summarization, and Q&A. |
| Mistral 7B Instruct | Up to 32k tokens | This LLM follows instructions, completes requests, and generates creative text. |
| Mistral 8x7B Instruct | Up to 32k tokens | An MOE LLM that follows instructions, completes requests, and generates creative text. |

Amazon SageMaker

Set the following environment variables in your .env file:
Code
AWS_ACCESS_KEY_ID=<your-access-key>
AWS_SECRET_ACCESS_KEY=<your-secret-key>
AWS_DEFAULT_REGION=<your-region>
Example usage in your CrewAI project:
Code
from crewai import LLM

llm = LLM(
    model="sagemaker/<my-endpoint>"
)

Mistral

Set the following environment variables in your .env file:
Code
MISTRAL_API_KEY=<your-api-key>
Example usage in your CrewAI project:
Code
from crewai import LLM

llm = LLM(
    model="mistral/mistral-large-latest",
    temperature=0.7
)

Nvidia NIM

Set the following environment variables in your .env file:
Code
NVIDIA_API_KEY=<your-api-key>
Example usage in your CrewAI project:
Code
from crewai import LLM

llm = LLM(
    model="nvidia_nim/meta/llama3-70b-instruct",
    temperature=0.7
)
Nvidia NIM provides a comprehensive suite of models for various use cases, from general-purpose tasks to specialized applications.
| Model | Context Window | Best For |
|---|---|---|
| nvidia/mistral-nemo-minitron-8b-8k-instruct | 8,192 tokens | State-of-the-art small language model delivering superior accuracy for chatbot, virtual assistants, and content generation. |
| nvidia/nemotron-4-mini-hindi-4b-instruct | 4,096 tokens | A bilingual Hindi-English SLM for on-device inference, tailored specifically for Hindi Language. |
| nvidia/llama-3.1-nemotron-70b-instruct | 128k tokens | Customized for enhanced helpfulness in responses |
| nvidia/llama3-chatqa-1.5-8b | 128k tokens | Advanced LLM to generate high-quality, context-aware responses for chatbots and search engines. |
| nvidia/llama3-chatqa-1.5-70b | 128k tokens | Advanced LLM to generate high-quality, context-aware responses for chatbots and search engines. |
| nvidia/vila | 128k tokens | Multi-modal vision-language model that understands text/img/video and creates informative responses |
| nvidia/neva-22 | 4,096 tokens | Multi-modal vision-language model that understands text/images and generates informative responses |
| nvidia/nemotron-mini-4b-instruct | 8,192 tokens | General-purpose tasks |
| nvidia/usdcode-llama3-70b-instruct | 128k tokens | State-of-the-art LLM that answers OpenUSD knowledge queries and generates USD-Python code. |
| nvidia/nemotron-4-340b-instruct | 4,096 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. |
| meta/codellama-70b | 100k tokens | LLM capable of generating code from natural language and vice versa. |
| meta/llama2-70b | 4,096 tokens | Cutting-edge large language AI model capable of generating text and code in response to prompts. |
| meta/llama3-8b-instruct | 8,192 tokens | Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation. |
| meta/llama3-70b-instruct | 8,192 tokens | Powers complex conversations with superior contextual understanding, reasoning and text generation. |
| meta/llama-3.1-8b-instruct | 128k tokens | Advanced state-of-the-art model with language understanding, superior reasoning, and text generation. |
| meta/llama-3.1-70b-instruct | 128k tokens | Powers complex conversations with superior contextual understanding, reasoning and text generation. |
| meta/llama-3.1-405b-instruct | 128k tokens | Advanced LLM for synthetic data generation, distillation, and inference for chatbots, coding, and domain-specific tasks. |
| meta/llama-3.2-1b-instruct | 128k tokens | Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation. |
| meta/llama-3.2-3b-instruct | 128k tokens | Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation. |
| meta/llama-3.2-11b-vision-instruct | 128k tokens | Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation. |
| meta/llama-3.2-90b-vision-instruct | 128k tokens | Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation. |
| google/gemma-7b | 8,192 tokens | Cutting-edge text generation model for text understanding, transformation, and code generation. |
| google/gemma-2b | 8,192 tokens | Cutting-edge text generation model for text understanding, transformation, and code generation. |
| google/codegemma-7b | 8,192 tokens | Cutting-edge model built on Google’s Gemma-7B specialized for code generation and code completion. |
| google/codegemma-1.1-7b | 8,192 tokens | Advanced programming model for code generation, completion, reasoning, and instruction following. |
| google/recurrentgemma-2b | 8,192 tokens | Novel recurrent architecture based language model for faster inference when generating long sequences. |
| google/gemma-2-9b-it | 8,192 tokens | Cutting-edge text generation model for text understanding, transformation, and code generation. |
| google/gemma-2-27b-it | 8,192 tokens | Cutting-edge text generation model for text understanding, transformation, and code generation. |
| google/gemma-2-2b-it | 8,192 tokens | Cutting-edge text generation model for text understanding, transformation, and code generation. |
| google/deplot | 512 tokens | One-shot visual language understanding model that translates images of plots into tables. |
| google/paligemma | 8,192 tokens | Vision language model adept at comprehending text and visual inputs to produce informative responses. |
| mistralai/mistral-7b-instruct-v0.2 | 32k tokens | This LLM follows instructions, completes requests, and generates creative text. |
| mistralai/mixtral-8x7b-instruct-v0.1 | 8,192 tokens | An MOE LLM that follows instructions, completes requests, and generates creative text. |
| mistralai/mistral-large | 4,096 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. |
| mistralai/mixtral-8x22b-instruct-v0.1 | 8,192 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. |
| mistralai/mistral-7b-instruct-v0.3 | 32k tokens | This LLM follows instructions, completes requests, and generates creative text. |
| nv-mistralai/mistral-nemo-12b-instruct | 128k tokens | Most advanced language model for reasoning, code, multilingual tasks; runs on a single GPU. |
| mistralai/mamba-codestral-7b-v0.1 | 256k tokens | Model for writing and interacting with code across a wide range of programming languages and tasks. |
| microsoft/phi-3-mini-128k-instruct | 128K tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
| microsoft/phi-3-mini-4k-instruct | 4,096 tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
| microsoft/phi-3-small-8k-instruct | 8,192 tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
| microsoft/phi-3-small-128k-instruct | 128K tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
| microsoft/phi-3-medium-4k-instruct | 4,096 tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
| microsoft/phi-3-medium-128k-instruct | 128K tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
| microsoft/phi-3.5-mini-instruct | 128K tokens | Lightweight multilingual LLM powering AI applications in latency bound, memory/compute constrained environments |
| microsoft/phi-3.5-moe-instruct | 128K tokens | Advanced LLM based on Mixture of Experts architecture to deliver compute efficient content generation |
| microsoft/kosmos-2 | 1,024 tokens | Groundbreaking multimodal model designed to understand and reason about visual elements in images. |
| microsoft/phi-3-vision-128k-instruct | 128k tokens | Cutting-edge open multimodal model excelling in high-quality reasoning from images. |
| microsoft/phi-3.5-vision-instruct | 128k tokens | Cutting-edge open multimodal model excelling in high-quality reasoning from images. |
| databricks/dbrx-instruct | 12k tokens | A general-purpose LLM with state-of-the-art performance in language understanding, coding, and RAG. |
| snowflake/arctic | 1,024 tokens | Delivers high efficiency inference for enterprise applications focused on SQL generation and coding. |
| aisingapore/sea-lion-7b-instruct | 4,096 tokens | LLM to represent and serve the linguistic and cultural diversity of Southeast Asia |
| ibm/granite-8b-code-instruct | 4,096 tokens | Software programming LLM for code generation, completion, explanation, and multi-turn conversion. |
| ibm/granite-34b-code-instruct | 8,192 tokens | Software programming LLM for code generation, completion, explanation, and multi-turn conversion. |
| ibm/granite-3.0-8b-instruct | 4,096 tokens | Advanced Small Language Model supporting RAG, summarization, classification, code, and agentic AI |
| ibm/granite-3.0-3b-a800m-instruct | 4,096 tokens | Highly efficient Mixture of Experts model for RAG, summarization, entity extraction, and classification |
| mediatek/breeze-7b-instruct | 4,096 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. |
| upstage/solar-10.7b-instruct | 4,096 tokens | Excels in NLP tasks, particularly in instruction-following, reasoning, and mathematics. |
| writer/palmyra-med-70b-32k | 32k tokens | Leading LLM for accurate, contextually relevant responses in the medical domain. |
| writer/palmyra-med-70b | 32k tokens | Leading LLM for accurate, contextually relevant responses in the medical domain. |
| writer/palmyra-fin-70b-32k | 32k tokens | Specialized LLM for financial analysis, reporting, and data processing |
| 01-ai/yi-large | 32k tokens | Powerful model trained on English and Chinese for diverse tasks including chatbot and creative writing. |
| deepseek-ai/deepseek-coder-6.7b-instruct | 2k tokens | Powerful coding model offering advanced capabilities in code generation, completion, and infilling |
| rakuten/rakutenai-7b-instruct | 1,024 tokens | Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation. |
| rakuten/rakutenai-7b-chat | 1,024 tokens | Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation. |
| baichuan-inc/baichuan2-13b-chat | 4,096 tokens | Support Chinese and English chat, coding, math, instruction following, solving quizzes |
NVIDIA NIM enables you to run powerful LLMs locally on your Windows machine using WSL2 (Windows Subsystem for Linux). This approach lets you leverage your NVIDIA GPU for private, secure, and cost-effective AI inference without relying on cloud services, making it well suited to development, testing, or production scenarios where data privacy or offline capabilities are required. Here is a step-by-step guide to setting up a local NVIDIA NIM model:
  1. Follow the installation instructions on the NVIDIA website.
  2. Install the local model. For Llama 3.1-8b, follow the model-specific instructions.
  3. Configure your CrewAI local models:
Code
from crewai import Agent
from crewai.llm import LLM
from crewai.project import CrewBase, agent

local_nvidia_nim_llm = LLM(
    model="openai/meta/llama-3.1-8b-instruct",  # NIM exposes an OpenAI-API-compatible endpoint
    base_url="http://localhost:8000/v1",
    api_key="<your_api_key|any text if you have not configured it>",  # api_key is required, but any text works if auth is not configured
)

# Then you can use it in your crew:

@CrewBase
class MyCrew():
    # ...

    @agent
    def researcher(self) -> Agent:
        return Agent(
            config=self.agents_config['researcher'],  # type: ignore[index]
            llm=local_nvidia_nim_llm
        )

    # ...

Groq

Set the following environment variables in your .env file:
Code
GROQ_API_KEY=<your-api-key>
Example usage in your CrewAI project:
Code
from crewai import LLM

llm = LLM(
    model="groq/llama-3.2-90b-text-preview",
    temperature=0.7
)
| Model | Context Window | Best For |
|---|---|---|
| Llama 3.1 70B/8B | 131,072 tokens | High-performance, large context tasks |
| Llama 3.2 Series | 8,192 tokens | General-purpose tasks |
| Mixtral 8x7B | 32,768 tokens | Balanced performance and context |

IBM watsonx.ai

Set the following environment variables in your .env file:
Code
# Required
WATSONX_URL=<your-url>
WATSONX_APIKEY=<your-apikey>
WATSONX_PROJECT_ID=<your-project-id>

# Optional
WATSONX_TOKEN=<your-token>
WATSONX_DEPLOYMENT_SPACE_ID=<your-space-id>
Example usage in your CrewAI project:
Code
from crewai import LLM

llm = LLM(
    model="watsonx/meta-llama/llama-3-1-70b-instruct",
    base_url="https://api.watsonx.ai/v1"
)

Ollama (Local LLMs)

  1. Install Ollama: ollama.ai
  2. Run a model: ollama run llama3
  3. Configure:
Code
from crewai import LLM

llm = LLM(
    model="ollama/llama3:70b",
    base_url="http://localhost:11434"
)

Fireworks AI

Set the following environment variables in your .env file:
Code
FIREWORKS_API_KEY=<your-api-key>
Example usage in your CrewAI project:
Code
from crewai import LLM

llm = LLM(
    model="fireworks_ai/accounts/fireworks/models/llama-v3-70b-instruct",
    temperature=0.7
)

Perplexity AI

Set the following environment variables in your .env file:
Code
PERPLEXITY_API_KEY=<your-api-key>
Example usage in your CrewAI project:
Code
from crewai import LLM

llm = LLM(
    model="llama-3.1-sonar-large-128k-online",
    base_url="https://api.perplexity.ai/"
)

Hugging Face

Set the following environment variables in your .env file:
Code
HF_TOKEN=<your-api-key>
Example usage in your CrewAI project:
Code
from crewai import LLM

llm = LLM(
    model="huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct"
)

SambaNova

Set the following environment variables in your .env file:
Code
SAMBANOVA_API_KEY=<your-api-key>
Example usage in your CrewAI project:
Code
from crewai import LLM

llm = LLM(
    model="sambanova/Meta-Llama-3.1-8B-Instruct",
    temperature=0.7
)
| Model | Context Window | Best For |
|---|---|---|
| Llama 3.1 70B/8B | Up to 131,072 tokens | High-performance, large context tasks |
| Llama 3.1 405B | 8,192 tokens | High-performance and output quality |
| Llama 3.2 Series | 8,192 tokens | General-purpose, multimodal tasks |
| Llama 3.3 70B | Up to 131,072 tokens | High-performance and output quality |
| Qwen2 family | 8,192 tokens | High-performance and output quality |

Cerebras

Set the following environment variables in your .env file:
Code
# Required
CEREBRAS_API_KEY=<your-api-key>
Example usage in your CrewAI project:
Code
from crewai import LLM

llm = LLM(
    model="cerebras/llama3.1-70b",
    temperature=0.7,
    max_tokens=8192
)
Cerebras features:
  • Fast inference speeds
  • Competitive pricing
  • Good balance of speed and quality
  • Support for long context windows

Open Router

Set the following environment variables in your .env file:
Code
OPENROUTER_API_KEY=<your-api-key>
Example usage in your CrewAI project:
Code
import os

from crewai import LLM

llm = LLM(
    model="openrouter/deepseek/deepseek-r1",
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"]
)
Open Router models:
  • openrouter/deepseek/deepseek-r1
  • openrouter/deepseek/deepseek-chat

Nebius AI Studio

Set the following environment variables in your .env file:
Code
NEBIUS_API_KEY=<your-api-key>
Example usage in your CrewAI project:
Code
from crewai import LLM

llm = LLM(
    model="nebius/Qwen/Qwen3-30B-A3B"
)
Nebius AI Studio features:
  • Large collection of open source models
  • Higher rate limits
  • Competitive pricing
  • Good balance of speed and quality

Streaming Responses

CrewAI supports streaming responses from LLMs, allowing your application to receive and process outputs in real-time as they’re generated.
CrewAI’s streaming support covers basic setup, event handling, and agent & task tracking. Enable streaming by setting the stream parameter to True when initializing your LLM:
from crewai import LLM

# Create an LLM with streaming enabled
llm = LLM(
    model="openai/gpt-4o",
    stream=True  # Enable streaming
)
When streaming is enabled, responses are delivered in chunks as they’re generated, creating a more responsive user experience.
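For per-chunk handling, here is a hedged sketch of an event listener; it assumes your CrewAI version exposes BaseEventListener and LLMStreamChunkEvent under crewai.utilities.events (check the events API of your release):
from crewai.utilities.events import LLMStreamChunkEvent
from crewai.utilities.events.base_event_listener import BaseEventListener

class ChunkPrinter(BaseEventListener):
    def setup_listeners(self, crewai_event_bus):
        @crewai_event_bus.on(LLMStreamChunkEvent)
        def on_chunk(source, event):
            # Each event carries one text chunk as it is generated
            print(event.chunk, end="", flush=True)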

Structured LLM Calls

CrewAI supports structured responses from LLM calls by allowing you to define a response_format using a Pydantic model. This enables the framework to automatically parse and validate the output, making it easier to integrate the response into your application without manual post-processing. For example, you can define a Pydantic model to represent the expected response structure and pass it as the response_format when instantiating the LLM. The model will then be used to convert the LLM output into a structured Python object.
Code
from crewai import LLM
from pydantic import BaseModel

class Dog(BaseModel):
    name: str
    age: int
    breed: str


llm = LLM(model="gpt-4o", response_format=Dog)

response = llm.call(
    "Analyze the following messages and return the name, age, and breed. "
    "Meet Kona! She is 3 years old and is a black german shepherd."
)
print(response)

# Output:
# Dog(name='Kona', age=3, breed='black german shepherd')

Advanced Features and Optimization

Learn how to get the most out of your LLM configuration:
CrewAI includes smart context management features:
from crewai import LLM

# CrewAI automatically handles:
# 1. Token counting and tracking
# 2. Content summarization when needed
# 3. Task splitting for large contexts

llm = LLM(
    model="gpt-4",
    max_tokens=4000,  # Limit response length
)
Best practices for context management:
  1. Choose models with appropriate context windows
  2. Pre-process long inputs when possible
  3. Use chunking for large documents (see the sketch below)
  4. Monitor token usage to optimize costs
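As a minimal, library-agnostic sketch of point 3 above (chunking a large document before passing it to an LLM; the sizes are illustrative assumptions):
def chunk_text(text: str, max_chars: int = 8000, overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows that fit a context budget."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # overlap preserves context across boundaries
    return chunks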

Token Usage Optimization

Choose the right context window for your task:
  • Small tasks (up to 4K tokens): Standard models
  • Medium tasks (between 4K-32K): Enhanced models
  • Large tasks (over 32K): Large context models
# Configure model with appropriate settings
llm = LLM(
    model="openai/gpt-4-turbo-preview",
    temperature=0.7,    # Adjust based on task
    max_tokens=4096,    # Set based on output needs
    timeout=300        # Longer timeout for complex tasks
)
  • Lower temperature (0.1 to 0.3) for factual responses
  • Higher temperature (0.7 to 0.9) for creative tasks

Best Practices

  1. Monitor token usage
  2. Implement rate limiting
  3. Use caching when possible (see the sketch below)
  4. Set appropriate max_tokens limits
Remember to regularly monitor your token usage and adjust your configuration as needed to optimize costs and performance.
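For point 3, a minimal sketch of prompt-level caching using only the standard library (an illustration, not CrewAI’s built-in caching):
from functools import lru_cache

from crewai import LLM

llm = LLM(model="gpt-4o", temperature=0)  # deterministic output makes caching safer

@lru_cache(maxsize=128)
def cached_call(prompt: str) -> str:
    # Repeated identical prompts are served from the in-memory cache
    return llm.call(prompt)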
CrewAI internally uses LiteLLM for LLM calls, which allows you to drop additional parameters that are not needed for (or not supported by) your specific model. This can help simplify your code and reduce the complexity of your LLM configuration. For example, if you don’t need to send the stop parameter, you can simply omit it from your LLM call:
from crewai import LLM
import os

os.environ["OPENAI_API_KEY"] = "<api-key>"

o3_llm = LLM(
    model="o3",
    drop_params=True,
    additional_drop_params=["stop"]
)

Common Issues and Solutions

The most common problem areas are authentication, model names, and context length. Most authentication issues can be resolved by checking API key format and environment variable names:
# OpenAI
OPENAI_API_KEY=sk-...

# Anthropic
ANTHROPIC_API_KEY=sk-ant-...
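A quick stdlib-only check that the variables are actually visible to your process:
import os

for key in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY"):
    value = os.getenv(key)
    # Show only a short prefix so keys never land in logs
    print(key, "set" if value else "MISSING", (value or "")[:5])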