Run inference on 400,000+ open models. Text, image, audio, and more.
https://api-inference.huggingface.co
HuggingFace's Inference API lets you run any model hosted on the Hub without managing infrastructure. Send a POST request with your input, get predictions back. Supports text generation, classification, embeddings, image generation, object detection, translation, summarization, and more.
The same endpoint pattern works for every model: just change the model ID in the URL and authenticate with an `hf_...` access token in the `Authorization: Bearer` header. The free tier includes rate-limited access to popular models; for production, Inference Endpoints provide dedicated GPUs.
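A minimal sketch of that pattern using only the standard library; the model IDs and the `hf_xxx` token are placeholders for your own choices:

```python
import json
import urllib.request

API_BASE = "https://api-inference.huggingface.co/models/"

def build_request(model_id, inputs, token):
    """Build a ready-to-send POST request for the Inference API."""
    return urllib.request.Request(
        API_BASE + model_id,
        data=json.dumps({"inputs": inputs}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def query(model_id, inputs, token):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(model_id, inputs, token)) as resp:
        return json.loads(resp.read())

# Only the model ID changes between tasks:
# query("meta-llama/Llama-3-8b-instruct", "Write a haiku about GPUs.", "hf_xxx")
# query("facebook/bart-large-mnli", "I love this new phone!", "hf_xxx")
```

The same `query` helper covers every text-input task in the table below; binary inputs (audio, images) are sent as raw request bodies instead of JSON.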
| Method | Path | Description |
|---|---|---|
| POST | /models/{model_id} | Run inference on any model |
| POST | /models/meta-llama/Llama-3-8b-instruct | Text generation with Llama 3 |
| POST | /models/sentence-transformers/all-MiniLM-L6-v2 | Get text embeddings |
| POST | /models/facebook/bart-large-mnli | Zero-shot text classification |
| POST | /models/openai/whisper-large-v3 | Transcribe audio |
| POST | /models/stabilityai/stable-diffusion-xl-base-1.0 | Generate images |
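Task-specific options ride alongside `inputs` in a `parameters` object. A sketch of a zero-shot classification payload for `facebook/bart-large-mnli`; the example text and labels are made up for illustration:

```python
import json

# Zero-shot classification scores the input text against each
# candidate label supplied in `parameters`.
payload = {
    "inputs": "The new update drains my battery in two hours.",
    "parameters": {
        "candidate_labels": ["bug report", "feature request", "praise"],
    },
}

# POST this body to /models/facebook/bart-large-mnli with your token;
# the response ranks each candidate label with a score.
body = json.dumps(payload).encode("utf-8")
```
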
- Test any of 400K+ models with a single API call before committing to infrastructure.
- Chain classification, summarization, and translation models for document processing.
- Generate embeddings with sentence-transformers for search, clustering, and RAG.
- Run Stable Diffusion and other image models for creative and design workflows.
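For the search and RAG use case, the vectors returned by `sentence-transformers/all-MiniLM-L6-v2` are typically ranked by cosine similarity. A minimal sketch using toy 4-dimensional vectors in place of the model's real embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy vectors standing in for MiniLM embeddings of a query and two documents:
query_vec = [0.1, 0.3, 0.5, 0.1]
doc_vecs = {
    "doc_a": [0.1, 0.3, 0.5, 0.1],  # same direction as the query
    "doc_b": [0.9, 0.1, 0.0, 0.0],
}

# Rank documents by similarity to the query, most similar first.
ranked = sorted(
    doc_vecs,
    key=lambda d: cosine_similarity(query_vec, doc_vecs[d]),
    reverse=True,
)
print(ranked)  # doc_a ranks first
```

The same ranking step works unchanged on the model's real 384-dimensional embeddings.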