Run inference on 400,000+ open models. Text, image, audio, and more.
https://api-inference.huggingface.co
HuggingFace's Inference API lets you run any model hosted on the Hub without managing infrastructure. Send a POST request with your input, get predictions back. Supports text generation, classification, embeddings, image generation, object detection, translation, summarization, and more.
The same endpoint pattern works for every model: just change the model ID in the URL and authenticate with an `hf_...` access token in the `Authorization: Bearer` header. The free tier includes rate-limited access to popular models; for production, Inference Endpoints provide dedicated GPUs.
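A minimal sketch of that pattern using only the standard library; the model IDs and the `hf_xxx` token are placeholders for your own choices:

```python
import json
import urllib.request

API_BASE = "https://api-inference.huggingface.co/models/"

def build_request(model_id, inputs, token):
    """Build a ready-to-send POST request for the Inference API."""
    return urllib.request.Request(
        API_BASE + model_id,
        data=json.dumps({"inputs": inputs}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def query(model_id, inputs, token):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(model_id, inputs, token)) as resp:
        return json.loads(resp.read())

# Only the model ID changes between tasks:
# query("meta-llama/Llama-3-8b-instruct", "Write a haiku about GPUs.", "hf_xxx")
# query("facebook/bart-large-mnli", "I love this new phone!", "hf_xxx")
```

The same `query` helper covers every text-input task in the table below; binary inputs (audio, images) are sent as raw request bodies instead of JSON.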
| Method | Path | Description |
|---|---|---|
| POST | /models/{model_id} | Run inference on any model |
| POST | /models/meta-llama/Llama-3-8b-instruct | Text generation with Llama 3 |
| POST | /models/sentence-transformers/all-MiniLM-L6-v2 | Get text embeddings |
| POST | /models/facebook/bart-large-mnli | Zero-shot text classification |
| POST | /models/openai/whisper-large-v3 | Transcribe audio |
| POST | /models/stabilityai/stable-diffusion-xl-base-1.0 | Generate images |
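Task-specific options ride alongside `inputs` in a `parameters` object. A sketch of a zero-shot classification payload for `facebook/bart-large-mnli`; the example text and labels are made up for illustration:

```python
import json

# Zero-shot classification scores the input text against each
# candidate label supplied in `parameters`.
payload = {
    "inputs": "The new update drains my battery in two hours.",
    "parameters": {
        "candidate_labels": ["bug report", "feature request", "praise"],
    },
}

# POST this body to /models/facebook/bart-large-mnli with your token;
# the response ranks each candidate label with a score.
body = json.dumps(payload).encode("utf-8")
```
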
- Test any of 400K+ models with a single API call before committing to infrastructure.
- Chain classification, summarization, and translation models for document processing.
- Generate embeddings with sentence-transformers for search, clustering, and RAG.
- Run Stable Diffusion and other image models for creative and design workflows.
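For the search and RAG use case, the vectors returned by `sentence-transformers/all-MiniLM-L6-v2` are typically ranked by cosine similarity. A minimal sketch using toy 4-dimensional vectors in place of the model's real embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy vectors standing in for MiniLM embeddings of a query and two documents:
query_vec = [0.1, 0.3, 0.5, 0.1]
doc_vecs = {
    "doc_a": [0.1, 0.3, 0.5, 0.1],  # same direction as the query
    "doc_b": [0.9, 0.1, 0.0, 0.0],
}

# Rank documents by similarity to the query, most similar first.
ranked = sorted(
    doc_vecs,
    key=lambda d: cosine_similarity(query_vec, doc_vecs[d]),
    reverse=True,
)
print(ranked)  # doc_a ranks first
```

The same ranking step works unchanged on the model's real 384-dimensional embeddings.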