İçeriğe geç

Dedicated GPU

Dedicated GPU; sabit kapasite, gated model, özel quantization veya öngörülebilir throughput gerektiğinde kullanılır.

Terminal window
curl https://api.parel.cloud/v1/deployments \
-H "Authorization: Bearer pk-dev-YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "my-llama",
"huggingface_id": "meta-llama/Llama-3-8B-Instruct",
"gpu_tier": "NVIDIA A40",
"quantization": "awq",
"idle_timeout_minutes": 15,
"budget_limit_usd": 50.0,
"max_model_len": 8192
}'
EndpointAçıklama
GET /v1/gpu-tiersCache’li GPU fiyatları
GET /v1/gpu-tiers/liveGüncel GPU fiyatları
GET /v1/deployment-templatesHazır deployment şablonları
GET /v1/deployments/previewUyum, süre ve saatlik maliyet tahmini
POST /v1/hf/validateHugging Face modeli doğrula
POST /v1/deploymentsDeployment oluştur
GET /v1/deploymentsDeployment listesi
GET /v1/deployments/{id}Deployment detayı
POST /v1/deployments/{id}/startBaşlat veya uyandır
POST /v1/deployments/{id}/stopDurdur
DELETE /v1/deployments/{id}Kalıcı sonlandır
GET /v1/deployments/{id}/eventsDurum geçişleri
GET /v1/deployments/{id}/metricsThroughput ve gecikme
GET /v1/deployments/{id}/billingSaatlik yakım
POST /v1/deployments/{id}/chatDeployment’a özel chat
response = client.chat.completions.create(
model="byom-DEPLOYMENT_ID",
messages=[{"role": "user", "content": "Merhaba"}],
)