Deploying Models on DKubeX Using Model Studio

This page combines the main DKubeX Model Studio deployment flows for both Large Language Models (LLMs) and machine learning (ML) models.

Model Studio supports discovering models from Hugging Face, deploying them through a guided form, and validating them in Playground.

Shared Deployment Lifecycle

Both LLM and ML deployments follow the same platform lifecycle:

Pending -> Downloading -> Starting -> Running

Use the Deployed Models page to monitor status, edit resources, scale replicas, and manage model scope (Private/Shared).
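The lifecycle behaves like a small state machine, so it can help to think of "track status until Running" as a polling loop. The sketch below is illustrative only and rests on two assumptions: Failed is treated as a terminal state reachable from any earlier stage (the troubleshooting notes on this page mention Failed, but the lifecycle line does not show where it branches off), and `get_status` is a hypothetical callable standing in for however you read status. It is not a DKubeX client API.

```python
import time
from typing import Callable

# Forward transitions from the lifecycle on this page.
# Assumption: "Failed" can be entered from any non-terminal stage.
NEXT_STATE = {
    "Pending": {"Downloading", "Failed"},
    "Downloading": {"Starting", "Failed"},
    "Starting": {"Running", "Failed"},
    "Running": set(),
    "Failed": set(),
}
TERMINAL = {"Running", "Failed"}


def is_valid_transition(current: str, new: str) -> bool:
    """True if moving from `current` to `new` is allowed (staying put is allowed)."""
    return new == current or new in NEXT_STATE.get(current, set())


def wait_until_running(
    get_status: Callable[[], str],  # hypothetical status reader, not a DKubeX API
    poll_seconds: float = 10.0,
    timeout_seconds: float = 1800.0,
) -> str:
    """Poll until a terminal state is reached; raise on timeout or odd transitions."""
    deadline = time.monotonic() + timeout_seconds
    previous = get_status()
    while previous not in TERMINAL:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {previous!r} after {timeout_seconds}s")
        time.sleep(poll_seconds)
        status = get_status()
        if not is_valid_transition(previous, status):
            raise RuntimeError(f"unexpected transition {previous!r} -> {status!r}")
        previous = status
    return previous
```

For a quick dry run, feed it a scripted sequence such as `iter(["Pending", "Downloading", "Starting", "Running"])` wrapped in a lambda; it returns `"Running"`. Rejecting skipped stages makes a misreported status visible instead of silently accepted.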

LLM Deployment Flow

Use this flow for text-generation and instruction-following models, such as Qwen- or Llama-class chat models.

1. Open Model Studio and go to Catalog.
2. Set Task to Text Generation.
3. Search for your model, for example a Qwen2-family model.
4. Click Deploy on the model card.
5. In the deploy form, configure:
   - Quantization: for example, Q4_K_M for a smaller memory footprint
   - Resource Profile: a CPU or GPU preset
   - Replicas: usually 1 to start
   - Scope: Private or Shared
6. Submit the deployment.
7. Track status in Deployed Models or Dashboard until the model reaches Running.
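If you keep deployment settings under version control or review them before submitting, the deploy form fields can be mirrored in a small typed record. This is a hypothetical sketch: the class and field names are illustrative, not a DKubeX schema, and the profile name `cpu-small` used below is an assumed example of a preset name.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_SCOPES = {"Private", "Shared"}


@dataclass
class LLMDeploySpec:
    """Hypothetical mirror of the LLM deploy form; not a DKubeX schema."""
    model_id: str                       # model chosen in the Catalog
    resource_profile: str               # name of a CPU or GPU preset (assumed)
    quantization: Optional[str] = None  # e.g. "Q4_K_M"; None keeps the default
    replicas: int = 1                   # start with 1, scale after validation
    scope: str = "Private"              # "Private" or "Shared"

    def validate(self) -> None:
        """Raise ValueError for settings the form would not accept."""
        if not self.model_id or not self.resource_profile:
            raise ValueError("model_id and resource_profile are required")
        if self.replicas < 1:
            raise ValueError("replicas must be at least 1")
        if self.scope not in ALLOWED_SCOPES:
            raise ValueError(f"scope must be one of {sorted(ALLOWED_SCOPES)}")


# Example: a small Qwen2-family chat model on an assumed CPU preset.
spec = LLMDeploySpec("Qwen/Qwen2-1.5B-Instruct", "cpu-small", quantization="Q4_K_M")
spec.validate()
```

Defaulting `replicas` to 1 and `scope` to Private matches the cautious starting point described in the steps; widen either only after the model is validated.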

Suggested starting configurations for LLMs:

Model Size (parameters)	Suggested Profile	Replicas	Notes
Small (1B to 3B)	CPU profile	1	Good for functional testing
Medium (7B to 8B)	GPU profile	1	Better latency and quality
Large (13B+)	Larger GPU profile	1	Validate memory headroom before scaling
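A common back-of-the-envelope check for memory headroom is weight memory ≈ parameters × bits per weight ÷ 8 bytes. The sketch below applies that estimate and maps a parameter count onto the sizing rows. Treat it as a rough lower bound: it ignores KV cache, activations, and runtime overhead; the bits-per-weight value for a given quantization should come from your runtime's documentation; and the cut-offs for sizes between rows (for example 4B to 6B, or 9B to 12B) are assumptions, since the sizing rows leave those gaps open.

```python
def estimate_weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes).

    Covers weights only: KV cache, activations, and runtime overhead are
    ignored, so real usage will be higher.
    """
    return params_billion * bits_per_weight / 8


def suggest_profile(params_billion: float) -> str:
    """Map a parameter count (in billions) onto the sizing rows.

    Assumption: sizes that fall between rows are rounded up to the next row.
    """
    if params_billion <= 3:
        return "CPU profile"
    if params_billion <= 8:
        return "GPU profile"
    return "Larger GPU profile"
```

For example, a 7B model at 16 bits per weight needs about `estimate_weight_memory_gb(7, 16) == 14.0` GB for weights alone, which is why quantization matters before scaling replicas.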

If a deployment does not reach Running:

- Stuck in Downloading: verify outbound registry and network access, and check image pull status.
- Stuck in Starting: check that the resource profile has enough capacity and that the pod can be scheduled.
- Failed: review the model runtime logs and deployment events in the cluster.
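"Stuck" implies a time limit, which the checks above leave to judgment. The sketch below pairs each troubleshooting hint with an assumed threshold so a monitoring script could flag a deployment that lingers in one stage. The threshold values are placeholders, not DKubeX defaults; tune them for your model sizes and network speed.

```python
from typing import Optional

# Hints taken from the troubleshooting checks on this page.
HINTS = {
    "Downloading": "Verify outbound registry/network access and image pull status.",
    "Starting": "Check resource profile capacity and pod scheduling.",
    "Failed": "Review model runtime logs and deployment events in the cluster.",
}

# Assumed limits (seconds) after which a stage counts as stuck.
STUCK_AFTER = {"Downloading": 1800, "Starting": 600}


def diagnose(status: str, seconds_in_status: float) -> Optional[str]:
    """Return a troubleshooting hint, or None if nothing looks wrong yet."""
    if status == "Failed":
        return HINTS["Failed"]
    limit = STUCK_AFTER.get(status)
    if limit is not None and seconds_in_status > limit:
        return f"Stuck in {status}: {HINTS[status]}"
    return None
```

Downloading gets the longer placeholder limit because large model weights can legitimately take a while to fetch.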

ML Model Deployment Flow

Use this flow for task-oriented ML inference models whose output is not long-form chat text, such as embedding, reranking, and speech recognition models.

1. Open Model Studio and go to Catalog.
2. Choose the relevant Task filter:
   - Embeddings
   - Reranking
   - Speech Recognition
3. Pick a model and click Deploy.
4. In the deploy form, set:
   - Precision or quantization, where the model supports it
   - Resource Profile
   - Replicas
   - Scope: Private or Shared
5. Submit the deployment.
6. Wait for the status to reach Running.
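The ML deploy form differs from the LLM form mainly in the task selection and the optional precision setting. A hypothetical record for it might look like the sketch below. The field names and the `"fp16"` precision example are illustrative assumptions, not DKubeX identifiers, and `cpu-small` is an assumed preset name.

```python
from dataclasses import dataclass
from typing import Optional

# Task filters listed in the ML deployment steps.
ML_TASKS = {"Embeddings", "Reranking", "Speech Recognition"}
SCOPES = {"Private", "Shared"}


@dataclass
class MLDeploySpec:
    """Hypothetical mirror of the ML deploy form; field names are illustrative."""
    model_id: str
    task: str                        # one of ML_TASKS
    resource_profile: str            # name of a CPU or GPU preset (assumed)
    precision: Optional[str] = None  # e.g. "fp16", only if the model offers it
    replicas: int = 1
    scope: str = "Private"

    def validate(self) -> None:
        """Raise ValueError for settings that do not fit the ML flow."""
        if self.task not in ML_TASKS:
            raise ValueError(f"task must be one of {sorted(ML_TASKS)}")
        if not self.model_id or not self.resource_profile:
            raise ValueError("model_id and resource_profile are required")
        if self.replicas < 1:
            raise ValueError("replicas must be at least 1")
        if self.scope not in SCOPES:
            raise ValueError(f"scope must be one of {sorted(SCOPES)}")


# Example: an embedding model on an assumed CPU preset.
MLDeploySpec("BAAI/bge-small-en-v1.5", "Embeddings", "cpu-small").validate()
```

Rejecting Text Generation here keeps chat models on the LLM flow, which helps avoid the Playground tab mismatches described in the Playground issues.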

Validating in Playground

From Deployed Models, click Open Playground and validate the model according to its task type:

Area	LLM Flow	ML Flow
Primary task	Text generation and chat	Embeddings, reranking, and automatic speech recognition (ASR)
Primary validation	Quality of chat responses	Correct task-specific output
Typical optimization	Prompt latency and context handling	Throughput, ranking quality, and embedding quality
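For ML models, "task-specific output" can be checked with a couple of simple sanity tests. The sketch below is one possible approach, not a DKubeX feature: it assumes you have copied embedding vectors and reranker scores out of Playground as plain Python lists. It checks that a related sentence is closer to an anchor sentence than an unrelated one (by cosine similarity), and that a known-relevant document receives the top reranker score.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero vector has no direction")
    return dot / (norm_a * norm_b)


def embeddings_look_sane(anchor, related, unrelated) -> bool:
    """A related sentence should sit closer to the anchor than an unrelated one."""
    return cosine_similarity(anchor, related) > cosine_similarity(anchor, unrelated)


def reranker_looks_sane(scores: Sequence[float], relevant_index: int) -> bool:
    """The known-relevant document should receive the highest score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best == relevant_index
```

These checks catch gross failures such as a wrong model or broken preprocessing; they are not a substitute for a proper retrieval or ASR benchmark.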

Common Playground issues:

- Wrong output type in Playground: confirm that the deployed model's task matches the selected tab.
- No model in the selector: ensure the model status is Running and that the tab supports the model's task.
- Poor latency: move to a larger resource profile, or reduce resource contention among replicas.
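The first two Playground issues come down to two conditions: the model must be Running, and its task must match the tab. The sketch below encodes those conditions. The task-to-tab mapping and the tab names (such as "Chat") are assumptions for illustration and may not match the labels in your DKubeX version.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed mapping from deployed task to the Playground tab that serves it.
TAB_FOR_TASK = {
    "Text Generation": "Chat",
    "Embeddings": "Embeddings",
    "Reranking": "Reranking",
    "Speech Recognition": "Speech Recognition",
}


@dataclass
class DeployedModel:
    name: str
    task: str
    status: str


def selector_problem(model: DeployedModel, tab: str) -> Optional[str]:
    """Explain why a model would not appear in, or fit, a Playground tab."""
    if model.status != "Running":
        return f"{model.name} is {model.status}; wait until it is Running."
    expected = TAB_FOR_TASK.get(model.task)
    if expected != tab:
        return f"{model.name} serves the {model.task} task; use the {expected or 'matching'} tab."
    return None
```

Checking status before the task mirrors the order of the checks: a model that is not Running will not appear in any tab, so a task mismatch is only worth investigating once it is.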