Usage Guide

This guide covers the complete workflow of using the Tiiny SDK, from device connection to advanced AI tasks.

1. Device Connection & Authentication

Before using any models, connect to the Tiiny device and authenticate with the master password to obtain an API key.

from tiiny import TiinyDevice

# Initialize device with IPv6 address
DEVICE_IP = "fd80:7:7:7::1"
device = TiinyDevice(device_ip=DEVICE_IP)

# Check connection
if device.is_connected():
    print("Device connected!")

# Authenticate to get API Key
MASTER_PASSWORD = "your_master_password"
api_key = device.get_api_key(master_password=MASTER_PASSWORD)
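
If the device is still booting, the connection check may fail at first. A minimal retry sketch, using only the is_connected() call shown above:

import time

# Poll the device for up to ~30 seconds before giving up
for _ in range(30):
    if device.is_connected():
        break
    time.sleep(1)
else:
    raise RuntimeError(f"Could not reach device at {DEVICE_IP}")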

2. System Monitoring

You can monitor the status of the Neural Processing Units (NPUs) on the device.

npu_status_list = device.get_npu_status(api_key=api_key)

for npu in npu_status_list:
    print(f"NPU: {npu['name']}")
    print(f"Memory Usage: {npu['memory_usage_percent']}%")

3. Model Management

List Models

Retrieve the list of models available on the device.

models = device.get_models(api_key=api_key)
print("Available models:", models)

Start a Model

A model must be loaded into memory before it can be used.

target_model = "Qwen/Qwen3-30B-A3B-Instruct-2507"
success = device.start_model(target_model, api_key=api_key)

if success:
    print(f"Model {target_model} started successfully!")

Stop a Model

You can stop a running model to free up resources.

success = device.stop_model(target_model, api_key=api_key)

if success:
    print(f"Model {target_model} stopped successfully!")

4. Text Generation

The SDK provides an OpenAI-compatible interface for chat completions. You can use a large language model to generate text from a prompt; models can produce almost any kind of text response, such as code, mathematical equations, structured JSON data, or human-like prose.
Here's a simple example using the Chat Completions API.

from tiiny import OpenAI

client = OpenAI(api_key=api_key, base_url=device.get_url())

# Non-streaming
response = client.chat.completions.create(
    model=target_model,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."}
    ]
)
print(response.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model=target_model,
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='', flush=True)
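
If you need the complete reply as well as live output, you can accumulate the streamed deltas. A small sketch built on the streaming loop above:

parts = []
stream = client.chat.completions.create(
    model=target_model,
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True
)

for chunk in stream:
    delta = chunk.choices[0].delta.content if chunk.choices else None
    if delta:
        parts.append(delta)
        print(delta, end='', flush=True)

full_reply = ''.join(parts)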

5. Embeddings

Embeddings can be used to measure the relatedness of text strings. They are commonly used for search, clustering, recommendations, anomaly detection, diversity measurement, and classification.
An embedding is a vector (a list of floating-point numbers). The distance between two vectors measures their relatedness: small distances suggest high relatedness, and large distances suggest low relatedness.

embedding_model = "Qwen/Qwen3-Embedding-0.6B"
device.start_model(embedding_model, api_key=api_key)

response = client.embeddings.create(
    model=embedding_model,
    input="The quick brown fox jumps over the lazy dog."
)

vector = response.get("data", [])[0].get("embedding")
print(f"Vector length: {len(vector)}")

6. Rerank

A rerank model is a neural network that refines search or retrieval results by reassessing the relevance of initially retrieved candidates.
It analyzes the semantic relationship between the query and each candidate document and assigns a finer-grained relevance score, promoting the most pertinent items to the top of the list and significantly improving final result quality.

rerank_model = "Qwen/Qwen3-Reranker-0.6B"
device.start_model(rerank_model, api_key=api_key)

query = "Apple"
documents = ["A delicious fruit", "A tech company", "A color"]

result = client.rerank.create(
    model=rerank_model,
    query=query,
    documents=documents,
    top_k=2
)

for res in result.get("results", []):
    print(f"Doc: {documents[res['index']]}, Score: {res['relevance_score']}")

7. Image Generation

You can use a text-to-image model to generate images from a prompt. The generate call streams progress updates and then returns the final image as base64-encoded data.

import base64
import time

image_model = "Tongyi-MAI/Z-Image-Turbo"
device.start_model(image_model, api_key=api_key)

prompt = "A futuristic city skyline at sunset"

stream = client.images.generate(
    model=image_model,
    prompt=prompt,
    steps=20,
    width=512,
    height=512
)

for chunk in stream:
    if "progress" in chunk:
        print(f"Progress: {chunk['progress']}%")

    if "image" in chunk:
        # Handle the base64 image data
        image_val = chunk.get("image")

        # Ensure we handle both string and list formats
        image_base64 = image_val[0] if isinstance(image_val, list) else image_val

        # Remove data URI prefix if present
        if "," in image_base64:
            image_base64 = image_base64.split(",", 1)[1]

        # Decode and save
        img_data = base64.b64decode(image_base64)
        with open("output.png", "wb") as f:
            f.write(img_data)
        print("Image saved!")

8. Audio Transcription (ASR)

This API provides an audio transcription service with full feature parity with OpenAI's Whisper API.

# Simple transcription (returns JSON object with text)
result = client.audio.transcriptions.create(
    model="FunAudioLLM/SenseVoiceSmall",
    file="path/to/audio_file.wav"  # Can be a path string or file-like object
)
print(f"Transcription: {result['text']}")

# Request specific response format (text, srt, vtt, verbose_json, json)
srt_content = client.audio.transcriptions.create(
    model="FunAudioLLM/SenseVoiceSmall",
    file="path/to/audio_file.mp3",
    response_format="srt",
    language="en"  # Optional language hint
)
print(srt_content)
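
As noted above, the file parameter also accepts a file-like object, so you can transcribe audio that is already open or held in memory:

# Pass an open binary handle instead of a path string
with open("path/to/audio_file.wav", "rb") as f:
    result = client.audio.transcriptions.create(
        model="FunAudioLLM/SenseVoiceSmall",
        file=f
    )
print(f"Transcription: {result['text']}")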