Usage Guide

This guide covers the complete workflow of using the Tiiny SDK, from device connection to advanced AI tasks.

Command Line Interface (CLI)

The tiiny command is the easiest way to interact with your device. After pip install tiiny-sdk, you can use it directly in the terminal.

Authentication

First, log in to your device to save your credentials. This avoids having to pass the IP and password with every command.

tiiny login

This will prompt for:

- Device IP: default is fd80:7:7:7::1
- Master Password: your device password

Credentials are saved to ~/.tiiny/config.json. You can manage them with:

- tiiny config: View current configuration
- tiiny logout: Remove saved credentials

Chat (Interactive)

To start a model and chat immediately:

tiiny run Qwen/Qwen3-30B-A3B-Instruct-2507

Inside the chat:

- Type your prompt and press Enter for a streaming response.
- Use /clear to clear the conversation history.
- Use /exit or /quit to leave.

Benchmark

Measure the performance of different prompt sizes:

tiiny bench Qwen/Qwen3-30B-A3B-Instruct-2507 --token-sizes 256 1024 2048

This reports:

- Prefill Speed: Tokens per second for input processing.
- Decoding Speed: Tokens per second for text generation.
- Total Time: End-to-end latency.
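
These metrics follow directly from token counts and stage timings. A minimal sketch of the arithmetic (the throughput function and its numbers are illustrative, not part of the SDK):

```python
def throughput(prompt_tokens: int, completion_tokens: int,
               prefill_seconds: float, decode_seconds: float) -> dict:
    """Derive benchmark-style metrics from token counts and timings."""
    return {
        "prefill_tps": prompt_tokens / prefill_seconds,    # input processing speed
        "decode_tps": completion_tokens / decode_seconds,  # generation speed
        "total_seconds": prefill_seconds + decode_seconds, # end-to-end latency
    }

stats = throughput(1024, 256, 0.8, 6.4)
print(f"prefill {stats['prefill_tps']:.0f} tok/s, decode {stats['decode_tps']:.0f} tok/s")
# prefill 1280 tok/s, decode 40 tok/s
```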

Model Management

# List all available models on the device
tiiny models

# List models currently running and ready for API calls
tiiny models --running

Python SDK

For more complex applications, use the Python SDK to integrate Tiiny into your own code.

1. Device Connection & Authentication

Before using models, you need to connect to the Tiiny device and complete authentication.

from tiiny import TiinyDevice

# Initialize device with IPv6 address
DEVICE_IP = "fd80:7:7:7::1"
device = TiinyDevice(device_ip=DEVICE_IP)

# Check connection
if device.is_connected():
    print("Device connected!")

# Authenticate to get API Key
MASTER_PASSWORD = "your_master_password"
api_key = device.get_api_key(master_password=MASTER_PASSWORD)

2. System Monitoring

Monitor the status of the Neural Processing Units (NPUs) on the device.

npu_status_list = device.get_npu_status(api_key=api_key)

for npu in npu_status_list:
    print(f"NPU: {npu['name']}, Memory: {npu['memory_usage_percent']}%")
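
The status dictionaries above can feed a simple watchdog. A sketch, assuming only the name and memory_usage_percent fields shown in the loop (the overloaded_npus helper is ours, not part of the SDK):

```python
def overloaded_npus(npu_status_list, threshold=90):
    """Return the names of NPUs whose memory usage exceeds the threshold."""
    return [npu["name"] for npu in npu_status_list
            if npu["memory_usage_percent"] > threshold]

# Example with hand-written status entries:
sample = [{"name": "npu0", "memory_usage_percent": 95},
          {"name": "npu1", "memory_usage_percent": 40}]
print(overloaded_npus(sample))  # ['npu0']
```

In practice you would call device.get_npu_status(api_key=api_key) periodically and pass the result in, stopping idle models when a threshold is crossed.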

3. Model Management

List Models

models = device.get_models(api_key=api_key)
print("Available models:", models)

Start/Stop a Model

Models must be loaded into memory before use.

target_model = "Qwen/Qwen3-30B-A3B-Instruct-2507"

# Start the model
if device.start_model(target_model, api_key=api_key):
    print(f"Model {target_model} started!")

# Stop the model to free memory
device.stop_model(target_model, api_key=api_key)
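
Since a loaded model holds NPU memory, it helps to guarantee stop_model runs even when your code raises. A sketch using contextlib around the two SDK calls above (the running_model helper is ours, not part of the SDK):

```python
from contextlib import contextmanager

@contextmanager
def running_model(device, model_name, api_key):
    """Start a model on entry and always stop it on exit, even on error."""
    if not device.start_model(model_name, api_key=api_key):
        raise RuntimeError(f"Failed to start {model_name}")
    try:
        yield model_name
    finally:
        device.stop_model(model_name, api_key=api_key)

# Usage:
# with running_model(device, target_model, api_key):
#     ...run inference...
# model is stopped here, even if inference raised
```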

4. Text Generation (OpenAI-Compatible)

The SDK provides an OpenAI-compatible interface for chat completions.

from tiiny import OpenAI

client = OpenAI(api_key=api_key, base_url=device.get_url())

# Streaming response
stream = client.chat.completions.create(
    model=target_model,
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True
)

for chunk in stream:
    if chunk['choices']:
        print(chunk['choices'][0]['delta'].get('content', ''), end='', flush=True)
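
When you need the full response string as well as live output, accumulate the deltas as they arrive. This sketch reuses the chunk shape from the loop above; collect_stream is our helper, not an SDK function:

```python
def collect_stream(stream):
    """Print streamed deltas live and return the assembled response text."""
    parts = []
    for chunk in stream:
        if chunk["choices"]:
            piece = chunk["choices"][0]["delta"].get("content", "")
            print(piece, end="", flush=True)
            parts.append(piece)
    return "".join(parts)

# full_text = collect_stream(stream)
```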

5. Embeddings, Rerank, and Multi-modal APIs

Embeddings

embedding_model = "Qwen/Qwen3-Embedding-0.6B"
device.start_model(embedding_model, api_key=api_key)

response = client.embeddings.create(
    model=embedding_model,
    input="The quick brown fox jumps over the lazy dog."
)
vector = response.get("data", [])[0].get("embedding")
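
Embedding vectors are typically compared with cosine similarity, e.g. for semantic search you embed the query and each document, then rank by score. A standalone sketch using only the standard library:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```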

Rerank

rerank_model = "Qwen/Qwen3-Reranker-0.6B"
device.start_model(rerank_model, api_key=api_key)

result = client.rerank.create(
    model=rerank_model,
    query="Apple",
    documents=["A delicious fruit", "A tech company"],
    top_k=2
)
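
To map the scores back onto your documents, you can sort by relevance. The response shape here (a "results" list of index/relevance_score entries) is an assumption based on common rerank APIs and is not confirmed by the Tiiny docs; verify it against your SDK version:

```python
def top_documents(result, documents):
    """Pair each document with its score, highest first.

    ASSUMPTION: result has the common rerank shape
    {"results": [{"index": ..., "relevance_score": ...}, ...]}.
    """
    ranked = sorted(result["results"],
                    key=lambda r: r["relevance_score"], reverse=True)
    return [(documents[r["index"]], r["relevance_score"]) for r in ranked]

# Example with a hand-written response in the assumed shape:
sample = {"results": [{"index": 1, "relevance_score": 0.9},
                      {"index": 0, "relevance_score": 0.2}]}
print(top_documents(sample, ["A delicious fruit", "A tech company"]))
# [('A tech company', 0.9), ('A delicious fruit', 0.2)]
```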

Image Generation

image_model = "Tongyi-MAI/Z-Image-Turbo"
device.start_model(image_model, api_key=api_key)

stream = client.images.generate(model=image_model, prompt="A sunset")
# ... handle output.png as shown in detailed examples

Audio (ASR)

result = client.audio.transcriptions.create(
    model="FunAudioLLM/SenseVoiceSmall",
    file="path/to/audio.wav"
)
print(f"Transcription: {result['text']}")

6. Speaker / Mic Audio Streams

RTMP playback control and live PCM recording (requires SDK 0.1.27+).

device.start_audio_stream("ss1")

# Record 10s of raw PCM
import time
start_time = time.time()
with open("recording.pcm", "wb") as f:
    for chunk in device.iter_audio_stream("ss1"):
        f.write(chunk)
        if time.time() - start_time >= 10:
            break
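
Raw PCM has no header, so most players will not open recording.pcm directly. The standard-library wave module can wrap it in a WAV container; the sample rate, channel count, and sample width below are assumptions, so match them to the device's actual stream format:

```python
import wave

def pcm_to_wav(pcm_path, wav_path, sample_rate=16000, channels=1, sampwidth=2):
    """Wrap raw little-endian PCM samples in a WAV header."""
    with open(pcm_path, "rb") as src, wave.open(wav_path, "wb") as dst:
        dst.setnchannels(channels)
        dst.setsampwidth(sampwidth)   # bytes per sample (2 = 16-bit)
        dst.setframerate(sample_rate)
        dst.writeframes(src.read())

# pcm_to_wav("recording.pcm", "recording.wav")
```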