Usage Guide¶
This guide covers the complete workflow of using the Tiiny SDK, from device connection to advanced AI tasks.
Command Line Interface (CLI)¶
The `tiiny` command is the easiest way to interact with your device. After `pip install tiiny-sdk`, you can use it directly in the terminal.
Authentication¶
First, log in to your device to save your credentials. This avoids passing the IP and password for every command.
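The login invocation is not shown in this guide; the sketch below assumes the CLI exposes a `tiiny login` subcommand, the counterpart of the `tiiny logout` command listed further down:

```shell
# Log in interactively; credentials are saved for subsequent commands
tiiny login
```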
This will prompt for:
- Device IP: default is fd80:7:7:7::1
- Master Password: your device password
Credentials are saved to `~/.tiiny/config.json`. You can manage them with:
- `tiiny config`: View current configuration
- `tiiny logout`: Remove saved credentials
Chat (Interactive)¶
To start a model and chat immediately:
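A sketch of the invocation, assuming the chat subcommand is `tiiny chat` and takes a model name (the model ID below is reused from the SDK examples later in this guide; the exact subcommand may differ):

```shell
# Start the model if needed and open an interactive chat session
tiiny chat Qwen/Qwen3-30B-A3B-Instruct-2507
```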
Inside the chat:
- Type your prompt and press Enter for a streaming response.
- Use `/clear` to clear the conversation history.
- Use `/exit` or `/quit` to leave.
Benchmark¶
Measure the performance of different prompt sizes:
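The benchmark invocation is not shown here; assuming a `tiiny benchmark` subcommand that targets a running model (both the subcommand name and any size flags are assumptions):

```shell
# Benchmark a running model across the default prompt sizes
tiiny benchmark Qwen/Qwen3-30B-A3B-Instruct-2507
```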
This reports:
- Prefill Speed: Tokens per second for input processing.
- Decoding Speed: Tokens per second for text generation.
- Total Time: End-to-end latency.
Models Management¶
```shell
# List all available models on the device
tiiny models

# List models currently running and ready for API calls
tiiny models --running
```
Python SDK¶
For complex applications, use the Python SDK to integrate Tiiny into your code.
1. Device Connection & Authentication¶
Before using models, you need to connect to the Tiiny device and complete authentication.
```python
from tiiny import TiinyDevice

# Initialize device with IPv6 address
DEVICE_IP = "fd80:7:7:7::1"
device = TiinyDevice(device_ip=DEVICE_IP)

# Check connection
if device.is_connected():
    print("Device connected!")

# Authenticate to get API Key
MASTER_PASSWORD = "your_master_password"
api_key = device.get_api_key(master_password=MASTER_PASSWORD)
```
2. System Monitoring¶
Monitor the status of the Neural Processing Units (NPUs) on the device.
```python
npu_status_list = device.get_npu_status(api_key=api_key)
for npu in npu_status_list:
    print(f"NPU: {npu['name']}, Memory: {npu['memory_usage_percent']}%")
```
3. Model Management¶
List Models¶
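This guide only names `start_model` and `stop_model` on the device object, so the listing call below is an assumption; check your SDK version for the actual method name:

```python
# Hypothetical call: the exact method and field names may differ in your SDK version
models = device.list_models(api_key=api_key)
for m in models:
    print(m["name"], "(running)" if m.get("running") else "")
```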
Start/Stop a Model¶
Models must be loaded into memory before use.
```python
target_model = "Qwen/Qwen3-30B-A3B-Instruct-2507"

# Start the model
if device.start_model(target_model, api_key=api_key):
    print(f"Model {target_model} started!")

# Stop the model to free memory
device.stop_model(target_model, api_key=api_key)
```
4. Text Generation (OpenAI-Compatible)¶
The SDK provides an OpenAI-compatible interface for chat completions.
```python
from tiiny import OpenAI

client = OpenAI(api_key=api_key, base_url=device.get_url())

# Streaming response
stream = client.chat.completions.create(
    model=target_model,
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
)
for chunk in stream:
    if chunk['choices']:
        print(chunk['choices'][0]['delta'].get('content', ''), end='', flush=True)
```
5. Embeddings, Rerank, and Multi-modal APIs¶
Embeddings¶
```python
embedding_model = "Qwen/Qwen3-Embedding-0.6B"
device.start_model(embedding_model, api_key=api_key)

response = client.embeddings.create(
    model=embedding_model,
    input="The quick brown fox jumps over the lazy dog."
)
vector = response.get("data", [])[0].get("embedding")
```
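Embedding vectors are typically compared with cosine similarity. The helper below is not part of the SDK, just a standard-library sketch you can apply to vectors returned by the embeddings endpoint:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical vectors score 1.0; orthogonal vectors score 0.0
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # → 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # → 0.0
```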
Rerank¶
```python
rerank_model = "Qwen/Qwen3-Reranker-0.6B"
device.start_model(rerank_model, api_key=api_key)

result = client.rerank.create(
    model=rerank_model,
    query="Apple",
    documents=["A delicious fruit", "A tech company"],
    top_k=2
)
```
Image Generation¶
```python
image_model = "Tongyi-MAI/Z-Image-Turbo"
device.start_model(image_model, api_key=api_key)

stream = client.images.generate(model=image_model, prompt="A sunset")
# ... handle output.png as shown in detailed examples
```
Audio (ASR)¶
```python
result = client.audio.transcriptions.create(
    model="FunAudioLLM/SenseVoiceSmall",
    file="path/to/audio.wav"
)
print(f"Transcription: {result['text']}")
```
6. Speaker / Mic Audio Streams¶
RTMP playback control and live PCM recording (requires SDK 0.1.27+).