# Usage Guide
This guide covers the complete workflow of using the Tiiny SDK, from device connection to advanced AI tasks.
## 1. Device Connection & Authentication
Before using any models, you need to connect to the Tiiny device and authenticate with the master password to obtain an API key.
```python
from tiiny import TiinyDevice

# Initialize the device with its IPv6 address
DEVICE_IP = "fd80:7:7:7::1"
device = TiinyDevice(device_ip=DEVICE_IP)

# Check the connection
if device.is_connected():
    print("Device connected!")

# Authenticate to get an API key
MASTER_PASSWORD = "your_master_password"
api_key = device.get_api_key(master_password=MASTER_PASSWORD)
```
## 2. System Monitoring
You can monitor the status of the Neural Processing Units (NPUs) on the device.
```python
npu_status_list = device.get_npu_status(api_key=api_key)
for npu in npu_status_list:
    print(f"NPU: {npu['name']}")
    print(f"Memory Usage: {npu['memory_usage_percent']}%")
```
## 3. Model Management
### List Models
Retrieve the list of models available on the device.
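This guide does not show the call itself; the sketch below assumes the SDK exposes a `list_models` method that follows the same naming and `api_key` pattern as `start_model` and `stop_model`. Treat the method name as a placeholder and check the SDK reference.
```python
# Hypothetical: `list_models` is an assumed method name, modeled on the
# start_model/stop_model calls below; verify against the SDK reference.
models = device.list_models(api_key=api_key)
for model_name in models:
    print(model_name)
```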
### Start a Model
Before using a model, it must be loaded into memory.
```python
target_model = "Qwen/Qwen3-30B-A3B-Instruct-2507"
success = device.start_model(target_model, api_key=api_key)
if success:
    print(f"Model {target_model} started successfully!")
```
### Stop a Model
You can stop a running model to free up resources.
```python
success = device.stop_model(target_model, api_key=api_key)
if success:
    print(f"Model {target_model} stopped successfully!")
```
## 4. Text Generation
The SDK provides an OpenAI-compatible interface for chat completions. You can use a large language model to generate text from a prompt. Models can generate almost any kind of text response, such as code, mathematical equations, structured JSON data, or human-like prose.
Here's a simple example using the Chat Completions API.
```python
from tiiny import OpenAI

client = OpenAI(api_key=api_key, base_url=device.get_url())

# Non-streaming
response = client.chat.completions.create(
    model=target_model,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."}
    ]
)
print(response.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model=target_model,
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True
)
for chunk in stream:
    if chunk['choices']:
        print(chunk['choices'][0]['delta'].get('content', ''), end='', flush=True)
```
## 5. Embeddings
Embeddings can be used to measure the relatedness of text strings. They are commonly used for search, clustering, recommendations, anomaly detection, diversity measurement, and classification.
An embedding is a vector (list) of floating point numbers. The distance between two vectors measures their relatedness. Small distances suggest high relatedness and large distances suggest low relatedness.
```python
embedding_model = "Qwen/Qwen3-Embedding-0.6B"
device.start_model(embedding_model, api_key=api_key)

response = client.embeddings.create(
    model=embedding_model,
    input="The quick brown fox jumps over the lazy dog."
)
vector = response.get("data", [])[0].get("embedding")
print(f"Vector length: {len(vector)}")
```
## 6. Rerank
A rerank model is a neural network that refines search or retrieval results by reassessing the relevance of initially retrieved candidates.
It analyzes the semantic relationship between a query and each candidate document, assigning a finer-grained relevance score so the most pertinent items rise to the top of the list, which significantly improves final result quality.
```python
rerank_model = "Qwen/Qwen3-Reranker-0.6B"
device.start_model(rerank_model, api_key=api_key)

query = "Apple"
documents = ["A delicious fruit", "A tech company", "A color"]

result = client.rerank.create(
    model=rerank_model,
    query=query,
    documents=documents,
    top_k=2
)
for res in result.get("results", []):
    print(f"Doc: {documents[res['index']]}, Score: {res['relevance_score']}")
```
## 7. Image Generation
You can choose a Text-to-Image model to generate images.
```python
import base64

image_model = "Tongyi-MAI/Z-Image-Turbo"
device.start_model(image_model, api_key=api_key)

prompt = "A futuristic city skyline at sunset"
stream = client.images.generate(
    model=image_model,
    prompt=prompt,
    steps=20,
    width=512,
    height=512
)
for chunk in stream:
    if "progress" in chunk:
        print(f"Progress: {chunk['progress']}%")
    if "image" in chunk:
        # Handle the base64 image data
        image_val = chunk.get("image")
        # Handle both string and list formats
        image_base64 = image_val[0] if isinstance(image_val, list) else image_val
        # Remove the data URI prefix if present
        if "," in image_base64:
            image_base64 = image_base64.split(",", 1)[1]
        # Decode and save
        img_data = base64.b64decode(image_base64)
        with open("output.png", "wb") as f:
            f.write(img_data)
        print("Image saved!")
```
## 8. Audio Transcription (ASR)
This API provides an audio transcription service with full feature parity with OpenAI's Whisper API.
```python
# Simple transcription (returns a JSON object with the text)
result = client.audio.transcriptions.create(
    model="FunAudioLLM/SenseVoiceSmall",
    file="path/to/audio_file.wav"  # Can be a path string or a file-like object
)
print(f"Transcription: {result['text']}")

# Request a specific response format (text, srt, vtt, verbose_json, json)
srt_content = client.audio.transcriptions.create(
    model="FunAudioLLM/SenseVoiceSmall",
    file="path/to/audio_file.mp3",
    response_format="srt",
    language="en"  # Optional language hint
)
print(srt_content)
```
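Since `file` also accepts a file-like object (per the comment above), you can pass an open binary handle instead of a path, which is useful when the audio comes from memory or a network stream:
```python
# Passing an open binary file handle instead of a path string
with open("path/to/audio_file.wav", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        model="FunAudioLLM/SenseVoiceSmall",
        file=audio_file
    )
print(f"Transcription: {result['text']}")
```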