Building Multi-Agent Chains with Hugging Face Spaces
Step-by-step tutorial on building multi-agent chains by connecting Hugging Face Spaces through agents.md endpoints. Learn how to chain an image generation Space into a 3D reconstruction Space — no client library, no hardcoded integration.
Note:
By the end of this tutorial, you'll have a working multi-agent chain that takes a text prompt, generates an image via one Hugging Face Space, converts it to a 3D model via another Space, and serves the result through a web viewer. All orchestrated through agents.md — the auto-served API descriptions that let coding agents call any Gradio Space without setup.
Why Chain Spaces?
Every Gradio Space on Hugging Face now auto-serves a plain-text /agents.md endpoint — a machine-readable API description that coding agents (Claude Code, Codex, OpenCode, Pi, etc.) can read and call directly. The response gives everything needed in one shot: the schema URL, call and poll templates, file upload instructions, and auth hint.
The real unlock is chaining: the output of one Space becomes the input to the next. Prompt → image → 3D. No client library, no hardcoded integration. The agent discovers each Space's API at runtime and wires them together.
This tutorial walks through the exact architecture behind the 3D Paris Gallery demo built by Mishig Davaadorj — a chain that turns text prompts into 3D Gaussian splats using two Spaces from different orgs, then assembles them into an interactive viewer.
Architecture Overview
Text Prompt ("The Eiffel Tower at dusk")
│
▼
┌─────────────────────────────────┐
│ Space A: Image Generation │
│ black-forest-labs/flux-klein-9b-kv │
│ Prompt → Image (PNG) │
└──────────┬──────────────────────┘
│ generated image
▼
┌─────────────────────────────────┐
│ Space B: 3D Reconstruction │
│ microsoft/TRELLIS.2 │
│ Image → 3D Gaussian Splat (.ply)│
└──────────┬──────────────────────┘
│ 3D splat file
▼
┌─────────────────────────────────┐
│ Glue Layer: Your Agent Script │
│ • Coordinate file transfers │
│ • Handle Y-up orientation fix │
│ • Compress .ply → .ksplat │
│ • Generate Three.js viewer │
│ • Deploy as static Space │
└─────────────────────────────────┘
The agent does everything: discovers the APIs, calls both Spaces, processes the outputs, assembles the viewer, and deploys it. You just provide the prompt and taste-level feedback.
Prerequisites
Before starting, make sure you have:
- A Hugging Face account (sign up)
- An HF_TOKEN: create one at huggingface.co/settings/tokens with at least read scope
- A coding agent: any agent that can read URLs and call REST APIs (Claude Code, Codex, OpenCode, or plain curl)
- curl installed (or an HTTP client in your agent's language of choice)
That's it. No client libraries, no SDKs, no GPU compute.
Note:
Your HF_TOKEN is required because Spaces use bearer auth. Set it in your environment once and all agents.md calls will pick it up automatically.
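As a minimal sketch of what "pick it up" means in a script: read the token from the environment once and build the bearer header from it. The helper name `auth_headers` is hypothetical and not part of any library.

```python
import os


def auth_headers(env=os.environ) -> dict:
    """Build the bearer header from HF_TOKEN, failing early if it is missing."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; run: export HF_TOKEN=hf_...")
    return {"Authorization": f"Bearer {token}"}
```

Failing at startup is a design choice here: a missing token otherwise surfaces later as a 401 in the middle of the chain.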
Step 1: Understanding agents.md
Every Gradio Space exposes a machine-readable description at https://huggingface.co/spaces/{namespace}/{repo}/agents.md. Let's inspect one.
curl https://huggingface.co/spaces/microsoft/TRELLIS.2/agents.md
Expected output:
To use this application (microsoft/TRELLIS.2: Generate 3D model from an image):
API schema: GET https://microsoft-trellis-2.hf.space/gradio_api/info
Config (find fn_index): GET https://microsoft-trellis-2.hf.space/config → dependencies[i].id where api_name matches API schema endpoint
Join the queue: POST https://microsoft-trellis-2.hf.space/gradio_api/queue/join (pass {"data": [...], "fn_index": <from-config>, "session_hash": "<random-uuid>"})
Stream results: GET https://microsoft-trellis-2.hf.space/gradio_api/queue/data?session_hash=<same-uuid>
File inputs: POST https://microsoft-trellis-2.hf.space/gradio_api/upload -F "files=@file.ext", use as: {"path": "<returned-path>", "meta": {"_type": "gradio.FileData"}, "orig_name": "file.ext"}
Auth: Bearer $HF_TOKEN (https://huggingface.co/settings/tokens)
agents.md Response Fields
| Field | Values |
|---|---|
| API schema | GET URL |
| Config (find fn_index) | GET URL → dependency id |
| Join the queue | POST with JSON body |
| Stream results | GET with session_hash |
| File inputs | POST multipart upload |
| Auth | Bearer token |
The response is intentionally minimal: just enough for an agent to work out how to call the Space. No Swagger, no OpenAPI, no SDK download. Six lines of actionable instructions.
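Because every instruction is a single `Label: value` line, an agent script can split the response into fields with a few lines of code. The sketch below assumes the layout shown in the sample output above; `parse_agents_md` is a hypothetical helper, and the real response format may change.

```python
def parse_agents_md(text: str) -> dict:
    """Split an agents.md body into {field label: instruction} pairs."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks and the "To use this application (...)" title line
        if not line or line.startswith("To use this application"):
            continue
        # Split on the first ": " only, since the values contain colons too
        label, sep, value = line.partition(": ")
        if sep:
            fields[label] = value
    return fields
```

With the TRELLIS.2 response above, `parse_agents_md(text)["API schema"]` would hold the `GET .../gradio_api/info` instruction.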
The API Schema
Let's peek at the actual schema to see what endpoints are available:
curl https://microsoft-trellis-2.hf.space/gradio_api/info | python3 -m json.tool | head -40
The response contains a dictionary of endpoints. Each endpoint has:
- label: human-readable name
- param: parameter definitions (name, type, required)
- component: Gradio component type (Image, File, Textbox, etc.)
- serializer: how parameters are serialized
For the image-to-3D Space, you'll typically find a main endpoint (for example /v2/predict) that takes an image input and returns a 3D model file. The exact names depend on the Space, so read them from the schema.
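To make the schema easier to scan, here is a small sketch that reduces it to endpoint names and their inputs. It assumes the response has a `named_endpoints` dictionary whose entries carry a `parameters` list with `parameter_name`, `label`, and `component` keys; that shape is an assumption to check against the real output, and the sample data is invented for illustration.

```python
def summarize_endpoints(schema: dict) -> dict:
    """Map each named endpoint to a list of (parameter name, component) pairs."""
    summary = {}
    for name, spec in schema.get("named_endpoints", {}).items():
        summary[name] = [
            (p.get("parameter_name") or p.get("label"), p.get("component"))
            for p in spec.get("parameters", [])
        ]
    return summary


# Hypothetical fragment shaped like a /gradio_api/info response
sample_schema = {
    "named_endpoints": {
        "/predict": {
            "parameters": [
                {"parameter_name": "image", "label": "Input", "component": "Image"},
                {"parameter_name": "seed", "label": "Seed", "component": "Number"},
            ]
        }
    }
}

for endpoint, params in summarize_endpoints(sample_schema).items():
    print(endpoint, params)
```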
Step 2: Finding and Testing Your Spaces
Before writing any orchestration code, find the Spaces you want to chain. The Hugging Face Spaces directory at huggingface.co/spaces supports semantic search — try queries like "image generation," "3D reconstruction," or "text to speech."
Testing in the UI
Always test a Space in the browser before wiring it into your chain:
- Visit the Space's page (e.g., https://huggingface.co/spaces/black-forest-labs/flux-klein-9b-kv)
- Try it with sample inputs to understand what it expects and returns
- Note the exact parameter names and output format
- Check the agents.md for the technical calling convention
Image Generation Space
For this tutorial, we'll use black-forest-labs/flux-klein-9b-kv — a fast, open-weights image generator:
curl https://huggingface.co/spaces/black-forest-labs/flux-klein-9b-kv/agents.md
Expected output:
To use this application (black-forest-labs/flux-klein-9b-kv: Generate or edit images from text and optional photos):
API schema: GET https://black-forest-labs-flux-klein-9b-kv.hf.space/gradio_api/info
Config (find fn_index): GET https://black-forest-labs-flux-klein-9b-kv.hf.space/config → dependencies[i].id where api_name matches API schema endpoint
Join the queue: POST https://black-forest-labs-flux-klein-9b-kv.hf.space/gradio_api/queue/join (pass {"data": [...], "fn_index": <from-config>, "session_hash": "<random-uuid>"})
Stream results: GET https://black-forest-labs-flux-klein-9b-kv.hf.space/gradio_api/queue/data?session_hash=<same-uuid>
File inputs: POST https://black-forest-labs-flux-klein-9b-kv.hf.space/gradio_api/upload -F "files=@file.ext", use as: {"path": "<returned-path>", "meta": {"_type": "gradio.FileData"}, "orig_name": "file.ext"}
Auth: Bearer $HF_TOKEN
3D Reconstruction Space
We'll use microsoft/TRELLIS.2 — a single-image to 3D model reconstruction model:
curl https://huggingface.co/spaces/microsoft/TRELLIS.2/agents.md
The pattern is identical. Both Spaces use the same Gradio API protocol. Your orchestration agent will follow the same steps for both: read schema → build payload → upload files if needed → join queue → poll for result.
Note:
Space availability varies. Spaces can go down, be paused, or hit GPU limits. Always check that a Space has a running status badge before building a chain that depends on it. Prefer Spaces with a "Running" indicator rather than "Sleeping" or "Paused."
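The status check can also be scripted. The sketch below assumes the Hub serves a Space's runtime state at `https://huggingface.co/api/spaces/{space_id}/runtime` and that the JSON carries a `stage` field such as `RUNNING`, `SLEEPING`, or `PAUSED`; treat both as assumptions to verify. `fetch_runtime` and `is_ready` are hypothetical helper names.

```python
import json
import urllib.request

READY_STAGES = {"RUNNING"}


def is_ready(runtime: dict) -> bool:
    """Return True when the reported stage means the Space can take requests."""
    return runtime.get("stage") in READY_STAGES


def fetch_runtime(space_id: str, token: str = "") -> dict:
    """Fetch the runtime description for a Space (assumed endpoint, see above)."""
    request = urllib.request.Request(
        f"https://huggingface.co/api/spaces/{space_id}/runtime"
    )
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)
```

A chain script might call `is_ready(fetch_runtime("microsoft/TRELLIS.2"))` before starting and stop early if it returns False.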
Step 3: The Chain Architecture
The chain follows a simple pattern:
- Agent reads agents.md from Space A to learn its API
- Agent calls Space A with the text prompt → receives an image
- Agent reads agents.md from Space B to learn its API
- Agent uploads the image from Space A to Space B's file endpoint
- Agent calls Space B with the uploaded image path → receives a 3D model
- Agent processes the output (orientation fix, compression, viewer assembly)
Each step is independent. The agent can handle errors, retries, and format conversion between steps.
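One way to picture that independence is to treat every step as a plain function from one value to the next. This is only an illustrative sketch: `run_chain` and the fake steps in the usage note are hypothetical and stand in for real Space calls.

```python
from typing import Any, Callable, List, Sequence, Tuple

Step = Callable[[Any], Any]


def run_chain(value: Any, steps: Sequence[Tuple[str, Step]]) -> Tuple[Any, List[str]]:
    """Feed value through each named step, reporting which step failed."""
    completed = []
    for name, step in steps:
        try:
            value = step(value)
        except Exception as exc:
            # Naming the failing step makes retries and debugging easier
            raise RuntimeError(f"step '{name}' failed: {exc}") from exc
        completed.append(name)
    return value, completed
```

For example, `run_chain("tower", [("image", lambda p: p + ".png"), ("splat", lambda p: p.replace(".png", ".ply"))])` passes the first step's output into the second, which is the same shape as prompt → image → 3D.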
The Gradio API Protocol
Both Spaces follow the same Gradio API protocol. Here's a Python implementation you can use directly:
Note:
The code below works with any Gradio Space, not just these two. It's the universal calling convention exposed by agents.md.
import os
import json
import uuid
import time
import requests


class GradioSpaceAgent:
    """Call any Gradio Space via its agents.md protocol."""

    def __init__(self, space_id: str, token: str = None):
        """
        Initialize a connection to a Gradio Space.

        Args:
            space_id: Hugging Face Space ID, e.g. "microsoft/TRELLIS.2"
            token: HF_TOKEN for authentication
        """
        self.space_id = space_id
        self.namespace, self.repo = space_id.split("/")
        self.token = token or os.environ.get("HF_TOKEN", "")
        # Derive the Space's direct URL from its ID. Underscores and dots both
        # become hyphens: "microsoft/TRELLIS.2" -> "microsoft-trellis-2"
        subdomain = f"{self.namespace}-{self.repo}".replace("_", "-").replace(".", "-").lower()
        self.space_url = f"https://{subdomain}.hf.space"
        # Read agents.md to confirm connectivity
        self._read_agents_md()

    def _read_agents_md(self) -> dict:
        """Read the agents.md description (diagnostic)."""
        url = f"https://huggingface.co/spaces/{self.space_id}/agents.md"
        # Only send the header when a token is actually set
        headers = {"Authorization": f"Bearer {self.token}"} if self.token else {}
        resp = requests.get(url, headers=headers)
        resp.raise_for_status()
        return {"description": resp.text}

    def get_api_schema(self) -> dict:
        """Fetch the Gradio API schema to discover available endpoints."""
        url = f"{self.space_url}/gradio_api/info"
        resp = requests.get(url)
        resp.raise_for_status()
        return resp.json()

    def get_config(self) -> list:
        """Fetch the /config to find fn_index values."""
        url = f"{self.space_url}/config"
        resp = requests.get(url)
        resp.raise_for_status()
        return resp.json().get("dependencies", [])
    def find_fn_index(self, api_name: str) -> int:
        """
        Find the fn_index for a given api_name.

        Per agents.md, this is dependencies[i].id for the dependency whose
        api_name matches the API schema endpoint.
        """
        deps = self.get_config()
        wanted = api_name.lstrip("/")
        for position, dep in enumerate(deps):
            if dep.get("api_name") == wanted:
                # Fall back to the list position if the entry has no id
                return dep.get("id", position)
        # Fall back to a loose substring match on the endpoint name
        for position, dep in enumerate(deps):
            name = dep.get("api_name") or ""
            if name and (wanted in name or name in wanted):
                return dep.get("id", position)
        raise ValueError(f"Could not find fn_index for api_name: {api_name}")
    def upload_file(self, file_path: str) -> dict:
        """Upload a file to the Space's file endpoint."""
        url = f"{self.space_url}/gradio_api/upload"
        headers = {"Authorization": f"Bearer {self.token}"} if self.token else {}
        with open(file_path, "rb") as f:
            resp = requests.post(url, files={"files": f}, headers=headers)
        resp.raise_for_status()
        result = resp.json()
        if not result:
            raise RuntimeError(f"Upload returned no path for {file_path}")
        # Return in the format expected by the Gradio API
        return {
            "path": result[0],
            "meta": {"_type": "gradio.FileData"},
            "orig_name": os.path.basename(file_path)
        }
    def call_endpoint(self, endpoint: str, data: list, fn_index: int = None) -> dict:
        """
        Call an endpoint and wait for the result.

        Uses the queue-based Gradio API described in agents.md:
        POST to queue/join, then read queue/data with the same session_hash.
        """
        session_hash = str(uuid.uuid4())
        # Find fn_index if not provided
        if fn_index is None:
            fn_index = self.find_fn_index(endpoint)
        # Join the queue
        queue_url = f"{self.space_url}/gradio_api/queue/join"
        payload = {
            "data": data,
            "fn_index": fn_index,
            "session_hash": session_hash
        }
        headers = {"Content-Type": "application/json"}
        if self.token:
            headers["Authorization"] = f"Bearer {self.token}"
        resp = requests.post(queue_url, json=payload, headers=headers)
        resp.raise_for_status()
        # queue/join only acknowledges the job; the result itself arrives on
        # the queue/data stream for this session_hash
        return self._poll_queue(session_hash)

    def _poll_queue(self, session_hash: str, timeout: int = 120) -> dict:
        """Read the queue/data event stream until the job completes."""
        url = f"{self.space_url}/gradio_api/queue/data?session_hash={session_hash}"
        headers = {"Authorization": f"Bearer {self.token}"} if self.token else {}
        deadline = time.time() + timeout
        # stream=True lets us read server-sent events as they arrive
        with requests.get(url, headers=headers, stream=True, timeout=timeout) as resp:
            resp.raise_for_status()
            for raw in resp.iter_lines():
                if time.time() > deadline:
                    break
                line = raw.decode("utf-8") if raw else ""
                if not line.startswith("data:"):
                    continue
                result = json.loads(line[5:].strip())
                if result.get("msg") == "process_completed":
                    if not result.get("success", True):
                        raise RuntimeError(f"Space reported a failure: {result.get('output')}")
                    return result.get("output", {})
        raise TimeoutError(f"Space did not respond within {timeout}s")
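The result parsing can be tested without touching the network. The sketch below extracts the final output from a captured event stream, assuming each event is a `data: {...}` line and that the final message has `"msg": "process_completed"` with an `output` field, as the class above expects. The `success` flag and the sample stream are assumptions for illustration.

```python
import json


def final_output(sse_text: str):
    """Return the output of the process_completed event, or None if absent."""
    for line in sse_text.splitlines():
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if not payload:
            continue
        try:
            message = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore keep-alive or partial lines
        if isinstance(message, dict) and message.get("msg") == "process_completed":
            if not message.get("success", True):
                raise RuntimeError(f"Space reported a failure: {message.get('output')}")
            return message.get("output", {})
    return None


# Hypothetical captured stream
sample_stream = (
    'data: {"msg": "estimation", "rank": 0}\n\n'
    'data: {"msg": "process_starts"}\n\n'
    'data: {"msg": "process_completed", "output": {"data": ["ok"]}, "success": true}\n\n'
)
print(final_output(sample_stream))
```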
Note:
Rate limits. Free HF Spaces have concurrent request limits. If you hit 429 responses, add a 2-5 second delay between calls. The queue API handles backpressure for you — just poll patiently.
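A minimal sketch of that delay as a retry wrapper follows. `call_with_backoff` is a hypothetical helper; `send` stands for any function that performs one request and returns `(status_code, body)`. Doubling the delay on each retry is a choice made here, not something the Spaces API prescribes.

```python
import time


def call_with_backoff(send, max_tries=4, base_delay=2.0, sleep=time.sleep):
    """Retry send() while it returns HTTP 429, waiting longer each time."""
    status, body = None, None
    for attempt in range(max_tries):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_tries - 1:
            # Waits 2s, 4s, 8s, ... with the default base_delay
            sleep(base_delay * (2 ** attempt))
    return status, body
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.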
Step 4: Building the Chain
Now let's wire the two Spaces together. The script below generates an image of a monument, feeds it to the 3D reconstruction model, and saves the result.
#!/usr/bin/env python3
"""
Multi-agent chain: Image Generation → 3D Reconstruction

Chains two Hugging Face Spaces:
1. black-forest-labs/flux-klein-9b-kv → generates an image from text
2. microsoft/TRELLIS.2 → generates a 3D model from the image
"""
import os
import sys
import json

import requests

# Save the GradioSpaceAgent class from Step 3 as gradio_agent.py
from gradio_agent import GradioSpaceAgent


def download_first_file(agent, result, dest_path):
    """
    Download the first file referenced in a Space result to dest_path.

    Assumes the completed output looks like {"data": [...]} and that file
    outputs are dicts carrying a "url" and/or a server-side "path".
    """
    outputs = result.get("data", []) if isinstance(result, dict) else []
    for item in outputs:
        if isinstance(item, dict) and (item.get("url") or item.get("path")):
            # "path" lives on the Space's server, so fetch it over HTTP
            url = item.get("url") or f"{agent.space_url}/gradio_api/file={item['path']}"
            headers = {"Authorization": f"Bearer {agent.token}"} if agent.token else {}
            resp = requests.get(url, headers=headers)
            resp.raise_for_status()
            with open(dest_path, "wb") as f:
                f.write(resp.content)
            return dest_path
    raise RuntimeError(f"No file found in Space output: {str(result)[:200]}")


def generate_and_convert_3d(prompt: str, output_dir: str = "output"):
    """
    Chain two Spaces: text → image → 3D model.

    Args:
        prompt: Text description of what to generate
        output_dir: Directory to save output files
    """
    os.makedirs(output_dir, exist_ok=True)
    hf_token = os.environ.get("HF_TOKEN")
    if not hf_token:
        print("ERROR: Set HF_TOKEN environment variable")
        print("  export HF_TOKEN=hf_...")
        sys.exit(1)

    print("[1/4] Initializing image generation Space...")
    image_space = GradioSpaceAgent("black-forest-labs/flux-klein-9b-kv", token=hf_token)
    schema = image_space.get_api_schema()
    endpoints = list(schema.get("named_endpoints", {}).keys())
    print(f"  Discovered endpoints: {endpoints}")

    print(f"[2/4] Generating image from prompt: '{prompt}'")
    # Takes the first endpoint for brevity; pick the text-to-image one from the schema
    t2i_endpoint = endpoints[0]
    # Many Spaces expect every input in schema order (seed, size, steps, ...),
    # so extend this list to match the endpoint's parameters
    image_result = image_space.call_endpoint(t2i_endpoint, data=[prompt])
    print("  Image generation complete")

    # Save the intermediate image
    image_path = os.path.join(output_dir, "generated_image.png")
    download_first_file(image_space, image_result, image_path)

    print("[3/4] Initializing 3D reconstruction Space...")
    splat_space = GradioSpaceAgent("microsoft/TRELLIS.2", token=hf_token)
    splat_schema = splat_space.get_api_schema()
    splat_endpoints = list(splat_schema.get("named_endpoints", {}).keys())
    print(f"  Discovered endpoints: {splat_endpoints}")

    print("[4/4] Converting image to 3D model...")
    # Upload the generated image for the 3D Space
    uploaded = splat_space.upload_file(image_path)
    print(f"  Uploaded image: {uploaded['path']}")

    # Call the 3D reconstruction endpoint; fn_index is looked up from /config
    # rather than hardcoded. As above, add any extra inputs the schema lists.
    splat_result = splat_space.call_endpoint(splat_endpoints[0], data=[uploaded])
    print("  3D reconstruction complete")

    # Save the 3D model output
    output_path = os.path.join(output_dir, "model.ply")
    download_first_file(splat_space, splat_result, output_path)
    print(f"  Output saved to: {output_path}")

    print(f"\n✅ Chain completed: '{prompt}' → image → 3D model")
    return {
        "prompt": prompt,
        "image": image_path,
        "model": output_path,
    }


if __name__ == "__main__":
    prompt = sys.argv[1] if len(sys.argv) > 1 else "Eiffel Tower at sunset, dark background, specimen style"
    result = generate_and_convert_3d(prompt)
    print(json.dumps(result, indent=2))
Expected output:
[1/4] Initializing image generation Space...
Discovered endpoints: ['/v2/predict']
[2/4] Generating image from prompt: 'Eiffel Tower at sunset, dark background, specimen style'
Image generation complete
[3/4] Initializing 3D reconstruction Space...
Discovered endpoints: ['/v3/predict']
[4/4] Converting image to 3D model...
Uploaded image: /tmp/gradio/abc123/uploaded_file.png
3D reconstruction complete
Output saved to: output/model.ply
✅ Chain completed: 'Eiffel Tower at sunset, dark background, specimen style' → image → 3D model
Note:
Endpoint names vary between Spaces. The /v2/predict and /v3/predict names are examples — always check the actual schema from gradio_api/info. The agent should discover endpoints dynamically, not hardcode them.
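One possible way to discover an endpoint instead of taking the first one is to select by input component type. The sketch assumes the schema shape used earlier (`named_endpoints` entries with a `parameters` list that has a `component` key); `pick_endpoint` is a hypothetical helper and the sample schema is invented.

```python
def pick_endpoint(schema: dict, component: str) -> str:
    """Return the first named endpoint that has an input of the given component type."""
    for name, spec in schema.get("named_endpoints", {}).items():
        if any(p.get("component") == component for p in spec.get("parameters", [])):
            return name
    raise LookupError(f"No endpoint takes a {component} input")


# Hypothetical schema with two endpoints
example_schema = {
    "named_endpoints": {
        "/get_seed": {"parameters": [{"component": "Number"}]},
        "/image_to_3d": {"parameters": [{"component": "Image"}, {"component": "Number"}]},
    }
}
print(pick_endpoint(example_schema, "Image"))  # prints /image_to_3d
```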
Expected Failure Points and How to Fix Them
| Symptom | Likely Cause | Fix |
|---|---|---|
| 401 Unauthorized | Missing or invalid HF_TOKEN | Check echo $HF_TOKEN. Generate a new token at huggingface.co/settings/tokens |
| 404 on gradio_api/info | Space is using a custom Gradio endpoint | Try gradio_api/info without the trailing /. If still failing, the Space may not use the standard Gradio API |
| 502 Bad Gateway | Space is sleeping or loading | Wait 30-60 seconds for cold start. Spaces on free tier spin down after inactivity |
| 429 Too Many Requests | Rate-limited by HF | Add time.sleep(3) between calls. Use a Pro HF account for higher limits |
| process_pending never completes | GPU queue congestion | Increase poll timeout to 180s. The Space may be queued behind other users |
| File upload returns empty | File format not supported | Check the Space's UI for accepted formats. Convert to supported format (usually PNG/JPEG for images) |
Step 5: Handling Outputs Across the Chain
Output format conversion is where most chain implementations break. Here's what to watch for:
Orientation Fix (3D Splats)
The most common post-processing step for 3D Spaces is orientation correction. TRELLIS.2 outputs Y-down (common in AI pipelines), but web viewers expect Y-up:
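def flip_ply_y_up(input_path: str, output_path: str):
    """
    Flip an ASCII .ply file from Y-down to Y-up coordinate system.

    Most 3D reconstruction Spaces output Y-down; web viewers expect Y-up.
    Assumes an ASCII-format file whose vertex properties start with x, y, z.
    Binary .ply files need a binary reader instead.
    """
    # errors='replace' lets the header of a binary file be read far enough
    # to reject it cleanly below
    with open(input_path, 'r', encoding='ascii', errors='replace') as f:
        header = []
        vertex_count = 0
        line = f.readline()
        while line and line.strip() != "end_header":
            if line.startswith("format") and "ascii" not in line:
                raise ValueError("flip_ply_y_up only handles ASCII .ply files")
            if line.startswith("element vertex"):
                vertex_count = int(line.strip().split()[-1])
            header.append(line)
            line = f.readline()
        header.append("end_header\n")

        # Read vertex data
        vertices = []
        for _ in range(vertex_count):
            parts = f.readline().split()
            # Flip Y coordinate: Y_down → Y_up. Other fields keep their
            # original text, so integer properties such as colors stay intact.
            # Negating a single axis also mirrors the model; negate Z as well
            # if you want a 180-degree rotation about X instead.
            parts[1] = str(-float(parts[1]))
            vertices.append(parts)

        # Read remaining data (faces, etc.)
        remaining = f.read()

    # Write corrected file
    with open(output_path, 'w') as f:
        f.writelines(header)
        for v in vertices:
            f.write(' '.join(v) + '\n')
        f.write(remaining)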
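The function above reads vertex rows as text, which only works for ASCII files, and .ply files are often stored in a binary format. Before choosing a processing path, a script can inspect the header, which is plain text in both cases. This is a sketch under that assumption; `read_ply_header` is a hypothetical helper and it does not handle Windows line endings.

```python
def read_ply_header(data: bytes) -> dict:
    """Parse the text header of a PLY file held in memory."""
    marker = b"end_header\n"
    end = data.find(marker)
    if not data.startswith(b"ply") or end == -1:
        raise ValueError("not a PLY file")
    info = {
        "format": None,
        "vertex_count": 0,
        "properties": [],
        "data_offset": end + len(marker),  # where the vertex payload starts
    }
    in_vertex = False
    for raw in data[:end].decode("ascii").splitlines():
        parts = raw.split()
        if not parts:
            continue
        if parts[0] == "format":
            info["format"] = parts[1]
        elif parts[0] == "element":
            in_vertex = parts[1] == "vertex"
            if in_vertex:
                info["vertex_count"] = int(parts[2])
        elif parts[0] == "property" and in_vertex:
            info["properties"].append(parts[-1])
    return info
```

If `info["format"]` is `ascii`, the text-based flip applies; otherwise the payload after `data_offset` has to be decoded according to the listed properties.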
Compression for Performance
Raw .ply files from 3D reconstruction can be 10-50MB. For web delivery, compress to .ksplat format (roughly 3x smaller):
# Using the splat-compressor tool (install separately)
pip install splat-compressor
splat-compress input.ply output.ksplat
Step 6: Deploying the Result
Once you have your chained output, deploy it as a static Hugging Face Space to share the result:
- Create a new Space at huggingface.co/new-space with the "Static" SDK
- Upload your Three.js viewer HTML, the compressed splat files, and any assets
- Configure the Space to serve the viewer
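These steps can also be scripted. The sketch below uses the `huggingface_hub` library's `HfApi.create_repo` and `HfApi.upload_folder`; the repo ID and folder layout are placeholders, and the check for `index.html` reflects the assumption that a static Space serves that file as its entry page. `missing_files` and `deploy_static_space` are hypothetical helper names.

```python
from pathlib import Path


def missing_files(folder: str, required=("index.html",)) -> list:
    """List required files that are absent from the folder to upload."""
    root = Path(folder)
    return [name for name in required if not (root / name).is_file()]


def deploy_static_space(repo_id: str, folder: str, token: str) -> None:
    """Create (or reuse) a static Space and upload the viewer folder to it."""
    absent = missing_files(folder)
    if absent:
        raise FileNotFoundError(f"Missing before upload: {absent}")
    # Imported lazily so the check above works without huggingface_hub installed
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    api.create_repo(repo_id, repo_type="space", space_sdk="static", exist_ok=True)
    api.upload_folder(folder_path=folder, repo_id=repo_id, repo_type="space")
```

A call might look like `deploy_static_space("your-username/your-gallery", "viewer", os.environ["HF_TOKEN"])`, where both names are placeholders.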
Note:
The mishig/monuments-de-paris Space is a good reference for how to structure a static viewer Space. The entire pipeline script — from agents.md calls to final deployment — lives in the Space repo.
When to Use Spaces Chaining vs Custom Agent Frameworks
Spaces chaining is not a replacement for frameworks like LangGraph, CrewAI, or smolagents. It's a different tool for a different job.
| Scenario | Spaces Chaining | Custom Framework |
|---|---|---|
| Prototyping a multimedia pipeline | ✅ Best — no setup, iterate fast | ❌ Overkill |
| Complex state management across many steps | ❌ No built-in state | ✅ LangGraph, CrewAI |
| Single atomic task (one model call) | ✅ Perfect — agents.md is instant | ❌ Setup overhead |
| Error recovery and retry logic | ❌ You write it yourself | ✅ Built-in |
| Production deployment at scale | ❌ Rate-limited by HF Space | ✅ Self-hosted models |
| Chaining models from different orgs | ✅ Just paste the agents.md URL | ❌ Need separate API integrations |
The rule of thumb: If your chain has fewer than 5 steps and the individual Spaces are already deployed, use agents.md. If you need branching, loops, conditional logic, or high throughput, use a framework.
Putting It All Together
Here's the complete flow end to end:
- Pick two (or more) Spaces that expose agents.md
- Test each independently in the browser
- Write an orchestration script that calls Space A, processes the output, feeds it to Space B
- Handle format conversions (orientation, compression, file types)
- Deploy the result as a static Space or web page
The 3D Paris Gallery demo at mishig/monuments-de-paris proves the pattern works. The same two Spaces, given different prompts, produced galleries for Paris, Egypt, and Japan. Each new gallery was one sentence of human input — the agent did the rest.
"Create a similar Space with splats for Japan"
→ agent generates 6 monument images via Space A
→ agent reconstructs 6 splats via Space B
→ agent orients, compresses, and assembles the viewer
→ agent deploys a new Space
That's the building-block economy in action. The marginal cost of a new multimedia app falls toward the cost of describing it.
What's Next
- Try different model chains: Replace the image gen Space with a different one (e.g., ideogram-ai/ideogram4) or use a different 3D model (e.g., VAST-AI/TripoSplat)
- Add a third Space: Chain image gen → 3D → texturing or animation
- Build a multi-output gallery: Generate multiple images in parallel, feed them all through 3D reconstruction, assemble a gallery
- Explore the audio domain: Try chaining TTS (text-to-speech) with audio effects Spaces
- Read the original blog post: How an Agent Built a 3D Paris Gallery by Chaining Two Hugging Face Spaces by Mishig Davaadorj for the full story behind the demo
- Check the docs: Spaces as Agent Tools for the official documentation on agents.md
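For the multi-output gallery idea above, a thread pool is one simple way to run several prompts through the chain at once. In this sketch `make_splat` stands for whatever function runs the full chain for one prompt (for example the chain function from Step 4); `build_gallery` is a hypothetical helper, and the low default worker count is a cautious choice given the rate limits mentioned earlier.

```python
from concurrent.futures import ThreadPoolExecutor


def build_gallery(prompts, make_splat, max_workers=2):
    """Run make_splat for each prompt in parallel and map prompt -> result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with prompts
        results = list(pool.map(make_splat, prompts))
    return dict(zip(prompts, results))
```

Threads suit this workload because each call spends most of its time waiting on the network rather than computing.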