
How to Call Flux 2 From Code With cURL and Python


The release of FLUX.2 brought a serious upgrade to AI image generation, but getting it running inside your own codebase still trips people up. Whether you want to fire off a single generation request from a shell script or build a full Python pipeline, the process is straightforward once you know which endpoints to hit and how authentication works.

This guide walks through the practical steps for calling FLUX.2 programmatically, covering both raw cURL commands and Python implementations. We focus on the fal.ai provider since it offers the most developer-friendly interface, but the patterns apply to other providers like Together AI and Segmind with minor adjustments. You can also explore FLUX outputs to see what the model is capable of before writing code.

Understanding FLUX.2 API Architecture

FLUX.2 is served through inference providers rather than a single official API. Black Forest Labs (the creators) license the model to platforms that handle the compute infrastructure. You can see the available FLUX models and their capabilities before deciding which variant to call. The three main FLUX.2 variants available via API are:

  • FLUX.2 Pro: highest quality, best for production assets, ~10s per image
  • FLUX.2 Dev: balanced quality/speed, good for iteration, ~5s per image
  • FLUX.2 Flash: fastest inference, ideal for prototyping, ~2s per image

Each provider exposes slightly different endpoints, but the core request structure stays consistent: you POST a JSON payload with your prompt and parameters, then either receive the image synchronously or poll an async endpoint for the result. Most providers now support the synchronous pattern for FLUX.2 since inference times are short enough to fit within HTTP timeout windows. If you have used Stable Diffusion APIs before, the migration to FLUX.2 is minimal.

The key difference from older Stable Diffusion APIs is that FLUX.2 endpoints typically return a direct image URL rather than base64-encoded data, which simplifies downstream processing.

Authentication and Setup

Before making any API calls, you need credentials from your chosen provider. Here is the setup for fal.ai (the most common choice for FLUX.2). The API documentation covers additional providers if fal.ai does not fit your stack.

Step 1: Sign up at fal.ai and generate an API key from the dashboard.

Step 2: Export the key as an environment variable:

export FAL_KEY="your-fal-api-key-here"

Step 3: Verify access with a minimal test generation against the FLUX dev endpoint:

curl -s https://fal.run/fal-ai/flux/dev \
  -H "Authorization: Key $FAL_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "test", "image_size": "square"}' | python3 -m json.tool

If you get a valid JSON response with an images array, your setup is correct. A 401 means your key is invalid or expired. A 429 means you have hit the rate limit for your plan.

For Together AI, the authentication uses a standard Bearer token pattern instead. For automated workflows, either provider works well.
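The auth difference is small enough to capture in a helper. A minimal sketch, assuming only the two header schemes described above (everything else is illustrative):

```python
def auth_headers(provider: str, key: str) -> dict:
    """Build request headers for a given provider.

    fal.ai expects "Key <key>"; Together AI expects "Bearer <key>".
    """
    schemes = {"fal": "Key", "together": "Bearer"}
    return {
        "Authorization": f"{schemes[provider]} {key}",
        "Content-Type": "application/json",
    }
```

Swapping providers then means swapping one string rather than rewriting request code.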

Calling FLUX.2 With cURL

The cURL approach is useful for quick tests, shell scripts, and CI/CD pipelines where you do not want Python dependencies. Many AI tools accept this same request pattern. Here is the complete request for generating an image with FLUX.2 Pro via fal.ai:

curl -s -X POST "https://fal.run/fal-ai/flux-pro/v1" \
  -H "Authorization: Key $FAL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a professional product photo of headphones on a marble surface, studio lighting",
    "image_size": "landscape_16_9",
    "num_images": 1,
    "guidance_scale": 3.5,
    "num_inference_steps": 28,
    "safety_tolerance": "2"
  }'

The response returns a JSON object with the generated image URL. All image models on fal.ai follow this same response format:

{
  "images": [
    {
      "url": "https://fal.media/files/...",
      "width": 1344,
      "height": 768,
      "content_type": "image/png"
    }
  ],
  "seed": 12345,
  "has_nsfw_concepts": [false]
}

To download the image directly to a file, extract the URL from the JSON response and pass it to a second curl call. This pattern is common in image generation workflows:

IMAGE_URL=$(curl -s -X POST "https://fal.run/fal-ai/flux-pro/v1" \
  -H "Authorization: Key $FAL_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "mountain landscape at golden hour", "image_size": "landscape_16_9"}' \
  | python3 -c "import sys,json; print(json.load(sys.stdin)['images'][0]['url'])")

curl -o output.png "$IMAGE_URL"

This two-step pattern works well inside bash scripts where you need to generate images programmatically and save them for downstream use.

Calling FLUX.2 With Python

Python gives you more control over error handling, retries, and batch processing. This is the approach most AI apps use under the hood. Here is a production-ready implementation using the requests library:

import os
import requests
import time

FAL_KEY = os.environ["FAL_KEY"]
ENDPOINT = "https://fal.run/fal-ai/flux-pro/v1"

def generate_image(prompt, size="landscape_16_9", steps=28, guidance=3.5):
    headers = {
        "Authorization": f"Key {FAL_KEY}",
        "Content-Type": "application/json"
    }
    payload = {
        "prompt": prompt,
        "image_size": size,
        "num_images": 1,
        "num_inference_steps": steps,
        "guidance_scale": guidance
    }

    response = requests.post(ENDPOINT, json=payload, headers=headers, timeout=60)
    response.raise_for_status()

    data = response.json()
    return data["images"][0]["url"]


# Single generation
url = generate_image("a cat sitting on a windowsill, soft afternoon light")
print(f"Generated: {url}")
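Since the API returns a URL rather than image bytes, saving the result is a separate download step. A stdlib-only sketch (urllib is used here to avoid extra dependencies; requests.get with stream=True works just as well):

```python
import shutil
import urllib.request

def download_image(url, path):
    """Stream the image at the returned URL to a local file."""
    with urllib.request.urlopen(url) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return path
```

Pass it the URL from generate_image and a local filename, e.g. download_image(url, "output.png").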

For batch generation (multiple prompts), add basic rate limiting. This is critical if you plan to run automated pipelines:

prompts = [
    "minimalist logo design on white background",
    "aerial view of a coastal city at sunset",
    "close-up of coffee beans with steam rising"
]

results = []
for prompt in prompts:
    url = generate_image(prompt)
    results.append({"prompt": prompt, "url": url})
    time.sleep(1)  # respect rate limits
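If sequential generation is too slow, a thread pool bounds concurrency instead of spacing requests with sleep. A sketch, assuming your plan allows roughly four requests in flight (generate_fn is any callable, such as the generate_image function defined above):

```python
from concurrent.futures import ThreadPoolExecutor

def batch_generate(prompts, generate_fn, max_workers=4):
    """Generate images concurrently, capped at max_workers in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        urls = list(pool.map(generate_fn, prompts))
    return [{"prompt": p, "url": u} for p, u in zip(prompts, urls)]
```

Usage: results = batch_generate(prompts, generate_image). Tune max_workers to your provider's documented concurrency limit.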

The fal.ai Python SDK is an alternative if you prefer a higher-level interface, similar to how AI plugins abstract common operations. Install with pip install fal-client and use it like this:

import fal_client

result = fal_client.subscribe("fal-ai/flux-pro/v1", arguments={
    "prompt": "your prompt here",
    "image_size": "landscape_16_9"
})
print(result["images"][0]["url"])

Both approaches produce identical results. The raw requests approach gives you full control, while the SDK handles polling and retries automatically. Platforms that offer a complete AI image editing suite often wrap these same endpoints in higher-level abstractions.

[Image: Flux 2 API Python integration workflow]

Key Parameters and What They Do

Understanding the parameters helps you dial in the right output quality for your use case. The Imagen model shares many of the same parameters if you want to compare behavior:

  • prompt: the text description of what you want to generate. FLUX.2 responds well to natural language; you do not need the comma-separated keyword style that Stable Diffusion models prefer.
  • image_size: preset dimensions. Common values: square (1024x1024), landscape_16_9 (1344x768), portrait_9_16 (768x1344), landscape_4_3 (1152x864).
  • guidance_scale: how closely the model follows your prompt. Range 1.0-20.0, default 3.5. Higher values produce more literal interpretations but can reduce naturalness.
  • num_inference_steps: more steps = higher quality but slower. FLUX.2 Pro works well at 28 steps; Flash can use 4-8 steps.
  • seed: set a specific integer for reproducible outputs. Omit for random results.
  • safety_tolerance: controls the NSFW filter strictness. "2" is the default balanced setting.

For AI model exploration, experimenting with guidance_scale between 2.0 and 7.0 produces the widest variety of visual styles from a single prompt.

Error Handling and Production Tips

When integrating FLUX.2 into production systems, handle these common failure modes. The same patterns apply whether you are building a standalone script or plugging into enterprise infrastructure:

Rate limiting (429): Implement exponential backoff. Most providers allow 5-10 concurrent requests. Space batch jobs accordingly.

Timeout errors: Set a generous timeout (60s minimum for Pro, 30s for Flash). Network variability means inference can occasionally take longer than average. Check your provider's usage dashboard for concurrency limits.

Invalid responses: Always validate that the images array exists and contains at least one entry before accessing the URL. Occasionally a request succeeds (200) but returns an empty array due to content filtering.

def safe_generate(prompt, retries=3):
    headers = {
        "Authorization": f"Key {FAL_KEY}",
        "Content-Type": "application/json"
    }
    payload = {"prompt": prompt, "image_size": "landscape_16_9"}

    for attempt in range(retries):
        try:
            response = requests.post(ENDPOINT, json=payload, headers=headers, timeout=60)
            if response.status_code == 429:
                time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s
                continue
            response.raise_for_status()
            data = response.json()
            if data.get("images"):  # guard against empty arrays from content filtering
                return data["images"][0]["url"]
        except requests.exceptions.Timeout:
            if attempt < retries - 1:
                continue
            raise
    return None

For workflows that process hundreds of images, consider an AI image editing suite that handles queuing, retries, and storage automatically rather than building all that infrastructure yourself.

Alternative Providers

While fal.ai is the most streamlined option, several other providers serve FLUX.2 with different pricing and performance characteristics:

  • Together AI: offers both FLUX.2 Pro and Flash. Uses standard OpenAI-compatible endpoints. Good documentation, slightly higher latency.
  • Segmind: competitive pricing, supports all FLUX.2 variants. Uses x-api-key header auth instead of Bearer tokens.
  • Replicate: prediction-based async model. Best for long-running jobs where you do not need instant results.
  • AIML API: OpenAI-compatible interface, easy migration if you already use their SDK.

The request structure across providers follows the same pattern: POST JSON with prompt and params, receive image URL in response. Switching providers usually means changing the endpoint URL and auth header format, nothing else. For a broader look at available AI tools and automation in this space, curated directories can help you compare options.
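That swap can live in a small registry so the rest of your code stays provider-agnostic. A sketch; the fal.ai endpoint is the one used throughout this guide, while the Together AI and Segmind entries are placeholders you would fill in from each provider's docs:

```python
# Placeholder endpoints for everything except fal.ai -- check provider docs.
PROVIDERS = {
    "fal":      {"endpoint": "https://fal.run/fal-ai/flux-pro/v1",
                 "headers": lambda key: {"Authorization": f"Key {key}"}},
    "together": {"endpoint": "<together-endpoint>",
                 "headers": lambda key: {"Authorization": f"Bearer {key}"}},
    "segmind":  {"endpoint": "<segmind-endpoint>",
                 "headers": lambda key: {"x-api-key": key}},
}

def request_target(provider, key):
    """Return (endpoint, auth headers) for the chosen provider."""
    cfg = PROVIDERS[provider]
    return cfg["endpoint"], cfg["headers"](key)
```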

FAQ

What is the cheapest way to call FLUX.2 from code?

FLUX.2 Flash on fal.ai costs roughly $0.01 per image at standard resolution. Together AI offers a free tier with limited monthly generations. For exploring different models, free tiers are sufficient for testing.

Can I generate multiple images in a single API call?

Yes. Set num_images to your desired count (usually capped at 4 per request). Each image in the response array gets its own URL and dimensions. See the Recraft model page for examples of batch outputs at different quality levels.

How do I get consistent results with the same prompt?

Set a fixed seed value in your request payload. The same seed + prompt + parameters combination produces identical output across calls. This is useful for prompt testing where you want to isolate the effect of wording changes.

Is there a way to use FLUX.2 without an API key?

Some platforms like BasedLabs offer browser-based generation that does not require managing API keys directly. For programmatic access, an API key is always required.

What image formats does FLUX.2 return?

Most providers return PNG by default. Some support requesting JPEG or WebP via an output_format parameter, which reduces file size for web delivery. You can convert formats after generation if needed.
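Where a provider supports it, the format request is a single extra payload field. A sketch (the parameter name and accepted values vary by provider, so treat these as assumptions to verify against your provider's docs):

```python
payload = {
    "prompt": "hero image for a landing page",
    "image_size": "landscape_16_9",
    "output_format": "jpeg",  # assumed value; smaller files than the PNG default
}
```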

How do I handle the NSFW content filter?

The safety_tolerance parameter (0-6 on fal.ai) controls filter strictness. If your legitimate prompts are being blocked, increase the tolerance. For production apps serving end users, keep it at 2 or lower. The Nano Banana model uses similar safety controls if you want a comparison.

Can I fine-tune FLUX.2 through the API?

Yes, providers like fal.ai and Replicate support LoRA fine-tuning for FLUX.2. Upload your training images via their fine-tuning endpoint, wait for training to complete, then reference your custom model ID in generation requests. Check the training documentation for model customization options.

Wrapping Up

Calling FLUX.2 from code is a matter of sending the right HTTP request to the right endpoint. cURL works for quick tests and scripting. Python with requests gives you production-grade control over retries, batching, and error handling. The fal.ai SDK abstracts away polling if you prefer fewer lines of code. For a visual overview of what is possible, browse the video generation models that use similar API patterns.

The model itself is capable enough that prompt engineering matters more than parameter tuning for most use cases. Start with the defaults (guidance 3.5, 28 steps for Pro) and adjust only when your specific output needs demand it. From there, building batch pipelines or integrating generation into larger creative workflows is incremental work.