How accurate are AI-generated captions?

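GPT-4o generally produces accurate, detailed descriptions of clear, well-lit images. For accessibility alt text, the result is often more thorough than a hastily written human caption because the model describes what is visible in the image. It can still misread small text or miss context that only the author knows, so review captions before publishing.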

Can I caption images in languages other than English?

Yes, add the target language to your prompt: "Describe this image in one sentence in Spanish."
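If the target language changes per request, the prompt can be built from a language name. The helper below is a small sketch: `build_caption_prompt` is a hypothetical name, not part of the OpenAI SDK, and it only produces the text part of the request.

```python
def build_caption_prompt(language="English"):
    """Return a one-sentence captioning prompt for the given language.

    `language` is a plain language name such as "Spanish" (an assumption of
    this sketch); the model is simply asked to answer in that language.
    """
    language = language.strip()
    if not language:
        raise ValueError("language must be a non-empty string")
    if language.lower() == "english":
        return "Describe this image in one sentence."
    return f"Describe this image in one sentence in {language}."


print(build_caption_prompt("Spanish"))
# Describe this image in one sentence in Spanish.
```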

Is there a free alternative?

Reformat offers a free AI Image Caption tool that generates captions without needing an API key. It is great for quick, one-off captioning tasks.

How to Use OpenAI's Vision API to Caption Images Automatically

What Is the Vision API?

OpenAI's GPT-4o model can understand images alongside text. You send an image (as a URL or base64-encoded data) and ask the model to describe, analyze, or extract information from it.
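For the base64 route, the image is usually sent as a data URL in the same field that would otherwise hold the https link. The following is a minimal sketch, assuming the file extension reflects the real image format; `to_data_url` is a hypothetical helper name.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(path):
    """Encode a local image file as a base64 data URL."""
    path = Path(path)
    # Guess the MIME type from the file extension (assumption: the extension is correct).
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path.name}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be used as the "url" value of an image_url content part, in place of a web address.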

Common use cases:

Image captioning for accessibility (alt text)
Content moderation — detecting inappropriate content
Data extraction — reading receipts, invoices, or charts
Visual Q&A — asking questions about what is in an image

In this tutorial, we will focus on generating descriptive captions for images.

Setup and Authentication

Install the OpenAI Python SDK:

pip install openai

Set your API key as an environment variable:

export OPENAI_API_KEY="sk-your-key-here"

Or create a .env file and load it with python-dotenv:

pip install python-dotenv

from dotenv import load_dotenv

# Load variables from .env (including OPENAI_API_KEY) before creating the client
load_dotenv()
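Whichever way the key is set, the OpenAI client reads it from the OPENAI_API_KEY environment variable by default. A quick check at startup can produce a clearer error than a failed request. This is only a sketch; `require_api_key` is a hypothetical helper, not an SDK function.

```python
import os


def require_api_key(env=None):
    """Return the OpenAI API key from the environment, or raise a clear error."""
    # Allow a custom mapping so the check can be exercised without touching os.environ.
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it or add it to your .env file."
        )
    return key
```

Call require_api_key() once after load_dotenv() and before creating the client.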

Captioning a Single Image

Here is how to caption an image from a URL:

from openai import OpenAI

client = OpenAI()

def caption_image(image_url, style="descriptive"):
    """Return a caption for the image at image_url; unknown styles fall back to 'descriptive'."""
    prompts = {
        "descriptive": "Describe this image in one detailed sentence suitable for an alt text attribute.",
        "concise": "Write a short, 5-10 word caption for this image.",
        "social": "Write an engaging social media caption for this image.",
    }

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompts.get(style, prompts["descriptive"])},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        max_tokens=150,
    )
    return response.choices[0].message.content

# Usage
caption = caption_image("https://example.com/sunset.jpg")
print(caption)
# Example output (wording will vary between runs):
# "A vibrant sunset over a calm ocean with orange and purple hues reflecting off gentle waves."
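Before a caption goes into an HTML alt attribute, it usually helps to normalize whitespace and escape special characters. The helper below is one possible sketch; `to_alt_attribute` is a hypothetical name, and stripping a wrapping pair of quotation marks is an assumption about how the model may format its reply.

```python
import html


def to_alt_attribute(caption):
    """Normalize a model caption and return an HTML-escaped alt attribute value."""
    # Collapse newlines and repeated spaces into single spaces.
    text = " ".join(caption.split())
    # Drop one matching pair of quotation marks wrapped around the whole caption.
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "\"'":
        text = text[1:-1].strip()
    # Escape &, <, > and quotes so the value is safe inside alt="...".
    return html.escape(text, quote=True)


print(to_alt_attribute('"A dog & a cat\n on a sofa."'))
# A dog &amp; a cat on a sofa.
```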

Detail	Value
Model	gpt-4o
Cost per image	~$0.01-0.03 (depends on resolution)
Rate limit	500 RPM (tier 1)
Max image size	20 MB
Supported formats	JPG, PNG, WebP, GIF (non-animated)
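The size and format limits in the table can be checked locally before a file is uploaded. The sketch below makes two assumptions: 20 MB is treated as 20 * 1024 * 1024 bytes, and the format is judged by file extension only. `check_image` is a hypothetical helper name.

```python
from pathlib import Path

# Assumption: "20 MB" from the table is interpreted as 20 * 1024 * 1024 bytes.
MAX_IMAGE_BYTES = 20 * 1024 * 1024
# Extensions corresponding to the formats listed in the table.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".gif"}


def check_image(path):
    """Return a list of problems with the file; an empty list means it passes."""
    path = Path(path)
    problems = []
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {path.suffix or '(none)'}")
    if not path.is_file():
        problems.append("file does not exist")
    elif path.stat().st_size > MAX_IMAGE_BYTES:
        problems.append(f"file is larger than {MAX_IMAGE_BYTES} bytes")
    return problems
```

Files that return a non-empty list can be skipped or resized before any API call is made, which avoids paying for requests that would be rejected.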

