Models
The API provides access to several text-to-image models, each with its own strengths and weaknesses. There are two main categories: generalist models and finetuned models.
- Generalist models can generate a wide range of images from various domains, including photography and digital art in different styles and subjects.
- Finetuned models, on the other hand, are specialized models that have been trained to perform particularly well in a specific domain (ex: anime, photorealism, 3D renders...)
This article presents examples of image generation using different models. Please note that while some models may appear to perform better based on these examples, other models may perform better with different prompts. A single image is not enough to represent the entirety of a model's capabilities, so it is up to you to judge which models are most suitable for your use case.
Native resolutions
Each model is associated with what is called its "native resolution". This is the resolution at which the model was trained, and the resolution at which it will perform best.
You can request images at any resolution, smaller or larger, regardless of the native resolution of the model. But the further you stray from the native resolution, the more the image may be degraded. For example, at very large resolutions, the image may lack coherence and the subject may be duplicated.
That being said, small deviations from the native resolution are usually fine.
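As a rough illustration of "small deviations are fine", a client could measure how far a requested resolution strays from a model's native one before sending the request. The helper below and its threshold are our own sketch, not part of the API:

```python
def resolution_deviation(requested, native):
    """Return the worst-case scale factor between the requested and native
    resolutions, per axis. 1.0 means an exact match; larger values mean the
    request strays further from what the model was trained on."""
    return max(requested[0] / native[0], native[0] / requested[0],
               requested[1] / native[1], native[1] / requested[1])

# For a model with a 512x512 native resolution: a 576x576 request is a small,
# usually safe deviation, while 1536x1536 is far outside the training
# resolution and may produce incoherent or duplicated subjects.
print(resolution_deviation((576, 576), (512, 512)))    # 1.125
print(resolution_deviation((1536, 1536), (512, 512)))  # 3.0
```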
Trigger prompts
Some models were trained to respond to particular "trigger prompts": to activate their unique capabilities, you need to include the trigger in your prompt, preferably towards the beginning.
Since the API offers you direct access to each model, you have the choice whether to include the trigger words in your prompt or not.
- Some models, like `openjourney`, work well at generating the intended style without any trigger word (and simply "amplify" the style if the words are included).
- Most models require the trigger.
- Some models even offer multiple trigger words to choose from, each providing a style variation (like `synthwavepunk_v2`).
But generally speaking, we recommend always including the trigger words in your prompt, at the beginning.
If the trigger contains `*subject*`, it is recommended to replace it with your intended subject. For example, if the trigger is `RAW photo, *subject*, 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3` and you want to generate an image of a dog playing catch, then your final prompt should be `RAW photo, dog playing catch, 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3`.
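The substitution above can be sketched in a few lines. The helper name is ours; the trigger string is the one from the example:

```python
def apply_trigger(trigger: str, subject: str) -> str:
    """Build the final prompt: replace the *subject* placeholder when the
    trigger has one, otherwise prepend the trigger to the subject."""
    if "*subject*" in trigger:
        return trigger.replace("*subject*", subject)
    return f"{trigger}, {subject}"

trigger = ("RAW photo, *subject*, 8k uhd, dslr, soft lighting, "
           "high quality, film grain, Fujifilm XT3")
print(apply_trigger(trigger, "dog playing catch"))
# RAW photo, dog playing catch, 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3
```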
Model metadata API
You can programmatically request the list of available models and their metadata using the `GET /info` endpoint.
The response will be a JSON object, with the models listed in the `models` property, in the following format:
- `id`: the model ID, which you will need to use in your API requests
- `name`: the human-readable name of the model
- `family`: this defines the model's architecture (ex: `sd1`, `sd2`, `sdxl`)
- `license`: the license under which the model is distributed
- `description`: a short description of the model
- `categories`: a list of categories that the model belongs to
- `function`: a list of endpoints that the model supports (for example, most models are available with `text2image` and `image2image`, and the inpainting models are only available with `inpainting`)
- `nativeResolution`: `{"width": ..., "height": ...}`, the native resolution of the model
- `triggers`: a list of trigger prompts that the model supports, or `null` if the model does not support triggers
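Assuming a `GET /info` response shaped as described above, a client can parse it and filter models by supported endpoint. The payload below is an illustrative sample following the documented format, not real API output, and the model shown is hypothetical:

```python
import json

# Illustrative sample following the documented /info format (not real output).
sample_response = json.loads("""
{
  "models": [
    {
      "id": "example-photoreal",
      "name": "Example Photoreal",
      "family": "sd1",
      "license": "example-license",
      "description": "A hypothetical photorealism model.",
      "categories": ["photorealism"],
      "function": ["text2image", "image2image"],
      "nativeResolution": {"width": 512, "height": 512},
      "triggers": null
    }
  ]
}
""")

# Index models by id, keeping only those usable with the text2image endpoint.
text2image_models = {
    m["id"]: m for m in sample_response["models"]
    if "text2image" in m["function"]
}
print(sorted(text2image_models))  # ['example-photoreal']
```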