SD 3.5 Medium: A Welcome Release With A Few Bonuses

Generated with the SD 3.5 Large to Medium workflow below.

Stability AI has released Stable Diffusion 3.5 Medium, a 2.5-billion-parameter model that brings advanced image generation capabilities to consumer hardware. The release comes about a week after the SD 3.5 Large and SD 3.5 Large Turbo models.


Built on an improved MMDiT-X architecture, the model generates images at resolutions between 0.25 and 2 megapixels while requiring only 9.9 GB of VRAM (excluding text encoders).

The model’s architecture allows native generation at resolutions up to 1440×1440. It uses three text encoders (OpenCLIP-ViT/G, CLIP-ViT/L, and T5-xxl), which support different token lengths across training stages.
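
If you want to try the model quickly, the short sketch below loads it with Hugging Face diffusers and renders a single 1440×1440 image. The repository id, sampler settings, and memory options are assumptions for illustration; check the official model card for the exact identifiers and access requirements.

```python
# Minimal sketch: generating with SD 3.5 Medium via Hugging Face diffusers.
# The repo id, step count, and guidance scale below are assumptions for
# illustration; consult the official model card for authoritative values.
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium",  # assumed Hugging Face repo id
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # helps stay within consumer VRAM budgets

image = pipe(
    prompt="a lighthouse on a rocky coast at dusk, photorealistic",
    num_inference_steps=28,
    guidance_scale=4.5,
    width=1440,   # Medium's top native resolution
    height=1440,
).images[0]
image.save("sd35_medium.png")
```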

Design Philosophy and Performance

In developing SD 3.5 Medium, Stability AI prioritized customizability and accessibility.

The integration of Query-Key Normalization in the transformer blocks stabilizes training and simplifies fine-tuning. This focus on flexibility comes with trade-offs: outputs may vary more across runs with identical prompts, and results can be more sensitive to how specific the prompt is.

The model demonstrates strong prompt adherence and competes effectively with other medium-sized models. Its mixed-scale image training improves multi-resolution generation, while its modest hardware requirements make it accessible to a broader user base.

Skip Layer Guidance: A New Approach to Detail Control

Skip Layer Guidance (SLG) introduces an additional control mechanism for the generation process. The feature allows attention in specific layers to be scaled or skipped during generation, with the primary goal of improving anatomical accuracy (a conceptual sketch follows the list below):

  • Detail Control: Useful for refining elements like hands and facial features
  • Layer Manipulation: Provides options to scale individual layers up or down
  • Model Support: Functions with both Medium and Large models
  • Experimental Potential: Different layer combinations can produce varying results
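
To make the mechanism more concrete, here is an illustrative sketch of how a skip-layer pass can be folded into ordinary classifier-free guidance. The function names, the layer indices, and the exact blending rule are assumptions for illustration, not Stability AI’s reference implementation.

```python
# Illustrative sketch of skip-layer guidance (SLG) layered on top of CFG.
# `denoise` stands in for one forward pass of the diffusion transformer, and
# `skip_layers` would bypass the listed transformer blocks. All names and the
# blending rule are assumptions for illustration only.
def guided_prediction(denoise, latents, t, cond, uncond,
                      cfg_scale=4.5, slg_scale=2.5, skip_layers=(3, 5, 7)):
    pred_uncond = denoise(latents, t, uncond)                       # unconditional pass
    pred_cond = denoise(latents, t, cond)                           # full conditional pass
    pred_skip = denoise(latents, t, cond, skip_layers=skip_layers)  # conditional pass with layers skipped

    # Standard classifier-free guidance...
    guided = pred_uncond + cfg_scale * (pred_cond - pred_uncond)
    # ...plus an extra push away from the degraded skip-layer prediction,
    # which is what nudges hands and faces toward more plausible anatomy.
    return guided + slg_scale * (pred_cond - pred_skip)
```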

Tips for Best Results Using Skip Layer Guidance

  • Test different layer combinations; start with layers 3, 5, and 7
  • Adjust strength parameters carefully
  • Watch for potential oversaturation
  • Consider reducing CFG and sampling steps when SLG is active (see the example below)
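
Recent diffusers releases expose skip-layer guidance options on the SD3 pipeline. The call below applies the tips above to the `pipe` object from the earlier sketch; the argument names reflect one diffusers version and may differ or be absent in yours, so treat this as a hedged example rather than a fixed API.

```python
# Hedged example: applying the tips above with diffusers' skip-layer guidance
# options. Argument names may differ across diffusers versions.
image = pipe(
    prompt="portrait photo of a violinist, detailed hands",
    num_inference_steps=20,          # fewer sampling steps when SLG is active
    guidance_scale=3.5,              # reduced CFG to limit oversaturation
    skip_guidance_layers=[3, 5, 7],  # starting layer combination to experiment with
    skip_layer_guidance_scale=2.8,   # adjust the strength carefully
).images[0]
```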

Community Discovery: Large to Medium Pipeline

Reddit user u/_roblaughter_ has discovered an effective workflow that leverages the strengths of both SD 3.5 Large and Medium models. As detailed in their Reddit post, this approach combines:

  1. Initial generation with SD 3.5 Large using Skip Layer Guidance
  2. A second pass using SD 3.5 Medium at 1440×1440
  3. Optional additional upscaling techniques

The pipeline shows promising results, taking advantage of Medium’s higher native resolution while preserving the quality advantages of the Large model.
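
The sketch below reproduces the general shape of that pipeline in diffusers: a first pass with SD 3.5 Large, an upscale to 1440×1440, then a light img2img pass with Medium. The repo ids, strength, and step counts are assumptions for illustration; the original workflow may use different tooling and settings.

```python
# Rough sketch of the Large-to-Medium two-pass idea. Repo ids, resolutions,
# and the img2img strength are assumptions for illustration only.
import torch
from diffusers import StableDiffusion3Pipeline, StableDiffusion3Img2ImgPipeline

prompt = "an overgrown greenhouse interior, volumetric light"

# Pass 1: compose the image with SD 3.5 Large (optionally with SLG enabled).
large = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
)
large.enable_model_cpu_offload()
base = large(prompt=prompt, width=1024, height=1024,
             num_inference_steps=28, guidance_scale=4.5).images[0]

# Pass 2: upscale to Medium's native 1440x1440 and refine with a light
# img2img pass so the Large composition is kept while detail is added.
medium = StableDiffusion3Img2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium", torch_dtype=torch.bfloat16
)
medium.enable_model_cpu_offload()
refined = medium(prompt=prompt, image=base.resize((1440, 1440)),
                 strength=0.4, num_inference_steps=28,
                 guidance_scale=4.0).images[0]
refined.save("large_to_medium.png")
```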

Early results suggest this approach can help preserve and enhance details during the upscaling process, though results vary depending on the specific content and settings used.

Big thanks to u/_roblaughter_ for sharing this workflow, which is detailed in the original Reddit thread linked above.

Current State and Future Potential

SD 3.5 Medium is a decent base model and a hopeful sign of what is to come.

The base model performs adequately, and its Skip Layer Guidance feature offers interesting possibilities for detail control. The reduced resource requirements make it accessible to users with modest hardware, while still maintaining reasonable output quality.

The true value of SD 3.5 Medium may well lie in its potential for fine-tuning and specialized applications, particularly given its more manageable resource requirements. As the community continues to experiment with the model and discover new techniques, like u/_roblaughter_’s workflow, we’re likely to see more innovative applications emerge.

For those interested in experimenting with SD 3.5 Medium, a gradual approach is recommended: start with the base model to understand its characteristics, then explore SLG and community-developed workflows as needed.

While not revolutionary, the model adds useful features and is bound to spark even more community-built image generation pipelines.
