Supercharge Z-Image Turbo with Qwen VL in ComfyUI: a new way to create AI images

3 min read 21 days ago
Published on Feb 02, 2026 This response is partially generated with the help of AI. It may contain inaccuracies.

Table of Contents

Introduction

In this tutorial, we will explore how to enhance image generation using the Z-Image Turbo model in ComfyUI, combined with the Qwen3 Video Language Model (VLM). This guide will walk you through the setup process, integrating custom nodes, and utilizing a conversational approach to create stunning AI images.

Step 1: Load Z-Image Turbo Template

  • Begin by accessing the Z-Image Turbo template within ComfyUI.
  • Install the necessary models:
    • Ensure you have the latest version of ComfyUI.
    • Download and set up the Visual Language Model (VLM), specifically Qwen3 VL.

Step 2: Install Custom Nodes

  • You will need to install specific custom nodes to enhance functionality:
    • KJ nodes
    • QwenVL
    • QwenImageWan Bridge
  • Follow the installation instructions provided in the ComfyUI documentation or the respective GitHub pages.

Step 3: Add QwenVL Custom Node

  • After installing QwenVL, add it to your ComfyUI workflow.
  • This will allow you to leverage the capabilities of the Qwen Video Language Model in your image generation tasks.

Step 4: Load Images for Creation

  • Prepare three images to serve as inputs:
    • One for the face
    • One for clothing
    • One for the scene
  • Import these images into your workspace for processing.

Step 5: Write the Prompt for QwenVL

  • Create a detailed prompt that combines features from the face, clothing, and scene images.
  • Make sure your prompt is clear and descriptive to guide the AI effectively.

Step 6: Run the Workflow

  • Execute the workflow with the Qwen VL prompt.
  • Monitor the process to ensure everything runs smoothly.

Step 7: Utilize Z-Image Text Encoder

  • Replace the CLIP Text encoder with the Z-Image Text Encoder in ComfyUI.
  • This will enhance interaction between the VLM and the Z-Image Turbo model for better results.

Step 8: Experiment with System Prompts

  • Test different system prompt templates to see how they affect image generation.
  • Experimentation is key to finding the optimal settings for your specific needs.

Step 9: Engage in a Conversation with Qwen3 VL

  • Use the Turn Builder feature to develop a conversational approach with Qwen3 VL.
  • This allows for iterative improvements and adjustments based on generated images.

Step 10: Create New Images with Instructions

  • Generate a new image using the instructions established in previous steps.
  • Adjust prompts as needed to refine the output.

Step 11: Rinse and Repeat

  • Continue the process of extending the conversation, adding details, and improving the images.
  • Change prompts and styles to explore various creative outcomes.

Conclusion

By following these steps, you can effectively supercharge your image generation process with Z-Image Turbo and Qwen VL in ComfyUI. Experimenting with different prompts and conversational techniques will lead to increasingly refined and creative AI images. For further exploration, consider diving deeper into the documentation and experimenting with various settings for unique results.