Before we dive into the details of this post, here are the two posts that precede it:
Enabling & Exploring Stable Diffusion – Part 1
Enabling & Exploring Stable Diffusion – Part 2
For reference, we'll share the demo below before the deep dive into the actual follow-up analysis.
Now, let us continue our discussion from where we left off.
CODE:
clsText2Image.py (This class converts input prompts to an intermediate image)

import gc
import torch


class clsText2Image:
    def __init__(self, pipe, output_path, filename):
        self.pipe = pipe
        # More aggressive attention slicing
        self.pipe.enable_attention_slicing(slice_size=1)
        self.output_path = f"{output_path}{filename}"
        # Warm up the pipeline
        self._warmup()

    def _warmup(self):
        """Warm up the pipeline to optimize memory allocation"""
        with torch.no_grad():
            _ = self.pipe("warmup", num_inference_steps=1, height=512, width=512)
        torch.mps.empty_cache()
        gc.collect()

    def generate(self, prompt, num_inference_steps=12, guidance_scale=3.0):
        try:
            torch.mps.empty_cache()
            gc.collect()
            with torch.autocast(device_type="mps"):
                with torch.no_grad():
                    image = self.pipe(
                        prompt,
                        num_inference_steps=num_inference_steps,
                        guidance_scale=guidance_scale,
                        height=1024,
                        width=1024,
                    ).images[0]
            image.save(self.output_path)
            return 0
        except Exception as e:
            print(f'Error: {str(e)}')
            return 1
        finally:
            torch.mps.empty_cache()
            gc.collect()

    def genImage(self, prompt):
        try:
            # Run the first pass (prompt -> image)
            x = self.generate(prompt)
            if x == 0:
                print('Successfully processed first pass!')
            else:
                print('Failed to complete first pass!')
                raise Exception('First pass failed.')
            return 0
        except Exception as e:
            print(f"\nAn unexpected error occurred: {str(e)}")
            return 1

1. CLASS INSTANTIATE(pipe, output_path, filename)
This is the initialization method for the clsText2Image class:
- Takes a pre-configured pipe (text-to-image pipeline), an output_path, and a filename.
- Enables more aggressive memory optimization by setting "attention slicing."
- Prepares the full file path for saving generated images.
- Calls a _warmup method to pre-load the pipeline and optimize memory allocation.
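The post hands a ready-made pipe into the class without showing how it was built. As a hedged sketch only (the StableDiffusionPipeline loader and the model id are assumptions, not taken from this series), a pipeline suitable for the Apple-Silicon MPS backend might be prepared like this:

```python
# Sketch only: the loader and model id below are assumptions, not from this post.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("mps")  # Apple-Silicon GPU backend
pipe.enable_attention_slicing(slice_size=1)  # lowest peak memory per attention op
```

slice_size=1 is the most aggressive setting: attention is computed one slice at a time, trading some speed for a much smaller memory footprint.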
2. _warmup
This private method warms up the pipeline:
- Sends a dummy “warmup” request with basic parameters to allocate memory efficiently.
- Clears any cached memory (torch.mps.empty_cache()) and performs garbage collection (gc.collect()).
- Ensures smoother operation for future image generation tasks.
3. generate(prompt, num_inference_steps=12, guidance_scale=3.0)
This method generates an image from a text prompt:
- Clears memory cache and performs garbage collection before starting.
- Uses the text-to-image pipeline (pipe) to generate an image:
  - Takes the prompt, number of inference steps, and guidance scale as input.
  - Outputs an image at 1024×1024 resolution.
- Saves the generated image to the specified output path.
- Returns 0 on success or 1 on failure.
- Ensures cleanup by clearing memory and collecting garbage, even in case of errors.
4. genImage(prompt)
This method simplifies image generation:
- Calls the generate method with the given prompt.
- Prints a success message if the image is generated (0 return value).
- On failure, logs the error and raises an exception.
- Returns 0 on success or 1 on failure.
clsImage2Video.py (This class converts an image to video as part of the second pass)
import gc
import imageio
import numpy as np
import torch
from PIL import Image


class clsImage2Video:
    def __init__(self, pipeline):
        # Optimize model loading
        torch.mps.empty_cache()
        self.pipeline = pipeline

    def generate_frames(self, pipeline, init_image, prompt, duration_seconds=10):
        try:
            torch.mps.empty_cache()
            gc.collect()
            base_frames = []
            img = Image.open(init_image).convert("RGB").resize((1024, 1024))
            for _ in range(10):
                result = pipeline(
                    prompt=prompt,
                    image=img,
                    strength=0.45,
                    guidance_scale=7.5,
                    num_inference_steps=25
                ).images[0]
                base_frames.append(np.array(result))
                img = result
                torch.mps.empty_cache()
            frames = []
            for i in range(len(base_frames) - 1):
                frame1, frame2 = base_frames[i], base_frames[i + 1]
                for t in np.linspace(0, 1, int(duration_seconds * 24 / 10)):
                    frame = (1 - t) * frame1 + t * frame2
                    frames.append(frame.astype(np.uint8))
            return frames
        except Exception as e:
            frames = []
            print(f'Error: {str(e)}')
            return frames
        finally:
            torch.mps.empty_cache()
            gc.collect()

    # Main method
    def genVideo(self, prompt, inputImage, targetVideo, fps):
        try:
            print("Starting animation generation...")
            init_image_path = inputImage
            output_path = targetVideo
            frames = self.generate_frames(
                pipeline=self.pipeline,
                init_image=init_image_path,
                prompt=prompt,
                duration_seconds=20
            )
            imageio.mimsave(output_path, frames, fps=fps)
            print("Animation completed successfully!")
            return 0
        except Exception as e:
            x = str(e)
            print('Error: ', x)
            return 1

1. CLASS INSTANTIATE (pipeline)
This initializes the clsImage2Video class:
- Clears the GPU cache to optimize memory before loading.
- Sets up the pipeline for generating frames, which uses an image-to-video transformation model.
2. generate_frames(pipeline, init_image, prompt, duration_seconds=10)
This function generates frames for a video:
- Starts by clearing GPU memory and running garbage collection.
- Loads the init_image, resizes it to 1024×1024 pixels, and converts it to RGB format.
- Iteratively applies the pipeline to transform the image:
  - Uses the prompt and specified parameters like strength, guidance_scale, and num_inference_steps.
  - Stores the resulting frames in a list.
- Interpolates between consecutive frames to create smooth transitions:
  - Uses linear blending for smooth animation across a specified duration and frame rate (24 base fps spread across the 10 key frames).
- Returns the final list of generated frames or an empty list if an error occurs.
- Always clears memory after execution.
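The linear-blending step is plain NumPy and can be seen in isolation. A minimal sketch, using tiny 2×2 "frames" in place of the real 1024×1024 ones:

```python
import numpy as np

frame1 = np.zeros((2, 2, 3), dtype=np.uint8)      # black key frame
frame2 = np.full((2, 2, 3), 200, dtype=np.uint8)  # grey key frame

duration_seconds, key_frames = 10, 10
steps = int(duration_seconds * 24 / key_frames)   # 24 blended frames per pair

blended = []
for t in np.linspace(0, 1, steps):
    frame = (1 - t) * frame1 + t * frame2         # cross-fade between key frames
    blended.append(frame.astype(np.uint8))

print(len(blended), blended[0][0, 0, 0], blended[-1][0, 0, 0])  # 24 0 200
```

At t=0 the blend is exactly frame1, at t=1 exactly frame2, so consecutive segments join seamlessly (at the cost of each key frame appearing twice when segments are concatenated).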
3. genVideo(prompt, inputImage, targetVideo, fps)
This is the main function for creating a video from an image and text prompt:
- Logs the start of the animation generation process.
- Calls generate_frames() with the given pipeline, inputImage, and prompt to create frames.
- Saves the generated frames as a video using the imageio library, at the specified frame rate (fps).
- Logs a success message and returns 0 if the process is successful.
- On error, logs the issue and returns 1.
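It is worth tracing the frame math: genVideo calls generate_frames with duration_seconds=20, while the key-frame count is fixed at 10 inside the loop. The helper below (an illustrative function, not from the post) computes how many interpolated frames actually reach imageio:

```python
def interpolated_frame_count(key_frames, duration_seconds, base_fps=24):
    """(key_frames - 1) segments, each yielding
    int(duration_seconds * base_fps / key_frames) blended frames."""
    per_segment = int(duration_seconds * base_fps / key_frames)
    return (key_frames - 1) * per_segment


total = interpolated_frame_count(10, 20)
print(total)       # 432 frames
print(total / 30)  # 14.4 seconds of video when saved at fps=30
```

So the final clip length depends on both duration_seconds and the fps passed to mimsave; changing one without the other changes the playback duration.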
Now, let us look at the performance. But before that, let us explore the device on which we've performed these stress tests, which involve both the GPU and the CPUs.

And here are the performance stats –

From the above snapshot, we can clearly see that the GPU is 100% utilized, while the CPU still shows a significant percentage of availability.

As you can see, the first pass converts the input prompt to an intermediate image within 1 minute 30 seconds. The second pass, however, consists of multiple hops (11 hops) averaging 22 seconds each. Overall, the application finishes in 5 minutes 36 seconds for a 10-second video clip.
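A quick sanity check of those timings, using only the figures quoted above:

```python
first_pass_s = 90          # prompt -> intermediate image: 1 min 30 sec
second_pass_s = 11 * 22    # 11 hops at ~22 sec each
total_s = first_pass_s + second_pass_s
print(f"{total_s // 60} min {total_s % 60} sec")  # -> 5 min 32 sec
```

That is close to the observed 5 minutes 36 seconds; the small remaining gap is presumably startup and I/O overhead.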
So, we’ve done it.
You can find the detailed code at the GitHub link.
I’ll bring some more exciting topics in the coming days from the Python verse.
Till then, Happy Avenging! 🙂
Note: All the data & scenarios posted here are representational, available over the internet, and shared for educational purposes only. There is always room for improvement in this kind of model and its associated solution. I've shown the basic way to achieve this for educational purposes only.