Create a 3D Object from Your Images with TripoSR in Python
by: Ritwik Raha
Table of Contents
- Create a 3D Object from Your Images with TripoSR in Python
- Image to 3D Objects
- Setting Up the Environment
- Importing Necessary Libraries
- Setting Up the Device
- Creating a Timer Utility
- Uploading and Preparing the Image
- Setting Up TripoSR Parameters
- Initializing the TripoSR Model
- Processing the Image
- Generating the 3D Model and Rendering
- Downloading the .stl File
- Displaying the Result
- Summary
Create a 3D Object from Your Images with TripoSR in Python
In this tutorial, we’ll walk you through the process of creating a 3D object from a single image using TripoSR, a state-of-the-art model for fast-feedforward 3D reconstruction. We’ll cover everything from setting up the environment to generating the final 3D model and rendering a result video.
To learn how to generate high-quality 3D objects from a SINGLE image, just keep reading.
Looking for the source code to this post?
Jump Right To The Downloads Section
Image to 3D Objects
At PyImageSearch, we have shown how to create 3D objects from an array of specialized images using Neural Radiance Fields (NeRFs). While NeRF is a reliable and established approach for generating 3D objects from images, it has several drawbacks:
- Exact COLMAP settings need to be calibrated
- Multiple images from multiple angles are required
- Training and inference are expensive
- The resulting 3D views can be low quality
Ideally, with the advancements in computer vision over the last 3 years, we would like to generate reliable, high-quality 3D objects quickly from limited images (ideally just one). Enter TripoSR from Stability AI.
Leveraging the principles of the Large Reconstruction Model (LRM), TripoSR brings key advancements that significantly boost both the speed and quality of 3D reconstruction (as shown in Figure 1). The model is distinguished by its ability to rapidly process inputs, generating high-quality 3D models in under 0.5 seconds on an NVIDIA A100 GPU, and it has outperformed other open-source alternatives in both qualitative and quantitative evaluations across multiple public datasets.
The figures below show visual comparisons and metrics of TripoSR’s performance relative to other leading models. Details about the model architecture, training process, and comparisons can be found in this technical report.
TripoSR is a promising new 3D reconstruction model that is fast, accurate, and easy to use, with the potential to serve a wide variety of applications.
This tutorial uses your own images to generate 3D objects, meaning you will be able to upload an image and turn it into a 3D object. Product photography images are notoriously hard to gather from the internet. How would you like immediate access to 3,457 professional images curated and labeled with hand gestures to train, explore, and experiment with … for free? Head over to Roboflow and get a free account to grab these hand gesture images.
Setting Up the Environment
First, we need to set up our environment:
!git clone https://github.com/pyimagesearch/TripoSR.git
import sys
sys.path.append('/content/TripoSR/tsr')
%cd TripoSR
!pip install -r requirements.txt -q
Here, we’re cloning the TripoSR repository, adding it to our Python path, changing into the TripoSR directory, and installing the required dependencies.
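If the clone or install fails silently in Colab, a quick optional sanity check (the paths below assume Colab's default /content working directory; this check is not part of the original tutorial) confirms the package sources are where sys.path expects them:

import os

# Verify the cloned repo and its tsr package directory exist before importing from them.
print(os.path.isdir("/content/TripoSR"))      # expect True
print(os.path.isdir("/content/TripoSR/tsr"))  # expect True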
Importing Necessary Libraries
Next, we import the required libraries:
import torch
import os
import time
from PIL import Image
import numpy as np
from IPython.display import Video
from tsr.system import TSR
from tsr.utils import remove_background, resize_foreground, save_video
import pymeshlab as pymesh
import rembg
We’re importing various libraries for image processing, 3D modeling, and utility functions. The TSR class from tsr.system is the core of TripoSR.
Setting Up the Device
We determine whether to use CUDA (GPU) or CPU:
device = "cuda" if torch.cuda.is_available() else "cpu"
This line checks if a CUDA-compatible GPU is available and sets the device accordingly.
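As a quick, optional sanity check (an addition to the original tutorial), you can print which device was selected and, if a GPU is present, its name:

# Report the selected device; torch.cuda.get_device_name is only valid when CUDA is available.
print(f"Using device: {device}")
if device == "cuda":
    print(torch.cuda.get_device_name(0))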
Creating a Timer Utility
To measure the performance of different steps, we create a Timer class:
class Timer:
    def __init__(self):
        self.items = {}
        self.time_scale = 1000.0  # ms
        self.time_unit = "ms"

    def start(self, name: str) -> None:
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        self.items[name] = time.time()

    def end(self, name: str) -> float:
        if name not in self.items:
            return
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        start_time = self.items.pop(name)
        delta = time.time() - start_time
        t = delta * self.time_scale
        print(f"{name} finished in {t:.2f}{self.time_unit}.")

timer = Timer()
This Timer class allows us to measure the execution time of different parts of our process.
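For illustration, here is a minimal usage sketch of the Timer class defined above (the "demo step" label is arbitrary, and time.sleep simply stands in for real work):

# The printed duration should be roughly 250 ms.
timer.start("demo step")
time.sleep(0.25)
timer.end("demo step")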
Uploading and Preparing the Image
Now, we upload our image, a Nike Low (shown in Figure 2), and prepare it for processing:
from google.colab import files

uploaded = files.upload()
original_image = Image.open(list(uploaded.keys())[0])
original_image.resize((512, 512)).save("examples/product.png")
We use Google Colab’s file upload feature to get our image, then resize it to 512x512 pixels and save it.
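If you are running outside Google Colab, files.upload() is unavailable; a minimal alternative (the local filename below is a hypothetical placeholder) is to open a local image directly:

# Load a local image instead of using Colab's upload widget.
os.makedirs("examples", exist_ok=True)
original_image = Image.open("my_product_photo.png")  # hypothetical local file
original_image.resize((512, 512)).save("examples/product.png")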
Setting Up TripoSR Parameters
We define the parameters for running TripoSR:
image_paths = "/content/TripoSR/examples/product.png"
device = "cuda:0"
pretrained_model_name_or_path = "stabilityai/TripoSR"
chunk_size = 8192
no_remove_bg = True
foreground_ratio = 0.85
output_dir = "output/"
model_save_format = "obj"
render = True

output_dir = output_dir.strip()
os.makedirs(output_dir, exist_ok=True)
These parameters define the input image path, the device to use, the pretrained model to load, and various other settings for the 3D reconstruction process.
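Note that device is hard-coded to "cuda:0" here. If you might run on a CPU-only machine, a hedged variant (an addition to the original parameters) reuses the earlier availability check:

# Fall back to CPU when no CUDA device is present (much slower, but avoids a runtime error).
device = "cuda:0" if torch.cuda.is_available() else "cpu"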
Initializing the TripoSR Model
We initialize the TripoSR model:
timer.start("Initializing model") model = TSR.from_pretrained( pretrained_model_name_or_path, config_name="config.yaml", weight_name="model.ckpt", ) model.renderer.set_chunk_size(chunk_size) model.to(device) timer.end("Initializing model")
Here, we load the pretrained TripoSR model, set the chunk size for rendering, and move the model to the specified device (GPU or CPU).
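If you hit out-of-memory errors during rendering, a smaller chunk size trades rendering speed for lower peak GPU memory; the value below is just an illustrative sketch, not a tuned setting from the original tutorial:

# Smaller chunks mean more render passes but less GPU memory used per pass.
model.renderer.set_chunk_size(4096)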
Processing the Image
Now, we process our input image:
timer.start("Processing images") images = [] rembg_session = rembg.new_session() image = remove_background(original_image, rembg_session) image = resize_foreground(original_image, foreground_ratio) if image.mode == "RGBA": image = np.array(image).astype(np.float32) / 255.0 image = image[:, :, :3] * image[:, :, 3:4] + (1 - image[:, :, 3:4]) * 0.5 image = Image.fromarray((image * 255.0).astype(np.uint8)) image_dir = os.path.join(output_dir, str(0)) os.makedirs(image_dir, exist_ok=True) image.save(os.path.join(image_dir, "input.png")) images.append(image) timer.end("Processing images")
In this step, we remove the background from the image, resize the foreground, and handle RGBA (red, green, blue, alpha) images by blending the alpha channel with a gray background, as shown in Figure 3.
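To make the alpha-blending line concrete, here is the same formula applied to a single illustrative pixel (the numbers are chosen arbitrarily):

# For rgb = 0.8 and alpha = 0.25: 0.8 * 0.25 + (1 - 0.25) * 0.5 = 0.575,
# i.e., a mostly transparent pixel is pulled toward the 0.5 gray background.
rgb, alpha = 0.8, 0.25
blended = rgb * alpha + (1 - alpha) * 0.5
print(blended)  # 0.575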
Generating the 3D Model and Rendering
Finally, we generate the 3D model and render it:
for i, image in enumerate(images):
    print(f"Running image {i + 1}/{len(images)} ...")

    timer.start("Running model")
    with torch.no_grad():
        scene_codes = model([image], device=device)
    timer.end("Running model")

    if render:
        timer.start("Rendering")
        render_images = model.render(scene_codes, n_views=30, return_type="pil")
        for ri, render_image in enumerate(render_images[0]):
            render_image.save(os.path.join(output_dir, str(i), f"render_{ri:03d}.png"))
        save_video(
            render_images[0], os.path.join(output_dir, str(i), "render.mp4"), fps=30
        )
        timer.end("Rendering")

    timer.start("Exporting mesh")
    meshes = model.extract_mesh(scene_codes, has_vertex_color=False)
    mesh_file = os.path.join(output_dir, str(i), f"mesh.{model_save_format}")
    meshes[0].export(mesh_file)
    timer.end("Exporting mesh")

print("Processing complete.")
This loop processes each image (in our case, just one) through the TripoSR model. It generates the 3D scene codes, renders multiple views of the 3D model, saves these renders as images and a video, and exports the 3D mesh.
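After the loop finishes, you can list what was written to disk to confirm the renders, video, and mesh are all present (this inspection step is an addition, not part of the original script):

import glob

# The per-image outputs live under output/<index>/ — here, index 0.
for path in sorted(glob.glob(os.path.join(output_dir, "0", "*")))[:5]:
    print(path)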
Downloading the .stl File
For those looking to convert flat PNG (Portable Network Graphics) images into STL (Stereolithography) models, this step shows how to convert the generated .obj file directly into .stl format.
STL (Stereolithography) is a widely used file format for representing 3D models. It’s primarily used in 3D printing and computer-aided manufacturing (CAM).
OBJ is another common 3D model file format. It differs from STL primarily because it is vertex-based and can store more data points.
Thus, STL is a specialized format for 3D models that is optimized for 3D printing. Its simplicity, geometric focus, and wide support make it a popular choice for this application. While OBJ offers more versatility and data storage, STL is well-suited for the specific needs of 3D printing.
obj_file = "/content/TripoSR/output/0/mesh.obj"

# Load the .obj mesh
ms = pymesh.MeshSet()
ms.load_new_mesh(obj_file)
mesh = ms.current_mesh()

# Convert to .stl format
stl_file = 'model.stl'
ms.save_current_mesh(stl_file)
We load the saved mesh from the output directory. Using MeshSet() from the pymeshlab library (imported here as pymesh), we load it into a new mesh object. To save it as an .stl file, we change the file name and call the save_current_mesh function.
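Optionally, you can inspect the loaded mesh before exporting it; this quick report (an addition to the original tutorial) uses pymeshlab's vertex and face counters:

# Print the size of the reconstructed mesh.
print(f"Vertices: {mesh.vertex_number()}, Faces: {mesh.face_number()}")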
You can download the .stl file by expanding the file browser in the sidebar of the interactive Colab notebook.
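Alternatively, you can trigger the download programmatically from the notebook (Colab-only; this snippet is an addition to the original tutorial):

from google.colab import files

# Prompts the browser to download the converted STL file.
files.download(stl_file)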
Displaying the Result
To view our result, we display the rendered video:
Video('output/0/render.mp4', embed=True)
This line displays the rendered video of our 3D model, shown as a GIF in Figure 4.
What's next? We recommend PyImageSearch University.
86 total classes • 115+ hours of on-demand code walkthrough videos • Last updated: October 2024
★★★★★ 4.84 (128 Ratings) • 16,000+ Students Enrolled
I strongly believe that if you had the right teacher you could master computer vision and deep learning.
Do you think learning computer vision and deep learning has to be time-consuming, overwhelming, and complicated? Or has to involve complex mathematics and equations? Or requires a degree in computer science?
That’s not the case.
All you need to master computer vision and deep learning is for someone to explain things to you in simple, intuitive terms. And that’s exactly what I do. My mission is to change education and how complex Artificial Intelligence topics are taught.
If you're serious about learning computer vision, your next stop should be PyImageSearch University, the most comprehensive computer vision, deep learning, and OpenCV course online today. Here you’ll learn how to successfully and confidently apply computer vision to your work, research, and projects. Join me in computer vision mastery.
Inside PyImageSearch University you'll find:
- ✓ 86 courses on essential computer vision, deep learning, and OpenCV topics
- ✓ 86 Certificates of Completion
- ✓ 115+ hours of on-demand video
- ✓ Brand new courses released regularly, ensuring you can keep up with state-of-the-art techniques
- ✓ Pre-configured Jupyter Notebooks in Google Colab
- ✓ Run all code examples in your web browser — works on Windows, macOS, and Linux (no dev environment configuration required!)
- ✓ Access to centralized code repos for all 540+ tutorials on PyImageSearch
- ✓ Easy one-click downloads for code, datasets, pre-trained models, etc.
- ✓ Access on mobile, laptop, desktop, etc.
Summary
In this tutorial, we’ve walked through the process of creating a 3D object from a single image using TripoSR. We began by setting up our environment and importing necessary libraries. We then uploaded and prepared our input image, initialized the TripoSR model, and processed the image to remove its background.
The core of our process involved using the TripoSR model to generate 3D scene codes from our 2D image. We then used these codes to render multiple views of our 3D model and export the 3D mesh.
Throughout the process, we used a custom Timer class to measure the performance of each step, giving us insights into the speed of the TripoSR model.
The result of this process is a 3D model of our input object, which we can view as a rendered video. This demonstrates the power of TripoSR to quickly and accurately create a 3D model from a single PNG image, opening up numerous possibilities in fields such as e-commerce, game development, and virtual reality.
Citation Information
Raha, R. “Create a 3D Object from Your Images with TripoSR in Python,” PyImageSearch, P. Chugh, S. Huot, and P. Thakur, eds., 2024, https://pyimg.co/g316x
@incollection{Raha_2024_create-3d-object-with-triposr-in-python,
  author = {Ritwik Raha},
  title = {Create a 3D Object from Your Images with TripoSR in Python},
  booktitle = {PyImageSearch},
  editor = {Puneet Chugh and Susan Huot and Piyush Thakur},
  year = {2024},
  url = {https://pyimg.co/g316x},
}
To download the source code to this post (and be notified when future tutorials are published here on PyImageSearch), simply enter your email address in the form below!
Download the Source Code and FREE 17-page Resource Guide
Enter your email address below to get a .zip of the code and a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL!