【3D Panorama Scan】Matterport Digital Twin: The Best Solution for Home Renovation


Imagine completely redecorating your living room without moving a single piece of furniture. That is what Matterport is building.

Based on this idea, Matterport applies AI to explore how advanced techniques in 3D semantic understanding and inpainting can bring a host of exciting new applications to digital twins.

Matterport initially focused on creating realistic but static reconstructions of real-world spaces, laying a solid foundation for virtual tours and various consumer applications. However, to truly transform these spaces, assess their potential uses, or manage their daily maintenance and operations, static reconstruction alone is insufficient. To this end, the company has been developing advanced property intelligence tools that leverage semantic understanding to provide deeper insights and valuable information about properties.

Now, with the latest breakthroughs in generative AI, the company is expanding its focus to creating new content and experiences within Matterport spaces to enrich how users interact with and perceive these digital environments.

Combining a decade of machine learning and AI experience with the power of new generative AI tools, Matterport is bringing new design and furniture ideas to life through Project Genesis, all with the click of a button, starting with the ability to instantly renovate any space.

 

What is Defurnishing?

Defurnishing is a key technique in digital image processing and 3D modeling: furniture and other movable objects are removed from the images of a space so that the space appears empty.

This approach is crucial for applications requiring visualization of empty spaces, including interior design, real estate, and virtual staging, providing a clear display of the space's potential.

Defurnishing is a feature under development for all Matterport digital twins, and it consists of three steps:

1. Reconstruction: First, the space is captured and reconstructed to create a digital twin.

2. Understanding: The reconstructed space then undergoes semantic understanding, specifically identifying the pixels (in images) and mesh faces (in dollhouse view) belonging to furniture items intended for removal.

3. Synthesis: Since areas obscured by furniture are never directly captured, removing furniture leaves blank pixels in images and holes in the mesh. The blank regions in images need to be inpainted, while holes in the mesh need to be filled and textured.

 

3DMart's article on Matterport's 2024 Winter Release gives a preview of the company's defurnishing feature. This part of the blog series highlights semantic segmentation, the critical first step in automated defurnishing.

 


Understanding Semantic Segmentation


Semantic segmentation is a crucial computer vision task that involves dividing an image into distinct regions and assigning a specific category to each. The goal is to label every pixel with a category (such as "floor," "wall," "window," "table"), facilitating a comprehensive understanding of the scene by precisely locating objects and delineating their boundaries.
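To make the "one category per pixel" idea concrete, here is a minimal sketch. It assumes a model has already produced a score for every class at every pixel; the four-class list and the toy scores are made up for illustration and are not Matterport's ontology or output.

```python
import numpy as np

# Hypothetical category list; a real ontology would be much larger.
CLASSES = ["floor", "wall", "window", "table"]


def label_pixels(scores: np.ndarray) -> np.ndarray:
    """Turn per-class scores of shape (H, W, C) into a label map of shape (H, W)."""
    # Each pixel receives the index of its highest-scoring class.
    return np.argmax(scores, axis=-1)


# Toy 2x2 "image" with made-up scores for the four classes.
scores = np.array([
    [[0.9, 0.1, 0.0, 0.0], [0.2, 0.7, 0.1, 0.0]],
    [[0.1, 0.1, 0.1, 0.7], [0.0, 0.2, 0.8, 0.0]],
])
labels = label_pixels(scores)
names = [[CLASSES[i] for i in row] for row in labels]
print(names)  # [['floor', 'wall'], ['table', 'window']]
```

The label map has the same height and width as the image, which is what distinguishes segmentation from tasks that return a single label or a handful of boxes.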

Unlike object detection, which localizes objects with bounding boxes, and image classification, which applies a single label to an entire image, semantic segmentation enables fine-grained, pixel-level analysis of a scene. It is a fundamental technology in computer vision, with applications in autonomous driving, medical imaging, robotics, and other fields.

More recently, it has become a key element of virtual interior design. The raw data from the initial capture of a space describes only its overall structure and appearance. Semantic segmentation enriches the understanding of what a Matterport space contains, enabling precise manipulation of its elements, whether moving, editing, indexing, or deleting them.

To effectively change any aspect of a Matterport space, detailed semantic segmentation that distinguishes the key components of the space is essential.


The Role of Segmentation in Defurnishing

To remove furniture from the images and 3D structure of a digital twin, the pixels/mesh faces belonging to furniture items must first be identified. Removing these pixels/faces often results in missing information. This is because the areas behind/underneath the furniture are not visible when the digital twin is captured.

Therefore, after removing the furniture, reliable image/3D content needs to be generated to fill these gaps. This process is called "image inpainting."

Inpainting is an advanced technique used for image editing and restoration, aimed at filling in missing or corrupted parts of an image, ensuring it appears complete and natural. Its main purpose is to seamlessly reconstruct these areas so that they blend perfectly with the surrounding image, thereby maintaining the image's structural integrity and visual continuity.
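As a rough illustration of what "filling in missing parts so they blend with the surroundings" means, the sketch below uses a classical diffusion-style fill: every missing pixel is repeatedly replaced by the average of its four neighbors. This is a simple stand-in chosen for clarity, not the generative inpainting the article describes, and the function name and parameters are assumptions.

```python
import numpy as np


def diffuse_inpaint(image: np.ndarray, mask: np.ndarray, iterations: int = 200) -> np.ndarray:
    """Fill masked pixels of a 2D grayscale image by repeated neighbor averaging.

    image: (H, W) float array.
    mask:  (H, W) bool array, True where content is missing.
    """
    result = image.astype(float).copy()
    result[mask] = result[~mask].mean()  # neutral starting guess for the hole
    for _ in range(iterations):
        padded = np.pad(result, 1, mode="edge")
        neighbors = (
            padded[:-2, 1:-1] + padded[2:, 1:-1]    # up, down
            + padded[1:-1, :-2] + padded[1:-1, 2:]  # left, right
        ) / 4.0
        result[mask] = neighbors[mask]  # known pixels are never modified
    return result


# Toy example: a horizontal gradient with a 3x3 hole in the middle.
image = np.tile(np.arange(7, dtype=float), (7, 1))
mask = np.zeros((7, 7), dtype=bool)
mask[2:5, 2:5] = True
filled = diffuse_inpaint(image, mask)
```

On this smooth gradient the fill recovers the missing values almost exactly. On real photographs a diffusion fill only produces a blur, which is why defurnishing relies on learned, generative inpainting to synthesize plausible floor and wall texture.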

Many inpainting methods rely on precise segmentation masks for the regions designated for removal and subsequent inpainting. Any discrepancies or artifacts affecting the furniture segmentation mask can significantly impact the inpainting results, for example:

• Removing parts of the building structure instead of furniture can lead to severe structural hallucinations (e.g., you might end up creating a doorway to a non-existent room instead of inpainting some floor and wall content).

• Incomplete furniture segmentation, where parts of an object are left unmasked, can cause the inpainting step to generate false objects instead of the desired "empty space" (typically walls and floors, depending on the viewpoint).

• False negatives occur when actual furniture is not segmented, resulting in remnants of furniture appearing in the final result.

Therefore, ensuring accurate semantic segmentation is crucial for achieving high-quality defurnishing results.
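The failure modes listed above can be quantified by comparing a predicted furniture mask against a hand-labeled one. The sketch below is a generic evaluation helper, not Matterport's tooling; the function name and the toy masks are assumptions.

```python
import numpy as np


def mask_errors(predicted: np.ndarray, truth: np.ndarray) -> dict:
    """Compare two boolean furniture masks of the same shape."""
    predicted = predicted.astype(bool)
    truth = truth.astype(bool)
    tp = int(np.sum(predicted & truth))    # furniture correctly masked
    fp = int(np.sum(predicted & ~truth))   # structure wrongly masked (risk of hallucinated geometry)
    fn = int(np.sum(~predicted & truth))   # furniture missed (remnants survive in the result)
    union = tp + fp + fn
    iou = tp / union if union else 1.0     # two empty masks agree perfectly
    return {"false_positives": fp, "false_negatives": fn, "iou": iou}


# Toy 4x4 example: the prediction is shifted one column to the right.
truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True
predicted = np.zeros((4, 4), dtype=bool)
predicted[1:3, 2:4] = True
report = mask_errors(predicted, truth)
print(report)
```

Here two pixels are false positives, two are false negatives, and the intersection-over-union is 2/6. Tracking false positives and false negatives separately is useful because they lead to different artifacts after inpainting.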



Matterport's Semantic Segmentation Approach

1. Data
Matterport runs semantic segmentation on 360-degree panoramic images in equirectangular projection, which captures the widest possible visual context in a single frame. Context plays a crucial role in computer vision tasks, especially with modern neural network architectures like Vision Transformers.
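For readers unfamiliar with equirectangular images, the sketch below shows the standard mapping from an image coordinate to a viewing direction on the unit sphere: the horizontal axis spans 360 degrees of longitude and the vertical axis spans 180 degrees of latitude. The axis convention (y up, z pointing at the image center) is an assumption made for this example, not something the article specifies.

```python
import math


def pixel_to_direction(u: float, v: float, width: int, height: int) -> tuple:
    """Map continuous image coordinates to a unit direction (x, y, z).

    u in [0, width) runs left to right, v in [0, height) runs top to bottom.
    Pass (column + 0.5, row + 0.5) to get the direction through a pixel center.
    """
    lon = (u / width - 0.5) * 2.0 * math.pi   # longitude in [-pi, pi)
    lat = (0.5 - v / height) * math.pi        # latitude in (-pi/2, pi/2]
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


center = pixel_to_direction(2048, 1024, 4096, 2048)  # looks straight ahead
top = pixel_to_direction(2048, 0, 4096, 2048)        # looks straight up
```

Because every direction around the camera appears somewhere in the image, a segmentation model sees the whole room at once, at the cost of strong stretching near the top and bottom rows.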

2. Custom Ontology
Initially, the company used a subset of the ADE20k ontology, which included 150 categories commonly found in built environments. However, this approach did not fully meet specific needs.

Matterport's goal is to remove all movable furniture while preserving built-in fixtures. Public datasets often group these two kinds of furniture into one general category (e.g., classifying both freestanding and built-in wardrobes simply as "wardrobe").

Therefore, to meet specific needs, several other task-specific factors had to be considered, and a custom dataset with furniture segmentation annotations was compiled.
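One way to picture such a custom ontology is as a table in which each category records whether it should be removed. The sketch below is purely illustrative: the category names, ids, and the `removable` flag are hypothetical and do not reproduce Matterport's actual label set.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class Category:
    name: str
    removable: bool  # True for movable furniture, False for structure and built-ins


# Hypothetical ontology that splits "wardrobe" into two categories.
ONTOLOGY = {
    0: Category("floor", False),
    1: Category("wall", False),
    2: Category("wardrobe_builtin", False),
    3: Category("wardrobe_freestanding", True),
    4: Category("sofa", True),
}


def removal_mask(labels: np.ndarray) -> np.ndarray:
    """Return a boolean mask that is True wherever the label is a removable category."""
    removable_ids = [i for i, c in ONTOLOGY.items() if c.removable]
    return np.isin(labels, removable_ids)


# Toy label map: the built-in wardrobe (2) stays, the freestanding one (3) and the sofa (4) go.
labels = np.array([
    [1, 1, 2, 2],
    [1, 3, 3, 4],
    [0, 0, 0, 4],
])
mask = removal_mask(labels)
```

With a generic ontology that has a single "wardrobe" class, this distinction could not be expressed at all, which is the motivation for collecting task-specific annotations.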

3. Network Architecture
Matterport decided to build on the Vision Transformer architecture, which it has already used successfully in various AI applications, and chose the Vision Transformer Adapter (ViT-Adapter) as the basis for its segmentation experiments. This model adapts the Vision Transformer, originally designed to produce a single feature vector from an image, to image-to-image tasks that require feature maps rather than single vectors.

Although ViT-Adapter was not designed specifically for 360-degree equirectangular images, or for the ontology distinctions described above, it has demonstrated impressive performance on this type of data.

4. Deployment
Recently, Matterport made semantic segmentation, along with depth estimation, a primary stage of the pipeline, so every captured image now undergoes semantic segmentation. Inference therefore runs in the cloud, which absorbs sudden traffic fluctuations, simplifies maintenance, and enables smoother updates.

5. 3D Semantic Understanding
Matterport has a unique advantage in 3D spatial semantic understanding. By integrating 3D context into semantic segmentation, a deeper understanding of the spatial and semantic connections within any captured space can be achieved. The company innovatively uses a 3D dollhouse view, combining perspectives from multiple angles, significantly improving prediction accuracy. This advanced approach enables more accurate and meaningful modifications.

A typical example is the defurnishing scenario, which requires a complex and accurate understanding of the environment's 2D and 3D features.



Technical Challenges and Limitations of Defurnishing


Even the most advanced semantic segmentation models are not perfect and struggle to generalize effectively to new, unseen data. This reality requires Matterport to develop strategies to correct errors or create workarounds.

While supervised semantic segmentation methods generally yield the best results, defining and managing ontologies poses significant challenges. Ontologies shift depending on the specific application, and every major adjustment requires a new round of data annotation. Therefore, the more a model can be trained in a self-supervised manner, the less time, effort, and money it takes to adapt segmentation models to new ontologies. Designing the ontology is itself difficult. For example, in defurnishing, Matterport's goal is to remove "standalone" furniture while retaining "built-in" fixtures.

Determining when a piece of furniture qualifies as "built-in" is a complex task, often requiring a comprehensive set of rules to ensure consistency and repeatability in decisions. Without a clear set of guidelines, data labeling efforts are likely to produce low-quality results, which in turn affects the performance of the segmentation model.

Looking Ahead

Self-supervised Learning
Matterport has been exploring self-supervised learning for some time, and with the successful launch of various image-based models, now is an ideal time to deepen investment in this area.

Self-supervised learning offers significant advantages, such as minimizing the need for annotated data, accelerating the training process, and improving performance for specific tasks.

Integrating 3D Context
Integrating 3D context into the workflow is a promising direction. Currently, Matterport's data aggregation method is passive, relying on a heuristic approach to weight features projected from multiple views. By investigating methods to integrate 3D context during the training phase, there is an opportunity to develop viewpoint-independent features, thereby enhancing the model's understanding capabilities.

Furthermore, the company is also exploring the potential of end-to-end 3D technology to see if directly processing semantic understanding through 3D representations can improve outcomes. This includes re-evaluating reconstruction methods. Adopting cutting-edge technologies like Neural Radiance Fields (NeRFs) or other innovative strategies could fundamentally change current practices, leading to significant improvements in model understanding and performance.
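The heuristic multi-view weighting mentioned above can be pictured as a weighted vote per mesh face. The sketch below is one plausible form of such a scheme, not Matterport's actual method: it fuses per-view class probabilities rather than raw features, and the weights (which might reflect viewing distance or angle) are hypothetical inputs.

```python
import numpy as np


def fuse_views(probs: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Fuse per-view predictions into one label per mesh face.

    probs:   (V, F, C) class probabilities for V views, F faces, C classes.
    weights: (V, F) non-negative weights, 0 where a view does not see a face.
    Returns an (F,) array of class indices, with -1 for faces no view observes.
    """
    weighted = (probs * weights[..., None]).sum(axis=0)  # (F, C) weighted scores
    total = weights.sum(axis=0)                          # (F,) total weight per face
    labels = weighted.argmax(axis=1)
    labels[total == 0] = -1
    return labels


# Toy example: 2 views, 3 faces, 2 classes (0 = structure, 1 = furniture).
probs = np.array([
    [[0.9, 0.1], [0.6, 0.4], [0.5, 0.5]],  # view 0
    [[0.2, 0.8], [0.1, 0.9], [0.5, 0.5]],  # view 1
])
weights = np.array([
    [1.0, 0.2, 0.0],  # view 0 sees face 0 well, face 1 poorly, face 2 not at all
    [0.1, 1.0, 0.0],  # view 1 sees face 1 well, face 0 poorly, face 2 not at all
])
fused = fuse_views(probs, weights)
print(fused)  # [ 0  1 -1]
```

Each face ends up with the label favored by the views that see it best. The limitation is that the fusion happens only after each view has been predicted independently, which is why learning from 3D context during training is an attractive next step.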

Multitask Models
The idea of multitask models, capable of performing multiple tasks simultaneously, has gained considerable attention. However, a multitask model must be maintained as a single cohesive system, which makes sharing a backbone across several separate models a more appealing strategy.

As the company advances, striking the right balance between the advantages and complexities of multitask models will be crucial for improving workflows and outcomes.

Open-Vocabulary Models

Another exciting area of development is open-vocabulary models. Traditional models, constrained by fixed ontologies, cannot cover the broad range of customer needs.

However, open-vocabulary models break free from these limitations, enabling them to identify a wider range of objects and concepts without being restricted by predefined categories.

This adaptability is invaluable for Matterport, allowing for a broader semantic understanding across various spaces and applications. Adopting an open-vocabulary approach is expected to significantly enhance the ability to meet diverse customer needs and improve the interoperability of Matterport assets with other tools.
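A common way open-vocabulary models work is to embed image regions and free-text labels into a shared vector space and pick the label whose embedding is most similar. The sketch below shows only that matching step. The three-dimensional vectors are made up for illustration; in practice they would come from a pretrained vision-language model, and nothing here reflects a specific Matterport system.

```python
import numpy as np


def classify_open_vocab(region_embedding: np.ndarray, text_embeddings: dict) -> str:
    """Return the label whose text embedding has the highest cosine similarity."""
    names = list(text_embeddings)
    matrix = np.stack([np.asarray(text_embeddings[n], dtype=float) for n in names])
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    query = np.asarray(region_embedding, dtype=float)
    query = query / np.linalg.norm(query)
    return names[int(np.argmax(matrix @ query))]


# Made-up embeddings for three labels.
vocabulary = {
    "sofa": [1.0, 0.0, 0.0],
    "bookshelf": [0.0, 1.0, 0.0],
    "fireplace": [0.0, 0.0, 1.0],
}
first = classify_open_vocab(np.array([0.1, 0.2, 0.9]), vocabulary)

# Extending the vocabulary needs only a new text embedding, not retraining.
vocabulary["piano"] = [0.7, 0.7, 0.0]
second = classify_open_vocab(np.array([0.6, 0.8, 0.1]), vocabulary)
print(first, second)  # fireplace piano
```

The key property is that the label set is just a dictionary that can be changed at query time, in contrast to a fixed-ontology model whose output classes are set during training.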

Conclusion
Expanding semantic understanding of spaces will unlock a range of applications across multiple industries. Recognizing that a single ontology cannot meet all customer needs, Matterport sees value in open-vocabulary techniques and other methods not bound by strict ontological frameworks.

Another goal is to improve the compatibility of Matterport assets with various tools. To this end, the company is developing multiple integrations that ensure the final rendered empty space is accurate and visually coherent.

Related Products

Matterport PRO3 is a professional 3D panoramic/spatial scanner with a resolution of 134 megapixels. Paired with the Matterport Capture 3D scanning software, it can quickly scan spaces of various sizes with just one click, generating high-precision 2D floor plans and 3D virtual spaces.

 
