✦ 3D Geometric Transformation Visualizer ✦
Abstract
The 3D Geometric Transformation Visualizer is an interactive AR/VR platform that bridges the gap between abstract linear algebra and intuitive spatial understanding. Built using Three.js, MediaPipe, and AR.js, the system renders five Platonic solids in real time and exposes their underlying transformation matrices — translation, rotation, and scaling — through multiple interaction paradigms. Users can manipulate geometry through keyboard input, hand gestures, device gimbal control, or augmented reality projection onto a physical Hiro marker. The platform visualizes coordinate space transitions across model, world, view, and clip space, making the graphics pipeline tangible rather than theoretical. Developed iteratively over six weeks by Team Rankers, the project demonstrates that immersive, browser-based tooling can make mathematical abstraction genuinely accessible without requiring specialized hardware or software installation.
Problem Statement
Traditional instruction in 3D geometry and linear algebra relies on static diagrams and symbolic notation. Students encounter transformation matrices as grids of numbers with no visceral connection to what those numbers do in space. Existing visualization tools are either too simplified — rotating a single shape with no matrix feedback — or too specialized, requiring desktop software such as MATLAB or Blender. No browser-based AR/VR tool combined real-time rendering, gesture input, and coordinate space visualization in a single lightweight package. This project fills that gap: it lets a student physically manipulate a geometric solid while observing the transformation matrix update in real time, making the abstract concrete through direct interaction.
Methodologies
Three.js was chosen as the rendering backbone for its maturity, cross-browser support, and built-in geometry primitives. MediaPipe Hands provided the hand landmark detection pipeline for gesture recognition, running entirely client-side via WebAssembly. AR.js with Hiro marker support was integrated for augmented reality projection, overlaying the 3D scene onto a physical marker detected through the device camera. The device orientation API powered the gimbal mode, mapping physical phone tilt directly to scene rotation. All four modes share a unified transformation state — a single matrix stack — so switching between input methods preserves the current geometric state. The frontend is entirely static, requiring no backend, making deployment to Netlify trivial and keeping the barrier to access as low as possible.
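The "single matrix stack" shared by all four input modes can be sketched as one state object that every input handler writes to, recomposed into a column-major 4×4 matrix (Three.js's convention) each frame. This is an illustrative sketch, not the project's actual API; all names here (`state`, `composeMatrix`, the handler functions) are assumptions.

```javascript
// Illustrative sketch of a unified transformation state shared by all
// input modes. One state object; every mode is just a different writer.
const state = { tx: 0, ty: 0, tz: 0, ry: 0, s: 1 };

// Compose T * Ry * S into a single column-major 4x4 matrix
// (column-major to match Three.js's Matrix4 element order).
function composeMatrix({ tx, ty, tz, ry, s }) {
  const c = Math.cos(ry), n = Math.sin(ry);
  return [
    c * s, 0, -n * s, 0,   // column 0
    0,     s,  0,     0,   // column 1
    n * s, 0,  c * s, 0,   // column 2
    tx,   ty,  tz,    1,   // column 3: translation
  ];
}

// Hypothetical input handlers: switching modes preserves state because
// each one mutates the same fields.
function onKeyboard(dx)   { state.tx += dx; }
function onGimbal(yawRad) { state.ry = yawRad; }
function onPinch(scale)   { state.s = scale; }
```

Because the matrix is recomposed from one shared state rather than accumulated per mode, moving from keyboard to gesture to gimbal never resets the geometry.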
Experimentation & Constraints
Gesture control via MediaPipe introduced 80–120ms latency on mid-range devices and required consistent lighting; low-contrast backgrounds caused frequent landmark dropout. Gimbal mode using the device orientation API suffered from gyroscope drift on budget Android devices, requiring a manual recalibration button. Hiro AR detection was highly sensitive to marker print quality and ambient lighting — glossy paper caused reflection artifacts that broke tracking entirely. Keyboard control was the most reliable but excluded mobile users and felt disconnected from the spatial nature of the content. Performance on mobile browsers was a persistent constraint across all modes, with the four-viewport Three.js render loop dropping below 30fps on devices older than two years, requiring a reduced-geometry fallback mode to maintain usability.
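The manual recalibration button for gyroscope drift amounts to re-zeroing: on press, the current raw `deviceorientation` reading becomes the new baseline, and later readings are reported relative to it, with angle differences wrapped into a sane range. A minimal sketch, assuming the standard alpha/beta/gamma fields; function names are illustrative.

```javascript
// Sketch of drift recalibration: the latest raw reading becomes the
// zero point, and subsequent readings are reported relative to it.
let baseline = { alpha: 0, beta: 0, gamma: 0 };

// Wrap an angle difference into [-180, 180) degrees.
const wrap = (deg) => (((deg + 180) % 360 + 360) % 360) - 180;

// Wired to the recalibration button; `raw` is the latest reading.
function recalibrate(raw) { baseline = { ...raw }; }

// Called on every deviceorientation event; returns corrected angles.
function corrected(raw) {
  return {
    alpha: wrap(raw.alpha - baseline.alpha),
    beta:  wrap(raw.beta  - baseline.beta),
    gamma: wrap(raw.gamma - baseline.gamma),
  };
}
```

This does not remove drift, but it lets the user cancel accumulated error in one tap, which was enough to keep gimbal mode usable on budget devices.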
Results & Comparisons
Keyboard mode delivered the most stable frame rate (consistent 60fps on desktop) and lowest interaction latency, but scored lowest on engagement and was inaccessible on touch-only devices. Gesture mode was the most visually compelling and drew the strongest reactions during demonstrations, but required a webcam, good lighting, and a capable CPU — making it unsuitable as the primary mode on mobile. Gimbal mode struck the best balance: it worked on any modern smartphone without additional hardware, felt intuitive within seconds, and maintained 45–60fps on mid-range devices. The main drawback was drift over extended sessions. Hiro AR produced the most striking visual output — seeing a rotating icosahedron projected onto a desk in real space — but was the most fragile, dependent on print quality, lighting, and camera focus. In controlled conditions it worked reliably; in variable environments it was the least dependable. Overall, gimbal mode is recommended as the default entry point, with gesture mode as the showcase interaction and AR as the high-impact demonstration piece. Keyboard remains the fallback for desktop-only contexts.
| Feature | Ease of Use | Performance | Mobile Support | Key Constraint |
|---|---|---|---|---|
| 2D Floor Planner | High | High | Yes | Coverage limited to fence boundary |
| 3D Orbit View | High | High | Partial | GLB load time on slow connections |
| Quad Viewports | High | Medium | No | Render cost on low-end GPUs |
| Gesture Mode | Medium | Medium | Yes | Webcam + lighting dependent |
Observations: The 2D planner performed consistently across all tested devices. Quad viewports showed minor frame drops on integrated graphics but remained usable. Gesture mode worked reliably under good lighting with a fixed webcam; performance degraded noticeably in low-light or cluttered backgrounds. The 3D orbit view loaded within 2–3 seconds on average connections, with the GLB assets being the primary bottleneck.
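The coverage cone and blind spot results above reduce to a 2D point-in-sector test: a floor point is covered if it lies within a camera's range and inside its field-of-view sector. The sketch below shows one plausible form of that test; the parameter names (`facing`, `fov`, `range`) are assumptions, not the project's actual schema.

```javascript
// Illustrative 2D coverage-cone test: is (px, py) inside this
// camera's field-of-view sector and within its range?
function covers(cam, px, py) {
  const dx = px - cam.x, dy = py - cam.y;
  if (Math.hypot(dx, dy) > cam.range) return false; // beyond the cone
  // Angle to the point, relative to the camera's facing direction,
  // wrapped into [-PI, PI] so the comparison is symmetric.
  let diff = Math.atan2(dy, dx) - cam.facing;
  diff = Math.atan2(Math.sin(diff), Math.cos(diff));
  return Math.abs(diff) <= cam.fov / 2;             // inside the FOV
}

// A blind spot is any floor point that no camera covers.
const isBlindSpot = (cams, x, y) => !cams.some((c) => covers(c, x, y));
```

Running this test over a grid of floor points is what turns individual camera placements into the live coverage and blind-spot overlay.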
Conclusion & Future Scope
Conclusion
The project successfully demonstrates that browser-native AR/VR tooling can make 3D geometric transformation genuinely interactive and pedagogically meaningful. Four distinct input paradigms were built, tested, and compared within a unified rendering environment, validating the hypothesis that physical interaction accelerates intuitive understanding of abstract mathematical concepts.
Near-Term Future
The immediate priorities are performance optimization of the four-viewport render loop, improved gesture debouncing to reduce false positives, support for geometry beyond the five Platonic solids via user-uploadable custom meshes, and a smoother gimbal recalibration UX.
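One plausible shape for the improved gesture debouncing: a gesture label must be seen on N consecutive MediaPipe frames before it is accepted, suppressing single-frame misclassifications. This is a sketch under that assumption, not the project's current implementation.

```javascript
// Sketch of frame-count gesture debouncing: a label must persist for
// `framesRequired` consecutive frames before it fires, and it fires
// only once per gesture change.
function makeDebouncer(framesRequired = 5) {
  let last = null, streak = 0, active = null;
  return function onFrame(label) {
    streak = label === last ? streak + 1 : 1;
    last = label;
    if (streak >= framesRequired && label !== active) {
      active = label;   // gesture confirmed: fire once
      return label;
    }
    return null;        // still debouncing, or no change
  };
}
```

The trade-off is latency: at ~30fps, requiring 5 frames adds roughly 160ms before a gesture registers, so `framesRequired` would need tuning against the 80–120ms pipeline latency already measured.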
Distant Future
Object recognition to extract real-world geometry as transformation inputs, LLM-guided walkthroughs that explain matrix operations in plain language as the user interacts, and multi-user collaborative AR sessions where multiple devices share a synchronized transformation state.
The Journey
What began as a geometric transformation visualizer evolved into a practical security camera planning tool. The rendering pipeline built for rotating Platonic solids turned out to be exactly what was needed to visualize camera coverage cones, blind spots, and field-of-view in real space.
The project grew from a basic 2D floor plan with click-to-place cameras into a full dual-mode tool supporting:
- 2D floor plan with live coverage cone and blind spot visualization
- 3D house model loaded from GLB with real-time camera placement
- Quad viewport system — Perspective, Top, Front and Side simultaneously
- Gesture-based orbit and zoom via MediaPipe hand tracking
- Coverage statistics updated live as cameras are added or adjusted
- Minimap for orientation and quick 2D/3D switching
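The live coverage statistic in the list above could be computed by sampling the floor plan on a grid and reporting the fraction of samples at least one camera covers. A minimal sketch: `coveredAt` is a hypothetical predicate (the per-camera sector test), and the grid step trades accuracy against per-frame cost.

```javascript
// Sketch of a grid-sampled coverage statistic: sample cell centers
// across the floor plan and count how many the cameras cover.
function coveragePercent(width, height, step, coveredAt) {
  let total = 0, hit = 0;
  for (let x = step / 2; x < width; x += step) {
    for (let y = step / 2; y < height; y += step) {
      total++;
      if (coveredAt(x, y)) hit++;
    }
  }
  return total ? (100 * hit) / total : 0;
}
```

Recomputing only the cells near a moved camera, rather than the whole grid, is one way to keep the statistic live without stalling the render loop.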
Progress Snapshot
- 2D Floor Planner: ✓ 100%
- 3D House Model & Viewports: ✓ 100%
- Quad Viewport Sync: ✓ 100%
- Gesture Control: ✓ 100%
- Coverage & Blind Spot Analysis: ✓ 100%
- Minimap & 2D/3D Navigation: ✓ 100%