Spatial reasoning has emerged as a critical capability in the evolution of large language models (LLMs). While early systems excelled at text generation, their ability to interpret and reason over geometric constraints remained limited. Through the development of GardenDesigner.ai, built by Myceron s.r.o. and Coral Consulting GmbH, we explored how modern LLMs perform on a real-world spatial planning problem: residential garden design.

Why gardens are a real spatial benchmark

GardenDesigner.ai is interesting because it does not start from a mood board or a single photo. It starts from explicit geometry: users draw property boundaries on a satellite map, mark house footprints and fixed elements such as pools, patios, trees, and raised beds, and choose a style and features. The product then aims to return complete multi-zone layouts with plant suggestions, budgets, and visuals per zone. This structured geometric input is combined with semantic context: intended usage, style preferences (e.g., modern or romantic), and free-form notes. The model must reconcile hard spatial constraints with soft semantic intent, which makes residential garden design a symbolic spatial-planning task first and an image task second. A description of the design process can be found here.
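To make the idea of "structured geometric input plus semantic context" concrete, here is a minimal sketch of how such a brief could be represented and serialized. The class names (`Feature`, `SiteBrief`), the field names, and the choice of metres in a local coordinate system are assumptions for illustration, not the actual GardenDesigner.ai schema.

```python
import json
from dataclasses import dataclass, field, asdict

# Assumption: points are (x, y) in metres in a local site coordinate system.
Point = tuple[float, float]


@dataclass
class Feature:
    kind: str             # e.g. "house", "pool", "patio", "tree", "raised_bed"
    outline: list[Point]  # polygon vertices of the fixed element


@dataclass
class SiteBrief:
    boundary: list[Point]                        # property boundary polygon
    features: list[Feature] = field(default_factory=list)
    usage: str = ""                              # intended usage
    style: str = ""                              # e.g. "modern", "romantic"
    notes: str = ""                              # free-form notes


def polygon_area(points: list[Point]) -> float:
    """Absolute area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def serialize(brief: SiteBrief) -> str:
    """Turn the brief into JSON, adding the lot area as a derived fact."""
    payload = asdict(brief)
    payload["lot_area_m2"] = polygon_area(brief.boundary)
    return json.dumps(payload, sort_keys=True)
```

Deriving simple facts such as the lot area before the model sees the input is one way to keep hard geometric constraints explicit rather than leaving them for the model to infer.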

What we learned about the design process

In practice, the important step is not “ask for a garden,” but “serialize the planning process.” Once lot lines, house footprint, pool, patio, trees, and raised beds are normalized into explicit constraints, the model can reason in stages: parse the site, infer circulation and adjacencies, decompose the plot into zones, then generate zone descriptions and imagery. That design mirrors published research: LayoutGPT showed that separating layout planning from image generation improved numerical and spatial correctness by 20–40% over direct text-to-image generation (see “LayoutGPT: Compositional Visual Planning and Generation with Large Language Models”).
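The staged reasoning described above can be sketched as a small pipeline in which each stage receives the site data plus the outputs of all earlier stages. This is an illustrative sketch only: the stage names, the prompt wording, and the `call_model` callable (a stand-in for whatever LLM client is used) are hypothetical.

```python
from typing import Callable

# Hypothetical stage names and instructions, following the order in the text:
# parse the site, infer circulation, decompose into zones, describe zones.
STAGES = [
    ("parse_site", "Summarise the site constraints below as a list of facts."),
    ("circulation", "Given the site facts, propose paths and adjacencies."),
    ("zones", "Given the facts and circulation, split the plot into named zones."),
    ("zone_descriptions", "Describe each zone for planting and image generation."),
]


def run_pipeline(site_json: str, call_model: Callable[[str], str]) -> dict[str, str]:
    """Run the stages in order, feeding each stage the accumulated context."""
    results: dict[str, str] = {}
    context = f"SITE:\n{site_json}"
    for name, instruction in STAGES:
        prompt = f"{instruction}\n\n{context}"
        output = call_model(prompt)
        results[name] = output
        # Later stages see every earlier result under a labelled heading.
        context += f"\n\n{name.upper()}:\n{output}"
    return results
```

Because `call_model` is injected, each stage could in principle be routed to a different model, and intermediate results can be inspected or validated before the next stage runs.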

Why the Gemini stack worked better

We initially deployed OpenAI models but observed limitations in spatial consistency and constraint adherence. Layout errors, overlapping features, and inconsistent zoning were common.

A transition to Google’s Gemini family significantly improved performance. Internal benchmarking within our application showed:

  • GPT-4/5-class models: medium spatial reasoning, low reliability in site-plan generation
  • Gemini 3.1 Flash/Pro: high spatial consistency and reliable zone decomposition, very high reasoning accuracy for complex layouts
  • Nano Banana 2 / Pro: strong multimodal capabilities and best outcome for garden site plans

The closest public proxy to our zoning and design task is SpatialBench (see “SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition”), where Gemini-2.5-pro leads at 75.79 overall and 74.11 on planning, versus GPT-4o-mini at 30.92 and 27.68, and GPT-5-chat-latest at 22.45 and 24.07. PlanQA reaches a similar conclusion from structured floorplans: current models are solid on distances and line-of-sight, but topology, planning, and constraint verification still fall below 50% accuracy on harder layout tasks. That matches our production experience: earlier OpenAI-era pipelines produced fluent garden language, but not reliably believable site plans; the newer Gemini reasoning stack finally crossed the threshold for usable zoning.
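Constraint verification of the kind discussed here can also be done deterministically, outside the model. The following is a minimal sketch that flags zones extending beyond the lot, zones overlapping a blocked area such as the house footprint, and zones overlapping each other. It assumes axis-aligned rectangles for simplicity (real plots are general polygons), and the `Rect` type and function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle: origin (x, y), width w, height h, in metres."""
    name: str
    x: float
    y: float
    w: float
    h: float


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the interiors intersect; rectangles that only touch do not overlap."""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.h and b.y < a.y + a.h)


def inside(inner: Rect, outer: Rect) -> bool:
    """True if inner lies entirely within outer (shared edges allowed)."""
    return (inner.x >= outer.x and inner.y >= outer.y
            and inner.x + inner.w <= outer.x + outer.w
            and inner.y + inner.h <= outer.y + outer.h)


def check_layout(lot: Rect, blocked: list[Rect], zones: list[Rect]) -> list[str]:
    """Return human-readable problems; an empty list means the layout passed."""
    problems = []
    for zone in zones:
        if not inside(zone, lot):
            problems.append(f"{zone.name} extends beyond the lot")
        for area in blocked:
            if overlaps(zone, area):
                problems.append(f"{zone.name} overlaps {area.name}")
    for a, b in combinations(zones, 2):
        if overlaps(a, b):
            problems.append(f"{a.name} overlaps {b.name}")
    return problems
```

A check like this could feed its problem list back to the model as a correction prompt, so that plausibility is verified rather than assumed.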

Why images still lag the site plan

Despite advances in reasoning, all models exhibit deficits in translating spatial plans into images. Common issues include:

  • Loss of scale and proportion
  • Inaccurate spatial relationships
  • Weak alignment between textual descriptions and visuals

On Google’s GenAI-Bench, Gemini 3.1 Flash Image scores 1073 on overall preference and 1074 on infographic factuality, up from Nano Banana’s 942 and 881; Nano Banana Pro leads object/environment editing at 1042. But Google also documents occasional left/right localization errors and limits in 3D reasoning.

We have observed that current LLMs can reason about space symbolically, but image generation models sometimes struggle to preserve that structure visually. Even with those limitations, we found that GardenDesigner.ai can produce acceptable image quality when image generation focuses on zone details and uses customer-provided images as reference.
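One way to scope image generation to a single zone is to build the prompt from that zone's data alone and state its dimensions explicitly. The sketch below is illustrative: the function name, parameters, and prompt wording are assumptions, and whether a given image model honours stated dimensions would need to be tested.

```python
def zone_image_prompt(zone_name: str, width_m: float, depth_m: float,
                      style: str, elements: list[str],
                      has_reference_photo: bool = False) -> str:
    """Build a prompt for one zone only, stating its scale in metres."""
    lines = [
        f"Render only the '{zone_name}' zone of a residential garden.",
        f"The zone measures about {width_m:g} m by {depth_m:g} m; "
        "keep objects in proportion.",
        f"Style: {style}.",
        "Elements to include: " + ", ".join(elements) + ".",
    ]
    if has_reference_photo:
        # Assumption: the reference photo is attached separately to the request.
        lines.append("Match the materials and lighting of the attached reference photo.")
    return "\n".join(lines)
```

Keeping the prompt limited to one zone avoids asking the image model to preserve the whole site plan, which is where scale and spatial relationships tend to break down.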

Summary

Spatial reasoning in LLMs has matured significantly, but the gap between symbolic planning and visual output remains real. Our experience building GardenDesigner.ai shows that the key is not to treat garden design as a single generative task, but to decompose it into structured stages — geometry parsing, zone planning, and image generation — each handled by the model best suited to it. The Gemini stack currently leads on constraint adherence and spatial consistency for complex layouts. Image generation is improving but still requires careful scoping to deliver reliable results.

The staged, decomposed approach we applied here is not unique to AI — it mirrors the execution discipline required in any complex technology programme. The patterns that cause AI pipelines to fail (unclear ownership of stages, no feedback loops, no measurable success criteria) are the same ones that sink transformation programmes. We explore those failure modes in detail in Why Most Transformations Fail — And How to Avoid It.

The Coral Consulting Perspective

AI adoption is a delivery problem as much as a technology problem. Selecting the right model stack matters — but so does the discipline to decompose the problem correctly, define measurable outcomes, and iterate based on real-world results rather than benchmark scores alone.

At Coral Consulting, we help organisations move from AI experimentation to embedded capability — combining technology selection, solution design, and delivery governance into a single integrated approach. The vendor choice question extends beyond technical performance: which model vendor do you want to depend on strategically, and what are the long-term implications of that dependency? We explore that dimension in Strategic Technology Is Never Free. If you are evaluating AI tools or building an AI-enabled product, we would be glad to talk. You can also explore our AI & Data Transformation services.

Further reading on GardenDesigner.ai