Teaching a model to see like a stylist
Notes from building Bomagi — an AI staging tool for Nordic homes. What we got wrong first, what surprised us, and why the hardest part had nothing to do with the model.

When we started building Bomagi, we thought the hard part would be the model. It wasn't. The hard part was understanding what 'Nordic interior style' actually means — not as an aesthetic mood board, but as a set of precise, learnable visual signals a generative model could act on.
The use case sounds simple: take a photo of an empty or poorly staged room and return a photorealistic version of what it could look like, furnished. Real estate agents send us the before. We send them the after.
Simple to describe. Much harder to build well. Over the past four months we have learned more about interior design, generative AI, and the Norwegian real estate market than we ever expected. This is a record of the technical and design decisions that shaped the product — and the mistakes that shaped us.

What broke first
Our first approach was prompt engineering on top of a general image model. The results were fine. Fine as in: technically not wrong, aesthetically not right. The model kept defaulting to a kind of mid-century American staging — warm woods, greenery, slightly oversaturated. It looked nothing like a Bergen apartment.
We needed to be more specific. Not just 'Scandinavian' — that's a Pinterest category, not a prompt. We started breaking it down: light temperature, material palette, ceiling height conventions, the particular way Norwegian homes mix old and new without irony.
We catalogued over 200 real estate listings from Finn.no, annotated the furniture styles, color temperatures, textile choices, and spatial arrangements that appeared most frequently in successful sales. Patterns emerged quickly: light floors, minimal window treatments, one statement piece per room, never more than three materials visible in a single sightline.
The real breakthrough was negative examples. We built a dataset of "almost right" images — stagings that looked Scandinavian to an outsider but wrong to anyone who has actually lived in Oslo or Bergen. Too much color. Too many textures. Furniture too large for the room. These anti-examples trained our classifier more effectively than the positive ones.
Building the visual language
Once we had a working prompt template, we needed to codify the visual rules into something more systematic. We created what we call a "style manifest" — a structured document that describes the target aesthetic in machine-readable terms. Not prose descriptions, but specific constraints: color temperature range (4200K-5500K), maximum saturation per channel, permitted material categories, spatial density targets.
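To make that concrete, here is a stripped-down sketch of what such a manifest can look like. The shape is illustrative (the saturation cap and density band below are placeholder numbers, not our production values); the color temperature range and the three-materials rule come straight from the listing analysis.

```typescript
// Illustrative sketch of a style manifest. Field names and the saturation /
// density values are examples, not the production schema.
interface StyleManifest {
  colorTemperatureK: { min: number; max: number }; // acceptable white-balance range
  maxSaturation: number;                           // 0 to 1, per channel
  permittedMaterials: string[];                    // material categories allowed in frame
  maxMaterialsPerSightline: number;                // hard cap from the listing analysis
  spatialDensity: { min: number; max: number };    // fraction of floor area covered by furniture
}

const nordicManifest: StyleManifest = {
  colorTemperatureK: { min: 4200, max: 5500 },
  maxSaturation: 0.55,
  permittedMaterials: ["light oak", "wool", "linen", "matte ceramic"],
  maxMaterialsPerSightline: 3,
  spatialDensity: { min: 0.2, max: 0.35 },
};
```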
This manifest became the backbone of our validation pipeline. Every generated image passes through a series of checks: Does the color palette fall within the defined range? Are the proportions of furniture appropriate for the room dimensions? Is the light direction consistent with the window placement in the original photo? Each check has a tolerance threshold, and the image only ships if all checks pass.
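The pipeline itself is not much more than a list of checks. A simplified sketch, reusing the StyleManifest shape above and assuming the image measurements have already been extracted upstream:

```typescript
// Simplified sketch of the validation loop: each check compares one measured
// property of the generated image against the manifest. The measurement
// fields are assumptions for illustration.
interface ImageMeasurements {
  colorTemperatureK: number;     // estimated white point of the render
  saturation: number;            // mean per-channel saturation, 0 to 1
  materialsInSightline: number;  // distinct material categories detected
  lightMatchesWindows: boolean;  // light direction consistent with the original photo
}

type Check = {
  name: string;
  pass: (m: ImageMeasurements, manifest: StyleManifest) => boolean;
};

const checks: Check[] = [
  { name: "color temperature", pass: (m, s) => m.colorTemperatureK >= s.colorTemperatureK.min && m.colorTemperatureK <= s.colorTemperatureK.max },
  { name: "saturation",        pass: (m, s) => m.saturation <= s.maxSaturation },
  { name: "material count",    pass: (m, s) => m.materialsInSightline <= s.maxMaterialsPerSightline },
  { name: "lighting",          pass: (m)    => m.lightMatchesWindows },
];

// Returns the names of failed checks; an empty array means the image ships.
function validate(m: ImageMeasurements, manifest: StyleManifest): string[] {
  return checks.filter((c) => !c.pass(m, manifest)).map((c) => c.name);
}
```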
The early versions of this pipeline rejected about 60% of generated images. Today, after months of refinement to both the prompts and the validation rules, the acceptance rate is above 70%. Our target is 85% — the remaining 15% will always be edge cases where the room itself presents unusual challenges (extreme angles, unusual lighting, non-standard room shapes).
[Figure: evolution of our staging output]

The architecture decision
We evaluated three approaches: fine-tuning a base model on a curated Nordic interiors dataset, prompt-engineering on top of a commercial API, and a hybrid that uses a commercial model for generation but a custom classifier to validate outputs against a Nordic reference set before returning them to the user.
We chose the hybrid. Fine-tuning was too expensive for the current stage and would need constant retraining as model quality improved. Pure prompt engineering wasn't consistent enough. The classifier approach let us ship something real without betting the product on a dataset we didn't have yet.
The classifier runs as an Edge Function on Vercel. It receives the generated image, runs it through a lightweight model that scores it on five dimensions (color fidelity, spatial plausibility, material accuracy, lighting consistency, and overall Nordic-ness), and returns a pass/fail with a confidence score. If it fails, we regenerate with adjusted parameters. The whole loop adds about 4 seconds to the user-facing latency — acceptable for a product where the alternative is days of manual work.
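In rough terms, the generate-score-retry loop looks like the sketch below. The generator and scorer are stand-ins for the commercial model call and our classifier, and the thresholds and retry count are illustrative rather than the values we actually run with:

```typescript
// Sketch of the generate -> score -> retry loop. The score dimensions mirror
// the five the classifier reports; thresholds and maxAttempts are illustrative.
interface NordicScore {
  colorFidelity: number;
  spatialPlausibility: number;
  materialAccuracy: number;
  lightingConsistency: number;
  nordicness: number;
  confidence: number;
}

async function stageRoom(
  photo: Uint8Array,
  generate: (photo: Uint8Array, attempt: number) => Promise<Uint8Array>, // commercial model call
  score: (image: Uint8Array) => Promise<NordicScore>,                    // our classifier
  maxAttempts = 3,
): Promise<Uint8Array | null> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const candidate = await generate(photo, attempt); // parameters adjusted per attempt
    const s = await score(candidate);
    const lowestDimension = Math.min(
      s.colorFidelity, s.spatialPlausibility, s.materialAccuracy, s.lightingConsistency, s.nordicness,
    );
    if (lowestDimension >= 0.7 && s.confidence >= 0.6) return candidate;
  }
  return null; // fall back to manual review rather than shipping a weak staging
}
```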
The user experience problem
Technical quality was one thing. Making the product feel right was another. Real estate agents are busy, skeptical, and not particularly interested in AI. They want results, not technology. Our first UI was too techy — progress bars, model names, generation parameters. It felt like a developer tool, not a real estate tool.
The breakthrough was simplicity. Upload a photo. Wait 30 seconds. Get three variations back. Pick one. Download. Done. No settings, no sliders, no AI jargon. The complexity lives in the pipeline; the interface is almost aggressively simple.
We tested with twelve agents across Oslo, Bergen, and Trondheim. The feedback was surprisingly consistent: they cared about three things, speed, realism, and the ability to stage specific rooms for specific price points. A 2-million-kroner apartment in Grünerløkka should not be staged like an 8-million-kroner apartment in Frogner. Context matters, and the model needs to understand socioeconomic design signals.
What we didn't expect
The biggest surprise was how much the room's original light conditions mattered. Feed the model a photo taken in flat overcast light and it would produce a flat, overcast staging. Feed it the same room in the golden hour and the output was warmer, more alive — even when the prompt said nothing about lighting.
This was a feature, not a bug. It meant we could give agents a simple tip — photograph at 10am, north-facing rooms first — and improve output quality without touching the model at all. The best product decisions sometimes live entirely outside the product.
The second surprise was scale resistance. Our pipeline works beautifully for single rooms. But when an agent uploads an entire apartment — eight rooms, three angles each — the consistency between rooms degrades. The living room might get light oak floors while the bedroom gets walnut. The kitchen palette might clash with the hallway. We are still solving this problem, and it is harder than it sounds.
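Detecting the drift is at least straightforward to sketch; preventing it is the part we have not solved. Something like the check below, assuming per-room measurements along the lines of the earlier sketches, is enough to flag the mismatches:

```typescript
// Rough sketch of a cross-room consistency check: compare the dominant floor
// material and overall palette warmth across rooms in one apartment and flag
// mismatches. The measurement fields are assumptions, not our production schema.
interface RoomResult {
  room: string;                 // e.g. "living room", "bedroom"
  floorMaterial: string;        // dominant floor material detected in the output
  paletteTemperatureK: number;  // overall warmth of the generated palette
}

function flagInconsistencies(rooms: RoomResult[], maxTemperatureSpreadK = 400): string[] {
  const issues: string[] = [];

  const floors = new Set(rooms.map((r) => r.floorMaterial));
  if (floors.size > 1) {
    issues.push(`floor material differs across rooms: ${[...floors].join(", ")}`);
  }

  const temps = rooms.map((r) => r.paletteTemperatureK);
  if (Math.max(...temps) - Math.min(...temps) > maxTemperatureSpreadK) {
    issues.push("palette warmth drifts between rooms");
  }

  return issues;
}
```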

Lessons for building AI products
Building Bomagi taught us several lessons that generalize beyond real estate staging. First, domain knowledge is the moat, not model access. Anyone can call the same API we do. What they cannot easily replicate is our understanding of what Nordic staging actually looks like. This understanding is embedded in our prompts, our classifier, our validation rules, and our feedback loops with agents.
Second, the product is the pipeline, not the model. Models improve every few months. If your product is tightly coupled to a specific model's capabilities, you are building on sand. If your product is a well-designed pipeline that can swap models in and out, you benefit from every improvement in the ecosystem without rebuilding.
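In practice that means putting an interface between the pipeline and the model, so that swapping providers is a wiring change rather than a rewrite. A minimal sketch of what we mean:

```typescript
// Minimal sketch of the decoupling: the pipeline depends on this interface,
// not on any particular vendor's API, so models can be swapped without
// touching validation, retries, or the UI. Names here are illustrative.
interface StagingModel {
  name: string;
  generate(photo: Uint8Array, options: { style: string; seed?: number }): Promise<Uint8Array>;
}

// Swapping providers is then a one-line change where the pipeline is wired up.
function buildPipeline(model: StagingModel) {
  return async (photo: Uint8Array) => {
    const image = await model.generate(photo, { style: "nordic" });
    // ...validation and retry logic from earlier stays unchanged...
    return image;
  };
}
```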
Third, talk to users before writing code. Our first three months would have been significantly more productive if we had spent the first two weeks interviewing real estate agents instead of experimenting with model parameters. The constraints they care about — speed, price-point appropriateness, room-to-room consistency — shaped the architecture more than any technical consideration.