February 1, 2026 · Experiments

Teaching a model to see like a stylist

Notes from building Bomagi — an AI staging tool for Nordic homes. What we got wrong first, what surprised us, and why the hardest part had nothing to do with the model.

A living room mid-transformation — before and after AI staging.

When we started building Bomagi, we thought the hard part would be the model. It wasn't. The hard part was understanding what 'Nordic interior style' actually means — not as an aesthetic mood board, but as a set of precise, learnable visual signals a generative model could act on.

The use case sounds simple: take a photo of an empty or poorly staged room and return a photorealistic version of what it could look like, furnished. Real estate agents send us the before. We send them the after.

Simple to describe. Much harder to build well. Over the past four months we have learned more about interior design, generative AI, and the Norwegian real estate market than we ever expected. This is a record of the technical and design decisions that shaped the product — and the mistakes that shaped us.

Side-by-side: raw empty room versus AI-staged version.
Before and after. The gap is the product.

What broke first

Our first approach was prompt engineering on top of a general image model. The results were fine. Fine as in: technically not wrong, aesthetically not right. The model kept defaulting to a kind of mid-century American staging — warm woods, greenery, slightly oversaturated. It looked nothing like a Bergen apartment.

We needed to be more specific. Not just 'Scandinavian' — that's a Pinterest category, not a prompt. We started breaking it down: light temperature, material palette, ceiling height conventions, the particular way Norwegian homes mix old and new without irony.

We catalogued over 200 real estate listings from Finn.no, annotated the furniture styles, color temperatures, textile choices, and spatial arrangements that appeared most frequently in successful sales. Patterns emerged quickly: light floors, minimal window treatments, one statement piece per room, never more than three materials visible in a single sightline.

The real breakthrough was negative examples. We built a dataset of "almost right" images — stagings that looked Scandinavian to an outsider but wrong to anyone who has actually lived in Oslo or Bergen. Too much color. Too many textures. Furniture too large for the room. These anti-examples trained our classifier more effectively than the positive ones.

The prompt that started working: "Furnished Norwegian apartment interior. Natural north light. Muted palette: white walls, pale oak, linen and wool textiles. No visible technology. Quiet, functional, slightly sparse. Not a showroom — lived in."
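In practice a prompt like this works best as a shared template with a small per-request extension. A minimal sketch, assuming a hypothetical `build_prompt` helper (the base text is the working prompt above; the room-type suffix is our illustration):

```python
# The base prompt is shared; per-room cues are appended at request time.
BASE_PROMPT = (
    "Furnished Norwegian apartment interior. Natural north light. "
    "Muted palette: white walls, pale oak, linen and wool textiles. "
    "No visible technology. Quiet, functional, slightly sparse. "
    "Not a showroom - lived in."
)

def build_prompt(room_type: str) -> str:
    """Append a room-specific cue to the shared base prompt."""
    return f"{BASE_PROMPT} Room: {room_type}."
```

Keeping the base text in one place means a wording change propagates to every room type at once.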

Building the visual language

Once we had a working prompt template, we needed to codify the visual rules into something more systematic. We created what we call a "style manifest" — a structured document that describes the target aesthetic in machine-readable terms. Not prose descriptions, but specific constraints: color temperature range (4200K-5500K), maximum saturation per channel, permitted material categories, spatial density targets.
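A manifest like this can be represented as a small typed structure. In this sketch, the 4200K–5500K range and the three-materials rule come from our analysis; the field names and the saturation and density values are illustrative assumptions, not our production schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StyleManifest:
    """Machine-readable constraints for a target aesthetic."""
    color_temp_k: tuple = (4200, 5500)   # permitted color temperature range, kelvin
    max_saturation: float = 0.45         # max per-channel saturation (assumed value)
    materials: tuple = ("pale oak", "linen", "wool", "white paint")
    max_materials_per_sightline: int = 3  # from the listing analysis
    spatial_density: float = 0.35        # target furnished fraction of floor area (assumed)

NORDIC = StyleManifest()
```

Freezing the dataclass makes the manifest an immutable reference object that validation code can read but never mutate mid-run.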

This manifest became the backbone of our validation pipeline. Every generated image passes through a series of checks: Does the color palette fall within the defined range? Are the proportions of furniture appropriate for the room dimensions? Is the light direction consistent with the window placement in the original photo? Each check has a tolerance threshold, and the image only ships if all checks pass.
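The all-checks-must-pass gate can be sketched as below, assuming a hypothetical `stats` dict of measurements extracted from each generated image. Only the color temperature range is taken from the manifest described above; the other thresholds are placeholders:

```python
# Illustrative thresholds; only the color temperature range is from the manifest.
MANIFEST = {
    "color_temp_k": (4200, 5500),
    "max_saturation": 0.45,
    "scale_tolerance": 0.10,       # max relative furniture-scale error
    "light_tolerance_deg": 15.0,   # max deviation from window light direction
}

def validate(stats: dict, manifest: dict = MANIFEST) -> bool:
    """An image ships only if every check passes its tolerance."""
    lo, hi = manifest["color_temp_k"]
    checks = [
        lo <= stats["color_temp_k"] <= hi,
        stats["max_saturation"] <= manifest["max_saturation"],
        stats["furniture_scale_error"] <= manifest["scale_tolerance"],
        stats["light_angle_error_deg"] <= manifest["light_tolerance_deg"],
    ]
    return all(checks)
```

Returning a plain boolean keeps the gate composable: the caller decides whether a failure means regenerate, flag for review, or discard.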

The early versions of this pipeline rejected about 60% of generated images. Today, after months of refinement to both the prompts and the validation rules, the acceptance rate is above 70%. Our target is 85% — the remaining 15% will always be edge cases where the room itself presents unusual challenges (extreme angles, unusual lighting, non-standard room shapes).

Evolution of our staging output

  • Week 2: Over-saturated, American influence
  • Week 8: Getting closer, but furniture too large
  • Week 14: Confident Nordic staging

The architecture decision

We evaluated three approaches: fine-tuning a base model on a curated Nordic interiors dataset, prompt-engineering on top of a commercial API, and a hybrid that uses a commercial model for generation but a custom classifier to validate outputs against a Nordic reference set before returning them to the user.

We chose the hybrid. Fine-tuning was too expensive for the current stage and would need constant retraining as model quality improved. Pure prompt engineering wasn't consistent enough. The classifier approach let us ship something real without betting the product on a dataset we didn't have yet.

The classifier runs as an Edge Function on Vercel. It receives the generated image, runs it through a lightweight model that scores it on five dimensions (color fidelity, spatial plausibility, material accuracy, lighting consistency, and overall Nordic-ness), and returns a pass/fail with a confidence score. If it fails, we regenerate with adjusted parameters. The whole loop adds about 4 seconds to the user-facing latency — acceptable for a product where the alternative is days of manual work.
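The score-then-regenerate loop might look like the following sketch. Here `generate` and `score_dimension` are stubs standing in for the commercial image API and the real classifier, and the 0.7 pass threshold and guidance adjustment are assumptions for illustration:

```python
DIMENSIONS = ("color_fidelity", "spatial_plausibility", "material_accuracy",
              "lighting_consistency", "nordicness")

def generate(prompt: str, guidance: float) -> dict:
    # Stub for the commercial image API; returns a fake image record.
    return {"prompt": prompt, "guidance": guidance}

def score_dimension(image: dict, dim: str) -> float:
    # Stub scorer; the real classifier is a lightweight model.
    return min(1.0, 0.5 + image["guidance"] / 20)

def classify(image: dict) -> tuple:
    """Score five dimensions; pass/fail plus a confidence (the weakest score)."""
    scores = {d: score_dimension(image, d) for d in DIMENSIONS}
    confidence = min(scores.values())
    return confidence >= 0.7, confidence

def generate_with_validation(prompt: str, max_attempts: int = 3):
    """Regenerate with adjusted parameters until the classifier passes."""
    guidance = 7.0
    for _ in range(max_attempts):
        image = generate(prompt, guidance)
        ok, _conf = classify(image)
        if ok:
            return image
        guidance += 0.5  # nudge parameters before retrying
    return None  # all attempts failed; fall back to manual review
```

Using the weakest dimension as the confidence means one badly scaled sofa fails the image even if everything else scores well, which matches how a human reviewer reacts.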

  • Generation time: ~18s per image (Gemini 2.5 Flash)
  • Acceptance rate: 73% of images pass the Nordic classifier without regeneration
  • Cost per image: ~€0.08 at current API pricing, before volume discounts

The user experience problem

Technical quality was one thing. Making the product feel right was another. Real estate agents are busy, skeptical, and not particularly interested in AI. They want results, not technology. Our first UI was too techy — progress bars, model names, generation parameters. It felt like a developer tool, not a real estate tool.

The breakthrough was simplicity. Upload a photo. Wait 30 seconds. Get three variations back. Pick one. Download. Done. No settings, no sliders, no AI jargon. The complexity lives in the pipeline; the interface is almost aggressively simple.

We tested with twelve agents across Oslo, Bergen, and Trondheim. The feedback was surprisingly consistent: they cared about three things — speed, realism, and the ability to stage specific rooms for specific price points. A 2-million-kroner apartment in Grünerløkka should not be staged like an 8-million-kroner apartment in Frogner. Context matters, and the model needs to understand socioeconomic design signals.

What we didn't expect

The biggest surprise was how much the room's original light conditions mattered. Feed the model a photo taken in flat overcast light and it would produce a flat, overcast staging. Feed it the same room in the golden hour and the output was warmer, more alive — even when the prompt said nothing about lighting.

This was a feature, not a bug. It meant we could give agents a simple tip — photograph at 10am, north-facing rooms first — and improve output quality without touching the model at all. The best product decisions sometimes live entirely outside the product.

The second surprise was scale resistance. Our pipeline works beautifully for single rooms. But when an agent uploads an entire apartment — eight rooms, three angles each — the consistency between rooms degrades. The living room might get light oak floors while the bedroom gets walnut. The kitchen palette might clash with the hallway. We are still solving this problem, and it is harder than it sounds.

Aerial view of Bergen neighborhoods
Bergen from above — understanding local context means understanding local light, local materials, local taste.

Lessons for building AI products

Building Bomagi taught us several lessons that generalize beyond real estate staging. First, domain knowledge is the moat, not model access. Anyone can call the same API we do. What they cannot easily replicate is our understanding of what Nordic staging actually looks like. This understanding is embedded in our prompts, our classifier, our validation rules, and our feedback loops with agents.

Second, the product is the pipeline, not the model. Models improve every few months. If your product is tightly coupled to a specific model's capabilities, you are building on sand. If your product is a well-designed pipeline that can swap models in and out, you benefit from every improvement in the ecosystem without rebuilding.

Third, talk to users before writing code. Our first three months would have been significantly more productive if we had spent the first two weeks interviewing real estate agents instead of experimenting with model parameters. The constraints they care about — speed, price-point appropriateness, room-to-room consistency — shaped the architecture more than any technical consideration.

“The best product decisions sometimes live entirely outside the product.”

— Isak Gundrosen

Bomagi is currently in private beta with a handful of Nordic real estate agencies. If you're in property and want early access, reach out at kontakt@alkemist.no.

Bomagi

AI-powered interior staging for Nordic real estate. Upload a room, get a staged version back in under 30 seconds.

Private beta · AI · Real estate · Nordic
