How I pilot AI in my Discovery practice


Field notes from BirMarket - Lead UX Research at Pasha Group's BirEcosystem

Field notes from a function in motion

Three months in as Lead UX Researcher at BirMarket - Azerbaijan's largest mobile marketplace, part of Pasha Group's BirEcosystem. The research function covers three product surfaces: customer app, seller platform, and logistics tools for a large operational workforce.

I'm piloting Claude across most of the research practice (skills for discovery and research design, workshops, reusable templates), with ChatGPT for parts like desk research and feedback tagging, plus NotebookLM and a few others.

What's unpacked here is the judgement layer of AI-assisted research: where I let AI in, where I deliberately don't, and where I'm still piloting. Most of this is mid-build, more planned than running. AI changes shape every few months. Alice ran flat out to stay in place; I'm trying to read where the field will be in six.

Concrete so far: real speed gains on desk research, quant analysis, and tagging. Less so on qual synthesis, prioritisation, conclusions, and reports. The time AI saves on mechanical work goes into scaling how I already work - a generalist holding many planes of a project at once.

Briefing - how product teams reach me

I picked up a habit on earlier projects: keeping my own brief artefacts, plus ones built with past research teams. They translate business goals and user needs into clean research briefs. The inputs were already to hand when I started building reusable skills here.

These skills point to one bigger thing: a small advisor for product teams at the idea-validation stage. Working title 'new idea discovery with user research helper'. Not ready yet.


I gave Claude:

Templates

brief templates and response matrices, mine plus ones built with past research teams.

Methods

research methods I lean on. I rebuild the set for each role, shaped by what my team and business can work with. Product teams can use them directly.

Examples

real tasks from past projects, scoped and briefed. Formats vary by business, so I waited for the first projects here to finish.

Method choice (piloting)
research-method-advisor: explains research methods and suggests which one fits the validation problem. Routes elsewhere if research isn't the right tool - to A/B testing, for example.

Qualitative

usability-task-description: scenarios → six-section brief (idea, business metrics, audience, goals, scenarios, success metrics).
usability-hypotheses: brief and prototype screens → 15-20 hypotheses grouped by screen. Includes a Nielsen heuristics pass first - hypotheses are stronger when the screens have already been read through evaluation principles. A separate ux-heuristics-review skill is in build for pre-dev design reviews.
usability-guide: moderator's guide and response matrix template.

saves time · reduces cognitive load

Quantitative (in build)
Same logic at the start as qual. Diverges later: no response matrix, since structure comes from the platform export.

quant-task-description, hypotheses, questions.

There's also a quant-analysis skill in build, covered under Quantitative in Analysis and synthesis below.

Open pilot decisions

Robustness

Whether the skills hold up across different project shapes.

Composition

Whether to combine the skills into one agent or keep them modular.

Fieldwork - transcripts and the matrix

Our developers built a custom transcription tool on Lovable, tuned to our terminology. For Azerbaijani, a language under-served by global transcription services, it's the best option I've tested. It cuts transcription time and translates automatically into Russian and English.
saves time

From the transcript:

Claude

turns the raw transcript into a clean quote table, attributing quotes by speaker and preserving timecodes.
saves time · reduces load · still piloting

I take over part-way through and hand-finish what AI got close on: correcting speaker roles, verifying timecodes, and the rest of the cleanup.
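A minimal sketch of the transcript-to-quote-table step, assuming a plain-text transcript where each line looks like `[HH:MM:SS] Speaker: text`. That format is a hypothetical stand-in; the Lovable tool's real export may look different. The sketch also flags any timecode that jumps backwards, the kind of thing I end up verifying by hand. `Quote` and `to_quote_table` are illustrative names, not part of any tool mentioned here.

```python
import re
from dataclasses import dataclass

# Hypothetical transcript line format: "[HH:MM:SS] Speaker: utterance"
LINE_RE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s*([^:]+):\s*(.+)$")


@dataclass
class Quote:
    timecode: str
    seconds: int
    speaker: str
    text: str
    needs_check: bool = False  # flagged for manual verification


def to_quote_table(raw: str) -> list[Quote]:
    """Turn a raw transcript into quote rows with speaker and timecode."""
    quotes: list[Quote] = []
    for line in raw.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            # Continuation line without a timecode: append to the previous quote
            if quotes and line.strip():
                quotes[-1].text += " " + line.strip()
            continue
        h, mnt, s, speaker, text = m.groups()
        secs = int(h) * 3600 + int(mnt) * 60 + int(s)
        quote = Quote(f"{h}:{mnt}:{s}", secs, speaker.strip(), text.strip())
        # A timecode that goes backwards is a likely transcription error
        if quotes and secs < quotes[-1].seconds:
            quote.needs_check = True
        quotes.append(quote)
    return quotes
```

Speaker labels come straight from the transcript, so role attribution still needs the human pass described above.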

Video stays on my side. The user actions there are behaviour, not speech: what the participant tried, where they got stuck, the path they took. These go in the matrix as client action alongside researcher observation from the quotes. Auto-transcription doesn't shorten this part.

I run all of this in Miro. It's the fastest environment for my synthesis style and handles imported tables reasonably well. The bot-to-Miro connection isn't clean yet, so some of the carry-over is still manual.
still piloting

Analysis and synthesis

Qualitative

Transcript tool

Our developers built a custom transcription tool on Lovable, tuned to our terminology. For Azerbaijani, a language under-served by global transcription services, it's the best option I've tested.
It also translates automatically into Russian and English.
saves time

Claude

turns the raw transcript into a clean quote table, attributing quotes by speaker and preserving timecodes. It also does the first rough clustering.
saves time · reduces load · still piloting

Video stays manual. What's on screen is behaviour: a participant trying things, getting stuck, finding a path through. These go into the matrix as client action alongside researcher observation from the quotes.

Analysis. My strength here is reading the same data through several lenses at once:

the hypotheses we went in with
the strategy of the specific flow or feature
the wider business direction
which recommendation lands where, for which audience and at what altitude.

This is the spider-web part of the job. AI handles part of the clustering at the front - distributing quotes into the matrix by guide question and by hypothesis.
I worked with my researcher to push further into the synthesis itself: drafting conclusions, building slides. The progress is uneven, and no single qual-analysis skill has landed yet.
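The front-end clustering can be pictured as a matrix keyed by guide question and hypothesis. A minimal sketch, assuming the model's judgement is wrapped in a `classify` callable that returns the cells a quote belongs to. Here a hypothetical keyword stub stands in for the model, and the question and hypothesis labels are made up. Quotes that match nothing go to an `unassigned` cell for me to place by hand.

```python
from collections import defaultdict
from typing import Callable, Iterable

Cell = tuple[str, str]  # (guide question, hypothesis)


def build_matrix(quotes: Iterable[str],
                 classify: Callable[[str], list[Cell]]) -> dict[Cell, list[str]]:
    """Distribute quotes into matrix cells; one quote may land in several cells."""
    matrix: dict[Cell, list[str]] = defaultdict(list)
    for quote in quotes:
        # Nothing matched: park the quote for the researcher to place
        cells = classify(quote) or [("unassigned", "unassigned")]
        for cell in cells:
            matrix[cell].append(quote)
    return dict(matrix)


# Hypothetical stand-in for the model call: keyword -> matrix cell.
RULES: dict[str, Cell] = {
    "search": ("Q2 Finding a product", "H3 Search is the main entry point"),
    "filter": ("Q2 Finding a product", "H4 Filters are hard to find"),
}


def keyword_classify(quote: str) -> list[Cell]:
    text = quote.lower()
    return [cell for keyword, cell in RULES.items() if keyword in text]
```

Allowing one quote into several cells matters here: the same remark often speaks to more than one hypothesis.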


Quantitative

Cleaner partnership. With well-structured input, AI does the bulk of the analysis. I add strategy framing and recommendations — including how the findings cut across products.
saves time on analysis

Take the Navigation prototype test — 6 explicit phases, each with a check and a handoff. Claude carried the mechanical work (data checks, formula scaffolding, data confirmation) while I kept my time for conclusions and recommendations.
structured handoffs, time for judgement
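A minimal sketch of the phase, check and handoff pattern, assuming each phase is a function plus a check, with a human approval gate between phases. The three phases and the success-rate calculation below are hypothetical placeholders; the Navigation test's actual six phases aren't reproduced here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Phase:
    name: str
    run: Callable[[dict], dict]    # does the mechanical work, returns updated state
    check: Callable[[dict], bool]  # must pass before the handoff


def run_phases(phases: list[Phase], state: dict,
               approve: Callable[[str, dict], bool]) -> tuple[dict, list[str]]:
    """Run phases in order; stop at the first failed check or withheld approval."""
    log: list[str] = []
    for phase in phases:
        state = phase.run(state)
        if not phase.check(state):
            log.append(f"{phase.name}: check failed")
            return state, log
        if not approve(phase.name, state):
            log.append(f"{phase.name}: waiting for review")
            return state, log
        log.append(f"{phase.name}: done")
    return state, log


# Hypothetical phases: load task results, check them, compute a success rate.
PHASES = [
    Phase("load", lambda s: {**s, "results": [1, 0, 1, 1]},
          lambda s: len(s["results"]) > 0),
    Phase("data check", lambda s: s,
          lambda s: all(r in (0, 1) for r in s["results"])),
    Phase("success rate",
          lambda s: {**s, "rate": sum(s["results"]) / len(s["results"])},
          lambda s: 0 <= s["rate"] <= 1),
]
```

The point of the gate is that nothing downstream runs on a phase I haven't looked at.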


There's a quant-analysis skill in build: it picks up at the data export from the test platform, produces the analysis doc or table from a template, and runs a mini-template for the main conclusions I work through with Claude. Recommendations stay with me.

I'm testing whether one skill can stretch across different quant methodologies, piloting it on simple quantitative usability tests and UX surveys.
The bottleneck for now sits on the export side. Getting clean data out of our usability platform and Figma is messy.
still piloting export path

Multi-channel feedback tagging

I'm piloting this one on ChatGPT.

The input has two languages, Azerbaijani and Russian, with a wrinkle on top: a slice of the Azerbaijani arrives in informal transliteration - non-standard Latin spellings the model has to recognise as Azerbaijani before it can translate them at all.

I tagged the first 300 myself, for a reason: that manual pass produced what the model couldn't have built from scratch, namely the working tag taxonomy and a feel for the edge cases that come up later.
The pipeline runs in five stages today: prepare input → translate meaning → apply taxonomy → multi-issue rows → QA and output.
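The five stages can be sketched as plain functions chained together. This is a minimal sketch under stated assumptions: the translation model is passed in as a `to_english` callable, keyword matching stands in for model tagging, and all names (`Row`, `run_pipeline` and the stage functions) are hypothetical, not the prompts I actually run in ChatGPT.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Row:
    source: str                      # channel the comment came from
    original: str                    # text as received (AZ, RU or transliterated AZ)
    meaning: str = ""                # English meaning after stage 2
    tags: list[str] = field(default_factory=list)
    tag: str = ""                    # single tag after the multi-issue split
    flagged: bool = False            # needs a human look before output


def prepare_input(raw: list[tuple[str, str]]) -> list[Row]:
    """Stage 1: normalise whitespace, drop empty and duplicate comments."""
    seen: set[str] = set()
    rows: list[Row] = []
    for source, text in raw:
        text = " ".join(text.split())
        if text and text not in seen:
            seen.add(text)
            rows.append(Row(source, text))
    return rows


def translate_meaning(rows: list[Row], to_english: Callable[[str], str]) -> list[Row]:
    """Stage 2: translate meaning; the model call is injected as `to_english`."""
    for r in rows:
        r.meaning = to_english(r.original)
    return rows


def apply_taxonomy(rows: list[Row], taxonomy: dict[str, list[str]]) -> list[Row]:
    """Stage 3: keyword matching stands in for model tagging against the taxonomy."""
    for r in rows:
        text = r.meaning.lower()
        r.tags = [tag for tag, keywords in taxonomy.items()
                  if any(k in text for k in keywords)]
    return rows


def split_multi_issue(rows: list[Row]) -> list[Row]:
    """Stage 4: one row per (comment, tag), so a comment raising two issues counts twice."""
    out: list[Row] = []
    for r in rows:
        for tag in r.tags or ["untagged"]:
            out.append(Row(r.source, r.original, r.meaning, list(r.tags), tag))
    return out


def qa_and_output(rows: list[Row]) -> list[Row]:
    """Stage 5: flag untagged or untranslated rows for manual review."""
    for r in rows:
        r.flagged = r.tag == "untagged" or not r.meaning
    return rows


def run_pipeline(raw: list[tuple[str, str]],
                 to_english: Callable[[str], str],
                 taxonomy: dict[str, list[str]]) -> list[Row]:
    rows = prepare_input(raw)
    rows = translate_meaning(rows, to_english)
    rows = apply_taxonomy(rows, taxonomy)
    return qa_and_output(split_multi_issue(rows))
```

Translating meaning before tagging is what lets one taxonomy serve Azerbaijani, Russian and transliterated input alike.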


Single-channel feedback skill

Same coding rules applied to new comments, month-on-month comparison built in

Channel rules

Each channel has its own logic, mapped onto one shared UX/UI taxonomy underneath

Insight report agent

Cross-source evidence in a single report with top tags, quotes and sources
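The counting behind a month-on-month comparison and a top-tags report is simple enough to sketch. This assumes tagged rows arrive as `(month, source, tag, quote)` tuples with months as `YYYY-MM` strings. `tag_report` and `TagLine` are hypothetical names, and taking the first quote seen as the example is a placeholder for choosing a representative one.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class TagLine:
    tag: str
    count: int
    change: int            # vs previous month (previous count is 0 if absent)
    sources: list[str]
    example_quote: str


def tag_report(rows: list[tuple[str, str, str, str]], top_n: int = 3) -> list[TagLine]:
    """Top tags for the latest month, with month-on-month change, sources and a quote."""
    counts: dict[str, Counter] = defaultdict(Counter)
    sources: dict[tuple[str, str], set[str]] = defaultdict(set)
    examples: dict[tuple[str, str], str] = {}
    for month, source, tag, quote in rows:
        counts[month][tag] += 1
        sources[(month, tag)].add(source)
        examples.setdefault((month, tag), quote)  # keep the first quote seen
    months = sorted(counts)                       # "YYYY-MM" sorts chronologically
    latest = months[-1]
    previous = counts[months[-2]] if len(months) > 1 else Counter()
    return [
        TagLine(tag, n, n - previous[tag],
                sorted(sources[(latest, tag)]), examples[(latest, tag)])
        for tag, n in counts[latest].most_common(top_n)
    ]
```

Keeping sources per tag is what makes the report cross-source: each top tag shows which channels it came through.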

One thing doesn't automate, even when the pipeline is fully built out. I write the recommendation with a UX-first lens. I call out the strategic or tactical decisions worth escalating, or the adjacent research that would settle the question.
reduces load at scale · still piloting

Reports

Reports is where I tried hardest to push AI into the full assembly, and where that route didn't pay off.

Template creation works, with errors:

Figma outputs broken frames and unwanted pages.
Miro doesn't hold coordinates.
PPTX has slides that constrain the layout, but the editor is hostile.

Making the full report doesn't work in any tool.
There's too much text, too many entity variations.

I make sprint-review previews by hand to make sure the selected findings land with the audience.

Claude does work for shortening the full report for a specific audience, such as a C-level deck, without changing the material too much. The Figma translator is the most reliable bit.

The full-template route asks for both at once - content shaping and assembly - and there a researcher with a template and an hour is better than the toolchain.

Guardrails

This work runs on a small set of standing rules. They map onto the trustworthiness characteristics in the NIST AI Risk Management Framework (AI RMF).

Personal client data doesn't reach any model. I anonymise sensitive material before it gets near the chat. The model doesn't act on anything I haven't approved. Tools run inside scoped folders, never against the wider file system. Anything new starts with a small batch before it scales.
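A minimal sketch of a first anonymisation pass, assuming regex redaction of emails, card numbers and Azerbaijani phone numbers (`+994` or a leading `0`), plus a hypothetical list of participant names taken from the recruitment sheet. Patterns like these miss plenty, so a pass like this comes before my own read-through, not instead of it.

```python
import re

# Illustrative patterns only. Order matters: cards are redacted before phones,
# so a long card number isn't partly read as a phone number.
PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")),
    ("CARD", re.compile(r"\b\d(?:[ -]?\d){12,18}\b")),  # 13-19 digits
    ("PHONE", re.compile(r"(?:\+994|\b0)[\s-]?\d{2}[\s-]?\d{3}[\s-]?\d{2}[\s-]?\d{2}\b")),
]


def anonymise(text: str, known_names: tuple[str, ...] = ()) -> str:
    """Replace obvious personal data with placeholders before text reaches a model."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    for name in known_names:
        # Hypothetical: names come from the recruitment sheet, not from detection
        text = re.sub(rf"\b{re.escape(name)}\b", "[NAME]", text, flags=re.IGNORECASE)
    return text
```

Running this on a small batch first and reading the output matches the last rule above: nothing new scales before it's been checked.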

Stretching the web

Three months in, with rounds of love and frustration on both sides. The aha moments cut both ways. AI saves less time than the hype suggests.
The scale that matters is reaching the same mind further into the work:

Depth

more conclusions from one study

Breadth

delivered into more products in parallel

Connection

faster link to research already in the system

→ more product changes and more influence, same working hours.
I still need to build the knowledge base — a structured place for past research to live. The connecting work happens in my head until then.

This doesn't come together on my side alone. Teams that move faster build different habits around shared work — iterating in the open, with artefacts that belong to the team.

I'm a generalist with a spider-web mind. I hold many planes of a project at once: strategy, method, tactics, audience. I do the connecting across them. AI pulls the repeatable parts off my plate so the judgement parts have room.

More cases

Service Design at Azerbaijan's Leading Digital Wallet

I worked as Service Designer & Product Researcher at m10, a digital wallet that is also part of BirEcosystem, with a growth trajectory comparable to Wise at Series B. The work spanned the digital card, loans, cross-border transfers, loyalty programmes, and a few other areas. Much of it was coordination across product, analytics, marketing, growth, and CX teams, anchored to the user.

UX Research across 2 retail platforms

As Head of UX Research at X5 Digital, I led product discovery for the digital department of a 20M MAU grocery retailer across its retail chains, at a scale comparable to Tesco and Sainsbury's combined, running on one shared tech platform.

Building a B2B Research Lab for a Bank

As B2B Research Lead, I built the research function and governance model for a private bank serving 1M+ business clients and 120+ product teams, co-creating a research lab. I transformed research from ad-hoc studies into a structured system embedded in product decision-making.

Competency assessment framework for CX/UX-researchers

Kit for Researchers, Team Leaders, and Business Managers by Elena Svergunenko and Anna Pilyutik