How I pilot AI in my Discovery practice
Field notes from BirMarket - Lead UX Research at Pasha Group's BirEcosystem
Field notes from a function in motion
Three months in as Lead UX Researcher at BirMarket - Azerbaijan's largest mobile marketplace, part of Pasha Group's BirEcosystem. The research function covers three product surfaces: customer app, seller platform, and logistics tools for a large operational workforce.

I'm piloting Claude across most of the research practice: skills for discovery and research design, workshops, reusable templates. ChatGPT covers parts like desk research and feedback tagging, with NotebookLM and a few other tools alongside.

What's unpacked here is the judgement layer of AI-assisted research: where I let AI in, where I deliberately don't, and where I'm still piloting. Most of this is mid-build, with more planned than running. AI changes shape every few months. Alice ran flat out to stay in place; I'm trying to read where the field will be in six months.

Concrete so far: real speed gains on desk research, quant analysis, and tagging. Less so on qual synthesis, prioritisation, conclusions, and reports. The time AI saves on mechanical work goes into scaling how I already work - a generalist holding many planes of a project at once.
Briefing - how product teams reach me
I picked up a habit on earlier projects: keeping my own brief artefacts, plus ones built with past research teams. They translate business goals and user needs into clean research briefs. The inputs were already to hand when I started building reusable skills here.

These skills point to one bigger thing: a small advisor for product teams at the idea-validation stage. Working title 'new idea discovery with user research helper'. Not ready yet.

What the advisor will do →
I gave Claude:
Method choice (piloting)
research-method-advisor: explains research methods and suggests which one fits the validation problem. Routes elsewhere if research isn't the right tool - to A/B testing, for example.
Qualitative
  • usability-task-description: scenarios → six-section brief (idea, business metrics, audience, goals, scenarios, success metrics).
  • usability-hypotheses: brief and prototype screens → 15-20 hypotheses grouped by screen. Includes a Nielsen heuristics pass first - hypotheses are stronger when the screens have already been read through evaluation principles. A separate ux-heuristics-review skill is in build for pre-dev design reviews.
  • usability-guide: moderator's guide and response matrix template.
saves time · reduces cognitive load
Quantitative (in build)
Same logic at the start as qual. Diverges later: no response matrix, since structure comes from the platform export.
  • quant-task-description, hypotheses, questions.
There's also a quant-analysis skill in build - covered in the Quant block below.

Open pilot decisions
Fieldwork - transcripts and the matrix
Our developers built a custom transcription tool on Lovable, tuned to our terminology. For Azerbaijani - a language under-served by global transcription services - it's the best option I've tested. Cuts transcription time and gives automatic translation to Russian and English.
saves time

From the transcript:
Video stays on my side. The user actions there are behaviour, not speech: where the participant tried, where they got stuck, the path they took. These go in the matrix as client action alongside researcher observation from the quotes. Auto-transcription doesn't shorten this part.
I run all of this in Miro. Fastest environment for my synthesis style, handles imported tables reasonably well. The bot-to-Miro connection isn't clean yet, so some manual carry-over.
still piloting
Analysis and synthesis
Qualitative
Video stays manual, as in fieldwork. What's on screen is behaviour (a participant trying things, getting stuck, finding a path through), and it goes into the matrix as client action alongside researcher observation from the quotes.
Analysis. My strength here is reading the same data through several lenses at once:
  • the hypotheses we went in with
  • the strategy of the specific flow or feature
  • the wider business direction
  • which recommendation lands where, for which audience and at what altitude.

This is the spider-web part of the job. AI handles part of the clustering at the front - distributing quotes into the matrix by guide question and by hypothesis.
I worked with my researcher to push further into the synthesis itself: drafting conclusions, building slides. The progress is uneven, and no single qual-analysis skill has landed yet.
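The front-end clustering step can be pictured as a plain grouping operation. This is a minimal sketch, assuming each quote already carries a guide-question label and a hypothesis ID. The `Quote` fields, the `build_matrix` helper, and the sample quotes are all illustrative and not taken from the actual setup.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    participant: str      # anonymised participant ID, e.g. "P1"
    guide_question: str   # question from the moderator's guide
    hypothesis: str       # ID of the hypothesis the quote speaks to
    text: str


def build_matrix(quotes):
    """Group quotes into cells keyed by (guide_question, hypothesis)."""
    matrix = defaultdict(list)
    for q in quotes:
        matrix[(q.guide_question, q.hypothesis)].append(q)
    return dict(matrix)


# Made-up quotes, for illustration only.
quotes = [
    Quote("P1", "Q1", "H1", "I couldn't find the filter."),
    Quote("P2", "Q1", "H1", "The filter was hidden."),
    Quote("P1", "Q2", "H3", "Checkout felt quick."),
]
matrix = build_matrix(quotes)
```

Each cell then holds the quotes that need a human read. The grouping is mechanical; deciding what a cell means is not.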

What I tried with AI on the qual side →
Quantitative
Cleaner partnership. With well-structured input, AI does the bulk of the analysis. I add strategy framing and recommendations, including how the findings cut across products.
saves time on analysis

Take the Navigation prototype test: six explicit phases, each with a check and a handoff. Claude carried the mechanical work (data checks, formula scaffolding, data confirmation) while I kept my time for conclusions and recommendations.
structured handoffs, time for judgement
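The "phase, check, handoff" pattern can be sketched generically. This is an illustration under assumptions: the `Phase` structure, the `run_phases` helper, and the two example phases are hypothetical and are not the six phases of the actual test.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Phase:
    name: str
    run: Callable[[Any], Any]      # mechanical step (could be delegated to the model)
    check: Callable[[Any], bool]   # explicit check before handing off


def run_phases(phases, data):
    """Run phases in order and stop at the first failed check.

    Returns (data, completed_names, failed_name_or_None).
    """
    completed = []
    for phase in phases:
        result = phase.run(data)
        if not phase.check(result):
            # Hand back the last good state for a human to inspect.
            return data, completed, phase.name
        data = result
        completed.append(phase.name)
    return data, completed, None


# Hypothetical two-phase example with made-up rows.
phases = [
    Phase("drop empty rows",
          lambda rows: [r for r in rows if r],
          lambda rows: len(rows) > 0),
    Phase("count completions",
          lambda rows: sum(r["done"] for r in rows),
          lambda n: n >= 0),
]
result, completed, failed = run_phases(phases, [{"done": 1}, {}, {"done": 0}])
```

Stopping at the first failed check keeps a bad intermediate result from flowing silently into the conclusions.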

How the structure worked, and what's next →
There's a quant-analysis skill in build: it picks up at the data export from the test platform, produces the analysis doc or table from a template, and runs a mini-template for the main conclusions I work through with Claude. Recommendations stay with me.

I'm testing whether one skill can stretch across different quant methodologies. Piloting on simple quantitative usability tests and UX surveys.
The bottleneck for now sits on the export side. Getting clean data out of our usability platform and Figma is messy.
still piloting export path
Multi-channel feedback tagging
I'm piloting this one on ChatGPT.

The input has two languages, Azerbaijani and Russian, with a wrinkle on top: a slice of the Azerbaijani arrives in informal transliteration - non-standard Latin spellings the model has to recognise as Azerbaijani before it can translate them at all.

Why I tagged the first 300 myself. That manual pass produced what the model couldn't have built from scratch: the working tag taxonomy, and a feel for the edge cases that come up later.
The pipeline runs in five stages today: prepare input → translate meaning → apply taxonomy → multi-issue rows → QA and output.
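The five stages can be sketched as plain functions chained together. This is a minimal sketch under assumptions: in practice the translation and tagging are done by the model, so here `translator` is any callable and the taxonomy is a naive keyword lookup. The function names, the taxonomy, and the sample rows are hypothetical.

```python
def prepare(raw_rows):
    """Stage 1: strip whitespace and drop empty rows."""
    return [r.strip() for r in raw_rows if r and r.strip()]


def translate(text, translator):
    """Stage 2: translate meaning; `translator` stands in for a model call."""
    return translator(text)


def apply_taxonomy(text, taxonomy):
    """Stage 3: return every tag whose keywords appear in the text (naive substring match)."""
    lowered = text.lower()
    return [tag for tag, keywords in taxonomy.items()
            if any(k in lowered for k in keywords)]


def split_multi_issue(text, tags):
    """Stage 4: one output row per issue; untagged rows are kept with tag None."""
    return [{"text": text, "tag": t} for t in tags] or [{"text": text, "tag": None}]


def qa(rows):
    """Stage 5: separate tagged rows from rows that need a human look."""
    tagged = [r for r in rows if r["tag"] is not None]
    review = [r for r in rows if r["tag"] is None]
    return tagged, review


def run_pipeline(raw_rows, translator, taxonomy):
    out = []
    for text in prepare(raw_rows):
        english = translate(text, translator)
        out.extend(split_multi_issue(english, apply_taxonomy(english, taxonomy)))
    return qa(out)


# Hypothetical taxonomy and an identity "translator" so the sketch runs offline.
taxonomy = {"delivery": ["late", "courier"], "payment": ["card", "refund"]}
tagged, review = run_pipeline(
    ["Courier was late and my card was charged twice", "  ", "Nice app"],
    translator=lambda t: t,
    taxonomy=taxonomy,
)
```

A row that mentions two problems becomes two tagged rows, and anything the taxonomy misses lands in `review` rather than being dropped.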

How the teaching actually worked →
One thing doesn't automate, even when the pipeline is fully built out. I write the recommendation with a UX-first lens. I call out the strategic or tactical decisions worth escalating, or the adjacent research that would settle the question.
reduces load at scale · still piloting
Reports
Reports are where I tried hardest to push AI into the full assembly, and where that route didn't pay off.

Template creation works, but with errors.
  • Figma: output comes with broken frames and unwanted pages.
  • Miro: doesn't hold coordinates.
  • PPTX: slides constrain the layout, but the editor is hostile.
Assembling the full report doesn't work in any tool: there's too much text and too many entity variations.

I make sprint-review previews by hand to make sure the selected findings land with the audience.

Claude does work for shortening the full report for a specific audience, such as a C-level deck, as long as the material itself doesn't need to change much. The Figma translator is also very useful.

The full-template route asks for both at once - content shaping and assembly - and there a researcher with a template and an hour is better than the toolchain.
Guardrails
This work runs on a small set of standing rules. Mine map onto the NIST AI RMF principles.

No personal client data goes into a model; sensitive material is anonymised before it reaches the chat. Nothing runs without me in the loop.

The model never acts on anything I haven't approved. Permissions stay narrow by default: tools work inside scoped folders rather than against the file system. Anything new gets tested on a small batch before it scales.
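Three of these rules can be made concrete in a few lines: anonymise before sending, keep tools inside a scoped folder, and test on a small batch first. This is a minimal sketch, not the actual setup. The regular expressions only catch e-mail addresses and phone-like numbers, so names and other identifiers would still need a separate pass, and all function names are hypothetical.

```python
import re
from pathlib import Path

# Pattern-based redaction only; this does not detect personal names.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s()-]{7,}\d")


def anonymise(text):
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def in_scope(path, root):
    """True only if `path` resolves to a location inside `root`."""
    resolved = Path(path).resolve()
    root = Path(root).resolve()
    return resolved == root or root in resolved.parents


def pilot_batch(items, size=20):
    """Take a small first batch to test on before scaling."""
    return list(items)[:size]


clean = anonymise("Write to ali@example.com or +994 50 123 45 67")
```

Resolving both paths before comparing them means a path like `research/../secrets.txt` is rejected, even though it starts with the scoped folder's name.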
Stretching the web
Three months in, with rounds of love and frustration on both sides. The aha moments cut both ways. Overall time saved is smaller than the public discourse suggests.
The scale I'm reaching for isn't "shorter task here, shorter task there." It's the same mind, reaching further:
→ more product changes, and more influence, in the same working time.

The next architectural piece on my side is the knowledge base - the structured core the skills and agent read against. Without it, I'm still doing all the connecting work in my head.
The mosaic doesn't come together on my side alone. Teams that move faster build different habits around shared work - iterating together and treating artefacts as team property, not personal files. None of which is news.

How I think about my own role inside that picture: a generalist with a spider-web mind. I hold many planes at once - strategy and method, tactics and audience - and the connecting across them is what I bring. AI is how I scale that. The repeatable work goes off my plate; judgement gets the room it needs.