現場コンパス
use-case

How to Record and Transcribe Interviews for Research

Qualitative researchers, journalists, and UX professionals pay $1–3 per minute for transcription—or spend hours doing it manually. AI changes the math. Here's how to do it right.

MinuteKeep Team
#transcribe interviews, #qualitative research transcription, #AI interview transcription, #UX research documentation, #journalism transcription, #research workflow

You've just wrapped a 90-minute interview. Your subject was thoughtful, specific, and said exactly the kind of thing your research needed. You have an audio file on your phone and a page of shorthand that made perfect sense in the moment and now reads like a foreign language.

And now comes the part every researcher dreads: getting that conversation into a usable form.

If you've transcribed interviews the traditional way—typing while listening, rewinding every 20 seconds, spending four hours on a 90-minute recording—you know the math. Qualitative research generates enormous amounts of audio. Transcribed manually, every hour of interview content translates to roughly four to six hours of work. For a study with twelve hour-long interviews, that's between 50 and 70 hours of transcription before analysis can even begin.

Professional transcription services solve the time problem, but they introduce a cost problem: the going rate for human transcription runs $1 to $3 per minute for standard turnaround, rising to $4 or more for verbatim transcription with speaker labels and timestamps. A single 90-minute interview can cost $90 to $270. A twelve-person qualitative study can run $2,000 in transcription fees before any other research cost is considered.

AI transcription changes the math on both sides of that equation—but only if you understand what it does well and where you still need human judgment. This guide covers the full workflow, the ethical considerations you need to get right first, and what accuracy expectations are realistic for research contexts.


Automate your meeting notes. MinuteKeep records your meeting and uses AI to transcribe, summarize, and extract action items. 9 languages, no subscription, 30 min free.

Traditional vs. AI Transcription for Research

Before committing to a transcription approach, it helps to understand what the two options actually deliver.

Professional Human Transcription

The industry benchmark for research-grade transcription has historically been human transcription services—companies like Rev, Scribie, and TranscribeMe that employ trained transcriptionists to produce verbatim text from audio.

What you get: High accuracy (95–99% for clean audio), speaker identification, verbatim capture including false starts and filler words, timestamps on request, and a human who can flag unintelligible passages rather than guessing.

What it costs: $1.00–$1.50/minute at standard accuracy from services like Rev; $2.00–$3.00/minute for verbatim with speaker labels; turnaround ranging from four to 24 hours depending on service tier.

What it requires: Sharing your audio files with a third party, which introduces confidentiality considerations depending on your participant consent agreements and IRB protocols.

AI Transcription

AI transcription—using models like OpenAI's Whisper that power most modern apps—has closed the accuracy gap considerably. For research audio recorded under controlled conditions with a single speaker and minimal background noise, accuracy in the 90–95% range is common. For group interviews, heavily accented speech, or ambient noise, it drops.

What you get: Near-immediate turnaround (minutes rather than hours), significantly lower cost, no third-party data sharing when the app processes locally or through privacy-respecting APIs, and a starting transcript that requires editing rather than creation.

What it costs: A fraction of human rates. With MinuteKeep, 90 minutes of transcription costs roughly $0.75 at the 2-hour/$0.99 pricing tier.

What it requires: A realistic understanding of what AI gets wrong—and a workflow for catching and correcting those errors before analysis.

The practical case for AI in research: For most qualitative research applications, AI transcription provides a solid first draft that requires 30–60 minutes of correction versus 4–6 hours of original transcription. Even factoring in review time, the net time savings are substantial. The cost savings are unambiguous.
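The cost-and-time comparison above is easy to sanity-check yourself. A minimal sketch, using only the rates already quoted in this article (the $1–3/minute human rate and the 2-hour/$0.99 AI tier):

```python
# Back-of-the-envelope comparison for a single 90-minute interview,
# using the rates discussed above.
INTERVIEW_MIN = 90

# Human transcription service: $1-3/minute; doing it yourself takes 4-6 hours.
human_cost_low = INTERVIEW_MIN * 1.00
human_cost_high = INTERVIEW_MIN * 3.00
manual_hours = (4, 6)

# AI transcription at a 2-hour / $0.99 tier, plus a 30-60 minute review pass.
ai_cost = 0.99 * (INTERVIEW_MIN / 120)
review_hours = (0.5, 1.0)

print(f"Human service: ${human_cost_low:.0f}-${human_cost_high:.0f}, "
      f"or {manual_hours[0]}-{manual_hours[1]} h of your own time")
print(f"AI + review:   ${ai_cost:.2f} plus "
      f"{review_hours[0]}-{review_hours[1]} h of checking")
```

Even with a generous review allowance, the AI path costs under a dollar per interview against $90–270 for a service.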

The caveat: for high-stakes research where a single misheard word could meaningfully change interpretation—legal testimony, medical interviews, research that will be used in litigation—human transcription or thorough human review remains the appropriate standard.


Ethics and Consent: What to Get Right Before You Record

This is the part of the workflow that happens before you open any app. For research recordings, ethical requirements are not optional.

Informed Consent

Every participant in a recorded research interview must give informed consent before the recording begins. This means they understand:

  • That the interview is being recorded
  • How the recording will be stored and who will have access to it
  • How the transcript will be used (analysis, quotation in publications, etc.)
  • Whether any third-party transcription services or AI tools will process the audio
  • How and when the recording will be destroyed or anonymized

If your IRB (Institutional Review Board) or ethics committee has approved your study, your consent process has already been reviewed. Follow it exactly. If you're doing journalism or UX research outside a formal IRB process, the ethical obligations are the same even if the formal oversight structure is different.

IRB Protocols and Third-Party Data Sharing

This is where AI transcription introduces a consideration many researchers don't anticipate.

If your IRB approval specifies that audio data will not be shared with third parties, using a cloud-based AI transcription service may violate your protocol. The audio is sent to a remote server for processing—that constitutes sharing.

For research conducted under IRB oversight, review your consent forms and IRB approval language carefully before choosing a transcription method. Options include:

  • Local processing tools that run on-device rather than sending audio to a server
  • Apps with explicit data-handling policies that align with your IRB's requirements
  • Seeking an IRB amendment if you want to add AI transcription to an already-approved protocol, which many IRBs will consider given the research utility

MinuteKeep sends audio to OpenAI's API for processing via Supabase Edge Functions. This is not local processing. If your research protocol prohibits third-party data sharing, you need to either secure IRB clearance for this approach or use a different tool.

Participant Anonymity

Consent to record does not automatically mean consent to be identified. For research transcripts, standard practice is to anonymize participant identifiers before sharing transcripts with anyone not on the research team—replacing real names with pseudonyms or participant codes (P1, P2, etc.) in the transcript before it's used in analysis or publication.


Step-by-Step Interview Transcription Workflow

Once consent is secured and your recording setup is confirmed, here's a workflow that handles the full arc from interview to usable research document.

Before the Interview

1. Add research terminology to your custom dictionary.

AI transcription models are trained on general speech and systematically mishandle technical vocabulary, participant names, and field-specific terminology. If your research involves jargon—a medical study using clinical terms, a tech UX interview using product names, a sociological study with specialized theoretical concepts—those terms will be misrendered unless you prepare for them.

MinuteKeep's custom dictionary lets you define substitution pairs: what Whisper produces (the wrong version) and what you want to see (the correct version). Add your key terms before the interview, not after.

For a study on, say, healthcare system navigation:

  • "continuity of care" might come out as "continuity of cure"
  • An EMR system called "Epic" gets through fine, but a niche system called "Meditech" may not
  • Participant pseudonyms you plan to use can be pre-loaded

See Custom Dictionary for AI Transcription for detailed setup instructions and examples by field.
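The dictionary's underlying logic is simple substitution pairs. A minimal sketch of the same idea in Python, using the healthcare examples above (the `CORRECTIONS` mapping and `apply_dictionary` helper are hypothetical illustrations, not MinuteKeep's actual implementation):

```python
import re

# Hypothetical substitution pairs: what the model tends to produce (wrong)
# mapped to the term you actually want (right).
CORRECTIONS = {
    "continuity of cure": "continuity of care",
    "Medi tech": "Meditech",
}

def apply_dictionary(transcript: str) -> str:
    """Apply each substitution pair to the transcript, case-insensitively."""
    for wrong, right in CORRECTIONS.items():
        transcript = re.sub(re.escape(wrong), right, transcript,
                            flags=re.IGNORECASE)
    return transcript

print(apply_dictionary("They stressed continuity of cure in the Medi tech rollout."))
# -> They stressed continuity of care in the Meditech rollout.
```

The point of pre-loading these pairs before the first session is that the same correction then applies automatically to every transcript in the study.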

2. Set your recording environment.

AI accuracy degrades with ambient noise, multiple overlapping voices, and low recording volume. Practical steps that consistently improve output quality:

  • Record in a quiet room with the door closed
  • Position the recording device 12–18 inches from the speaker
  • Use a phone rather than a laptop when possible (laptop fans introduce constant low-frequency noise)
  • For remote interviews, record through the meeting app's audio rather than ambient room pickup

3. Confirm your format settings.

For research transcription, the full transcript is what you need—not a summary. MinuteKeep generates both, and the transcript tab is where the verbatim content lives.

During the Interview

Record from the start. Don't wait to press record until after small talk ends—those informal moments sometimes contain relevant context.

If a participant says something that you know will be hard to parse—they use a term you didn't anticipate, they speak over themselves, there's a technical interruption—make a brief handwritten note of the timestamp. You'll know where to listen carefully during review.

After the Interview

Step 1: Generate the transcript.

Processing time depends on interview length. A 90-minute interview typically generates a transcript within a few minutes of the recording ending.

Step 2: Do a review pass against the audio.

This step is non-negotiable for research use. AI transcription is accurate enough to be a first draft, not accurate enough to be a final research document without human review.

Your review pass should check:

  • Any passage flagged during the interview as potentially difficult
  • Technical terms, proper nouns, and jargon throughout
  • Direct quotes you intend to use in analysis or publication (these require verbatim accuracy)
  • Any sections where the transcript reads as grammatically incomplete or contextually strange—these usually signal a misheard word

For a 90-minute interview, a careful review pass takes 30–45 minutes, compared to the 4–6 hours of manual transcription it replaces.

Step 3: Anonymize the transcript.

Replace real participant names with your study codes (P1, P2, etc.) before the transcript is used in analysis. Do this in your text editor after exporting, not in the app itself.
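If you prefer to script this step rather than find-and-replace by hand, a minimal sketch of the idea (the names and `PSEUDONYMS` mapping here are hypothetical examples, and you should still eyeball the result for missed mentions, nicknames, or misspellings the script can't know about):

```python
import re

# Hypothetical name-to-code mapping for a two-participant study.
PSEUDONYMS = {
    "Maria Lopez": "P1",
    "Maria": "P1",        # catch first-name-only mentions too
    "James Chen": "P2",
    "James": "P2",
}

def anonymize(transcript: str) -> str:
    """Replace each real name with its participant code, longest names first
    so 'Maria Lopez' is matched before the bare 'Maria'."""
    for name in sorted(PSEUDONYMS, key=len, reverse=True):
        transcript = re.sub(rf"\b{re.escape(name)}\b",
                            PSEUDONYMS[name], transcript)
    return transcript

print(anonymize("Maria Lopez told James the portal confused her."))
# -> P1 told P2 the portal confused her.
```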

Step 4: Export and file.

Copy the transcript from MinuteKeep and paste into your research document management system. Store the transcript alongside your original audio until data destruction protocols apply.


Try MinuteKeep for Your Next Interview

30 minutes free on install. No account, no subscription—pay per use when you need more time.

2 hours / $0.99 — enough for two or three full research interviews at a cost most research budgets can absorb without a second thought.

Download MinuteKeep on the App Store


Accuracy Considerations for Research Contexts

Accuracy is the thing most researchers focus on when evaluating AI transcription. But accuracy in research contexts is more nuanced than the headline percentage suggests.

What the Numbers Mean

Published accuracy benchmarks for Whisper (the model MinuteKeep uses) typically report Word Error Rate (WER)—the percentage of words that differ from a human-generated reference transcript, counting substitutions, deletions, and insertions. A WER of 5–10% means roughly 5–10 errors per 100 words.

For a 90-minute interview with roughly 12,000–15,000 words, a 5% WER means 600–750 errors. Most of those errors are minor—wrong article, misheard preposition, slight word variation that doesn't change meaning. But some will be substantively wrong.
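The arithmetic behind those error counts, as a quick sanity check you can rerun for your own interview lengths:

```python
# Translate a Word Error Rate into an expected error count for one interview.
def expected_errors(word_count: int, wer: float) -> int:
    return round(word_count * wer)

# A 90-minute interview runs roughly 12,000-15,000 words; at 5% WER:
for words in (12_000, 15_000):
    print(f"{words} words -> ~{expected_errors(words, 0.05)} errors")
```

This is why the review pass is non-negotiable: even a "good" WER leaves hundreds of deviations, and the handful that are substantive won't announce themselves.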

Where AI Transcription Errors Concentrate

Errors in research transcripts are not randomly distributed. They cluster in predictable places:

Technical terminology: Field-specific terms, theoretical concepts, and jargon that don't appear frequently in general speech. This is exactly what the custom dictionary addresses.

Names: Participant names, researcher names, place names, institution names. Add any recurring names to your dictionary.

Heavily accented speech: Whisper's accuracy varies significantly across accents. Standard American and British English are best-supported. Non-native speakers with strong L1 influence may generate more errors.

Multiple speakers: MinuteKeep does not include automated speaker diarization (the ability to identify and label who is speaking). For one-on-one interviews, this isn't an issue—the transcript reads as a single stream of text that you can manually label. For focus groups or multi-person interviews, this is a limitation.

Crosstalk and interruptions: When speakers overlap, AI models produce unreliable output for the overlapping segment.

Verbatim vs. Cleaned Transcription

Traditional transcription services often offer two tiers: verbatim (includes all false starts, filler words, "um," "uh," repetitions) and clean (edited for readability). AI transcription tends to produce something between these—it captures most speech but may smooth over some fillers.

For research that analyzes discourse patterns, conversational dynamics, or communication style (conversation analysis, discourse analysis, linguistic research), verbatim capture matters and the AI's tendency to smooth output is a limitation. For research that analyzes content and themes, the cleaned output is usually appropriate.

Know which you need before choosing your verification standard.


Use Cases by Research Type

Qualitative Social Science Research

Qualitative interviews for sociological, anthropological, or psychological research are the core use case. Participants discuss experiences, perceptions, and behavior in open-ended conversations. The goal is thematic analysis—identifying patterns across what participants say.

AI transcription produces a usable first draft quickly. The review pass ensures accuracy before coding. The savings in transcription time translate directly to more time for analysis.

After transcribing multiple interviews, MinuteKeep's AI Chat lets you ask questions across all saved notes. For a thematic analysis stage, this means querying: "What did participants say about barriers to accessing healthcare?" and getting a synthesized answer drawn from your actual transcripts. This is a research acceleration tool, not a substitute for rigorous coding—but it helps identify themes and locate relevant passages faster than manual search.

For more on how the AI Chat feature retrieves information across saved notes, see How AI Meeting Transcription Works.

Journalism

Journalists conducting source interviews face the same transcription math as researchers, with the added pressure of publication deadlines. The difference is that journalistic standards require specific, attributable quotes—which means verbatim accuracy on direct quotation matters more than overall transcript accuracy.

A useful workflow: use AI transcription for the initial pass, then verify every quote you intend to use by listening to the audio at those specific timestamps. The transcript gets you to the quote quickly; the audio confirms it's right.

Journalistic interviews also raise source confidentiality questions that parallel IRB concerns. If your source requested confidentiality, sending their audio to a third-party API may not align with your obligations to them. Evaluate this against your specific situation.

UX Research

User experience researchers conduct interviews as part of usability testing, user research, and discovery research. A typical UX research project involves 5–15 participant sessions of 30–60 minutes each.

For UX research, the speed advantage of AI transcription is particularly valuable: research synthesis happens on compressed timelines, and having transcripts available same-day rather than the next morning changes what's possible in a sprint.

UX research also tends to involve product-specific vocabulary—feature names, screen names, user workflows, internal terminology—that AI models won't know. The custom dictionary is especially useful here. Load your product's terminology before the first session, and it applies across all subsequent recordings.

Academic Thesis and Dissertation Research

Graduate students conducting qualitative research for theses or dissertations often have no transcription budget at all. Traditional transcription services are a cost that comes out of pocket. For a dissertation with 15 interviews averaging 60 minutes, professional transcription at $1.50/minute runs $1,350—a meaningful expense for most graduate students.

AI transcription at MinuteKeep's pricing: roughly $7.50 total for the same 15 hours of content (7.5 two-hour blocks at $0.99 each). That math changes what's feasible.

The tradeoff is review time, which still belongs to the researcher. But for most thesis research, 30–45 minutes of review per interview is significantly preferable to 4–6 hours of transcription.

For a fuller picture of how AI transcription fits into academic workflows, see How Students Use AI Transcription for Lectures and Study.


Best Practices for Research Transcription

Record consent first. Start the recording before the formal interview begins and capture your verbal confirmation of consent as the opening of the audio file. This protects both you and the participant.

Pre-populate your dictionary. Invest 15 minutes before the first interview of a study to add key terms. The time spent building the dictionary before session one saves time across every subsequent session.

Timestamp your field notes. If you're taking notes during the interview, note timestamps for significant moments. The review pass goes faster when you know which sections of the audio deserve closer attention.

Export and back up promptly. MinuteKeep stores notes locally on your device. Export your transcript shortly after generating it—copy to a research document, email to yourself, save to a secure research repository. Don't treat the app as your primary archive.

Use AI Chat for cross-interview synthesis, not single-interview analysis. Once you've transcribed multiple interviews, AI Chat is useful for orientation and pattern identification across the corpus. For deep analysis of individual transcripts, work directly with the text.

Track your error patterns. As you review transcripts, note what kinds of errors recur. Add them to your dictionary immediately—each addition improves every future transcript in the study.


Frequently Asked Questions

Does using AI transcription require amending my IRB protocol?

This depends on your protocol language. If your consent forms and IRB submission state that audio data will not be shared with third parties, using a cloud-processing AI transcription app may require an amendment—because the audio is sent to an external API. If your consent forms allow for third-party processing with appropriate data handling agreements, you may be able to proceed. Review your specific protocol language with your IRB coordinator. Many IRBs will consider amendments for this purpose given the research utility and the data handling practices of established providers.

How accurate is AI transcription for interviews with non-native English speakers?

Accuracy drops for non-native speakers, and the degree depends on the speaker's accent and fluency level. Whisper-based models perform best on native or near-native English, standard American or British accent. For speakers with strong L1 influence, expect more errors and budget additional review time. MinuteKeep also supports transcription in 9 languages natively—if your interview is conducted in the participant's first language rather than English, transcription accuracy may actually improve.

Can MinuteKeep identify who is speaking in a multi-person interview?

Not automatically. MinuteKeep does not include speaker diarization (automated speaker identification). For one-on-one interviews, this doesn't affect usability—the transcript reads as a sequence of utterances you can label manually. For focus groups or multi-participant sessions, you'll need to manually mark speaker turns during your review pass.

What should I do if the audio quality is poor—heavy background noise, quiet recording, etc.?

Poor audio is the primary driver of AI transcription errors. If you have a recording with significant quality issues, expect more errors and plan for a longer review pass. The High Accuracy mode in MinuteKeep (which uses a higher-tier transcription model) can improve results for difficult audio at 2x the time cost. If the audio is severely degraded—unintelligible in places—AI transcription will produce errors that are hard to catch without returning to the audio frequently, and professional human transcription may be more efficient.

How do I handle parts of the transcript that are marked as unintelligible?

AI transcription models don't mark unintelligible segments the way a human transcriptionist would—they guess rather than flagging uncertainty. This means you won't see "[inaudible]" markers; instead, you'll see a word or phrase that may or may not be correct. The practical implication: review your transcripts actively rather than treating the text as reliable by default. If something reads oddly, go back to the audio. For research purposes, it's better to mark a segment as "[unclear]" in your final transcript than to carry a confident transcription error into your analysis.


Key Takeaways

  • Professional human transcription costs $1–3 per minute and involves third-party access to your audio. AI transcription costs a fraction of that and can be faster, but requires a human review pass before research use.
  • Informed consent for recording must explicitly address how transcription will be handled, including whether AI or third-party services will process the audio.
  • IRB protocols may have specific language about third-party data sharing that affects which transcription tools you can legally use under your approved protocol.
  • AI transcription errors cluster in predictable places: technical terminology, names, accented speech, and overlapping speech. A custom dictionary handles the first two proactively.
  • For a 90-minute interview, expect 30–45 minutes of review time versus 4–6 hours of manual transcription—a substantial net saving even accounting for quality assurance.
  • MinuteKeep's AI Chat feature lets you search across all transcribed interviews, which is useful for cross-study synthesis and early thematic orientation.
  • At 2 hours for $0.99, AI transcription is within reach for research budgets at any level—including graduate students with no transcription budget.

For more on how AI transcription handles technical vocabulary in specialized fields, see Custom Dictionary for AI Transcription. To understand what's happening technically when AI converts speech to text, How AI Meeting Transcription Works walks through the full pipeline. And for a broader look at how AI transcription fits into student and academic research workflows, see How Students Use AI Transcription for Lectures and Study.


Try MinuteKeep Free

30 minutes of free recording. No subscription required.

Download on the App Store