The fast transcription API transcribes audio files and returns the results synchronously and faster than real time. Use fast transcription in scenarios where you need the transcript of an audio recording as quickly as possible with predictable latency.
Unlike the batch transcription API, the fast transcription API produces transcriptions only in the display form (not the lexical form). The display form is a more human-readable version of the transcription that includes punctuation and capitalization.
Make a multipart/form-data POST request to the transcriptions endpoint with the audio file and the request body properties.
The following example shows how to transcribe an audio file with a specified locale. If you know the locale of the audio file, you can specify it to improve transcription accuracy and reduce latency.
- Replace YourSubscriptionKey with your Speech resource key.
- Replace YourServiceRegion with your Speech resource region.
- Replace YourAudioFile with the path to your audio file.
curl --location 'https://YourServiceRegion.api.cognitive.microsoft.com/speechtotext/transcriptions:transcribe?api-version=2024-11-15' \
--header 'Content-Type: multipart/form-data' \
--header 'Ocp-Apim-Subscription-Key: YourSubscriptionKey' \
--form 'audio=@"YourAudioFile"' \
--form 'definition="{
    "locales":["en-US"]}"'
Construct the form definition according to the following instructions:
- Set the optional (but recommended) locales property, which should match the expected locale of the audio data to transcribe. In this example, the locale is set to en-US. The supported locales are: de-DE, en-GB, en-IN, en-US, es-ES, es-MX, fr-FR, hi-IN, it-IT, ja-JP, ko-KR, pt-BR, and zh-CN.
For more information about locales and other properties of the fast transcription API, see the request configuration options section later in this guide.
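The same request can also be assembled from a script. The following is a minimal Python sketch, using only the standard library, that mirrors the curl command above: it serializes the definition, encodes the multipart/form-data body by hand, and posts it to the endpoint. The helper names (build_definition, build_multipart, transcribe) and the application/octet-stream type on the audio part are assumptions made for illustration; they are not part of the API.

```python
import json
import os
import urllib.request
import uuid


def build_definition(locales):
    """Serialize the 'definition' form field, e.g. {"locales": ["en-US"]}."""
    return json.dumps({"locales": list(locales)})


def build_multipart(audio_bytes, filename, definition, boundary=None):
    """Encode the 'audio' file part and the 'definition' text part."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
        # Assumption: a generic binary type is acceptable for the audio part.
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = (
        f"\r\n--{boundary}\r\n"
        'Content-Disposition: form-data; name="definition"\r\n\r\n'
        f"{definition}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    return head + audio_bytes + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(region, key, audio_path, locales):
    """POST the audio file and return the parsed JSON response (network call)."""
    with open(audio_path, "rb") as audio_file:
        body, content_type = build_multipart(
            audio_file.read(),
            os.path.basename(audio_path),
            build_definition(locales),
        )
    url = (
        f"https://{region}.api.cognitive.microsoft.com"
        "/speechtotext/transcriptions:transcribe?api-version=2024-11-15"
    )
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": content_type,
            "Ocp-Apim-Subscription-Key": key,
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling transcribe("YourServiceRegion", "YourSubscriptionKey", "YourAudioFile", ["en-US"]) with real values would send the request; the two builder functions can be exercised offline.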
The response includes durationMilliseconds, offsetMilliseconds, and more. The combinedPhrases property contains the full transcriptions for all speakers.
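Once the response is parsed as JSON, the fields described here can be read directly. The sketch below is illustrative only: it assumes the response is an object whose combinedPhrases and phrases properties are lists of objects, as in the sample that follows, and the helper names are hypothetical.

```python
def full_transcript(response):
    """Join the text of every combinedPhrases entry into one string."""
    return " ".join(item["text"] for item in response.get("combinedPhrases", []))


def format_offset(milliseconds):
    """Render a millisecond offset as MM:SS.mmm."""
    total_seconds, millis = divmod(milliseconds, 1000)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


def timed_phrases(response):
    """Return one '[MM:SS.mmm] text' line per entry in phrases."""
    return [
        f"[{format_offset(p['offsetMilliseconds'])}] {p['text']}"
        for p in response.get("phrases", [])
    ]


# Hypothetical, heavily shortened response used only to exercise the helpers.
sample = {
    "durationMilliseconds": 2240,
    "combinedPhrases": [{"text": "Good afternoon. This is Sam."}],
    "phrases": [
        {"offsetMilliseconds": 960, "durationMilliseconds": 640,
         "text": "Good afternoon."},
        {"offsetMilliseconds": 1600, "durationMilliseconds": 640,
         "text": "This is Sam."},
    ],
}

for line in timed_phrases(sample):
    print(line)
```

Reading combinedPhrases gives the whole transcript in one step, while phrases keeps the per-phrase timing needed for captions or search.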
"durationMilliseconds": 182439,
"combinedPhrases": [
"text": "Good afternoon. This is Sam. Thank you for calling Contoso. How can I help? Hi there. My name is Mary. I'm currently living in Los Angeles, but I'm planning to move to Las Vegas. I would like to apply for a loan. Okay. I see you're currently living in California. Let me make sure I understand you correctly. Uh You'd like to apply for a loan even though you'll be moving soon. Is that right? Yes, exactly. So I'm planning to relocate soon, but I would like to apply for the loan first so that I can purchase a new home once I move there. And are you planning to sell your current home? Yes, I will be listing it on the market soon and hopefully it'll sell quickly. That's why I'm applying for a loan now, so that I can purchase a new house in Nevada and close on it quickly as well once my current home sells. I see. Would you mind holding for a moment while I take your information down? Yeah, no problem. Thank you for your help. Mm-hmm. Just one moment. All right. Thank you for your patience, ma'am. May I have your first and last name, please? Yes, my name is Mary Smith. Thank you, Ms. Smith. May I have your current address, please? Yes. So my address is 123 Main Street in Los Angeles, California, and the zip code is 90923. Sorry, that was a 90 what? 90923. 90923 on Main Street. Got it. Thank you. May I have your phone number as well, please? Uh Yes, my phone number is 504-529-2351 and then yeah. 2351. Got it. And do you have an e-mail address we I can associate with this application? uh Yes, so my e-mail address is mary.a.sm78@gmail.com. Mary.a, was that a S-N as in November or M as in Mike? M as in Mike. Mike78, got it. Thank you. Ms. Smith, do you currently have any other loans? Uh Yes, so I currently have two other loans through Contoso. So my first one is my car loan and then my other is my student loan. They total about 1400 per month combined and my interest rate is 8%. I see. And you're currently paying those loans off monthly, is that right? 
Yes, of course I do. OK, thank you. Here's what I suggest we do. Let me place you on a brief hold again so that I can talk with one of our loan officers and get this started for you immediately. In the meantime, it would be great if you could take a few minutes and complete the remainder of the secure application online at www.contosoloans.com. Yeah, that sounds good. I can go ahead and get started. Thank you for your help. Thank you."
"phrases": [
"offsetMilliseconds": 960,
"durationMilliseconds": 640,
"text": "Good afternoon.",
"words": [
"text": "Good",
"offsetMilliseconds": 960,
"durationMilliseconds": 240
"text": "afternoon.",
"offsetMilliseconds": 1200,
"durationMilliseconds": 400
"locale": "en-US",
"confidence": 0.93616915
"offsetMilliseconds": 1600,
"durationMilliseconds": 640,
"text": "This is Sam.",
"words": [
"text": "This",
"offsetMilliseconds": 1600,
"durationMilliseconds": 240
"text": "is",
"offsetMilliseconds": 1840,
"durationMilliseconds": 120
"text": "Sam.",
"offsetMilliseconds": 1960,
"durationMilliseconds": 280
"locale": "en-US",
"confidence": 0.93616915
"offsetMilliseconds": 2240,
"durationMilliseconds": 1040,
"text": "Thank you for calling Contoso.",
"words": [
"text": "Thank",
"offsetMilliseconds": 2240,
"durationMilliseconds": 200
"text": "you",
"offsetMilliseconds": 2440,
"durationMilliseconds": 80
"text": "for",
"offsetMilliseconds": 2520,
"durationMilliseconds": 120
"text": "calling",
"offsetMilliseconds": 2640,
"durationMilliseconds": 200
"text": "Contoso.",
"offsetMilliseconds": 2840,
"durationMilliseconds": 440
"locale": "en-US",
"confidence": 0.93616915
"offsetMilliseconds": 3280,
"durationMilliseconds": 640,
"text": "How can I help?",
"words": [
"text": "How",
"offsetMilliseconds": 3280,
"durationMilliseconds": 120
"text": "can",
"offsetMilliseconds": 3440,
"durationMilliseconds": 120
"text": "I",
"offsetMilliseconds": 3560,
"durationMilliseconds": 40
"text": "help?",
"offsetMilliseconds": 3600,
"durationMilliseconds": 320
"locale": "en-US",
"confidence": 0.93616915
"offsetMilliseconds": 5040,
"durationMilliseconds": 400,
"text": "Hi there.",
"words": [
"text": "Hi",
"offsetMilliseconds": 5040,
"durationMilliseconds": 240
"text": "there.",
"offsetMilliseconds": 5280,
"durationMilliseconds": 160
"locale": "en-US",
"confidence": 0.93616915
"offsetMilliseconds": 5440,
"durationMilliseconds": 800,
"text": "My name is Mary.",
"words": [
"text": "My",
"offsetMilliseconds": 5440,
"durationMilliseconds": 80
"text": "name",
"offsetMilliseconds": 5520,
"durationMilliseconds": 120
"text": "is",
"offsetMilliseconds": 5640,
"durationMilliseconds": 80
"text": "Mary.",
"offsetMilliseconds": 5720,
"durationMilliseconds": 520
"locale": "en-US",
"confidence": 0.93616915
// More transcription results...
// Redacted for brevity
"offsetMilliseconds": 180320,
"durationMilliseconds": 680,
"text": "Thank you for your help.",
"words": [
"text": "Thank",
"offsetMilliseconds": 180320,
"durationMilliseconds": 160
"text": "you",
"offsetMilliseconds": 180480,
"durationMilliseconds": 80
"text": "for",
"offsetMilliseconds": 180560,
"durationMilliseconds": 120
"text": "your",
"offsetMilliseconds": 180680,
"durationMilliseconds": 120
"text": "help.",
"offsetMilliseconds": 180800,
"durationMilliseconds": 200
"locale": "en-US",
"confidence": 0.9314801
"offsetMilliseconds": 181960,
"durationMilliseconds": 280,
"text": "Thank you.",
"words": [
"text": "Thank",
"offsetMilliseconds": 181960,
"durationMilliseconds": 200
"text": "you.",
"offsetMilliseconds": 182160,
"durationMilliseconds": 80
"locale": "en-US",
"confidence": 0.9314801
The following example shows how to transcribe an audio file with language identification enabled. If you're not sure about the locale, you can specify multiple locales. If you don't specify any locale, or if the locales that you specify aren't present in the audio file, the Speech service tries to identify the locale.
- Replace YourSubscriptionKey with your Speech resource key.
- Replace YourServiceRegion with your Speech resource region.
- Replace YourAudioFile with the path to your audio file.
curl --location 'https://YourServiceRegion.api.cognitive.microsoft.com/speechtotext/transcriptions:transcribe?api-version=2024-11-15' \
--header 'Content-Type: multipart/form-data' \
--header 'Ocp-Apim-Subscription-Key: YourSubscriptionKey' \
--form 'audio=@"YourAudioFile"' \
--form 'definition="{
    "locales":["en-US","ja-JP"]}"'
Construct the form definition according to the following instructions:
- Set the optional (but recommended) locales property, which should match the expected locales of the audio data to transcribe. In this example, the locales are set to en-US and ja-JP. The supported locales are: de-DE, en-GB, en-IN, en-US, es-ES, es-MX, fr-FR, hi-IN, it-IT, ja-JP, ko-KR, pt-BR, and zh-CN.
For more information about locales and other properties of the fast transcription API, see the request configuration options section later in this guide.
The response includes durationMilliseconds, offsetMilliseconds, and more. The combinedPhrases property contains the full transcriptions for all speakers.
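When several candidate locales are sent, each phrase in the response carries the locale it was recognized in. As a rough sketch, assuming phrases is a list of objects with a locale property as in the sample that follows, the identified locales can be tallied like this (the helper names are hypothetical):

```python
from collections import Counter


def locale_counts(response):
    """Count how many phrases were recognized in each locale."""
    return Counter(
        phrase["locale"]
        for phrase in response.get("phrases", [])
        if "locale" in phrase
    )


def dominant_locale(response):
    """Return the most frequent phrase locale, or None if there are no phrases."""
    counts = locale_counts(response)
    return counts.most_common(1)[0][0] if counts else None


# Hypothetical, shortened response used only for illustration.
sample = {
    "phrases": [
        {"text": "Hello, thank you for calling Contoso.", "locale": "en-US"},
        {"text": "Who am I speaking with today?", "locale": "en-US"},
    ]
}

print(dominant_locale(sample))
```

A tally like this is one simple way to check which of the candidate locales the service settled on for a given file.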
"durationMilliseconds": 185079,
"combinedPhrases": [
"text": "Hello, thank you for calling Contoso. Who am I speaking with today? Hi, my name is Mary Rondo. I'm trying to enroll myself with Contoso. Hi, Mary. Are you calling because you need health insurance? Yes. Yeah, I'm calling to sign up for insurance. Great. Uh If you can answer a few questions, we can get you signed up in a Jiffy. Okay. So what's your full name? uh So Mary Beth Rondo, last name is R like Romeo, O like Ocean, N like Nancy D, D like Dog, and O like Ocean again. Rondo. Got it. And what's the best callback number in case we get disconnected? I only have a cell phone, so I can give you that. Yep, that'll be fine. Sure. So it's 234-554 and then 9312. Got it. So to confirm, it's 234-554-9312. Yep, that's right. Excellent. Let's get some additional information for your application. Do you have a job? Uh Yes, I am self-employed. Okay, so then you have a social security number as well? Uh Yes, I do. Okay, and what is your social security number, please? Uh Sure, so it's 412-253-4931. 6789. Sorry, was that a 25 or a 225? You cut out for a bit. It's double two, so 412, then another two, then five. Thank you so much. And could I have your e-mail address, please? Yeah, it's maryrondo@gmail.com. So my first and last name at gmail.com. No periods, no dashes. Great. Uh That is the last question. So let me take your information and I'll be able to get you signed up right away. Thank you for calling Contoso and I'll be able to get you signed up immediately. One of our agents will call you back in about 24 hours or so to confirm your application. That sounds good. Thank you. Absolutely. If you need anything else, please give us a call at 1-800-555-5564, extension 123. Thank you very much for calling Contoso. Actually, so I have one more question. Yes, of course. I'm curious, will I be getting a physical card as proof of coverage? So the default is a digital membership card, but we can send you a physical card if you prefer. Uh Yes. 
Could you please mail it to me when it's ready? I'd like to have it shipped to, are you ready for my address? Uh Yeah. uh So it's 2660 Unit A on Maple Avenue, Southeast Lansing, and then zip code is 48823. Absolutely. I've made a note on your file. Awesome. Thanks so much. You're very welcome. Thank you for calling Contoso and have a great day."
"phrases": [
"offsetMilliseconds": 720,
"durationMilliseconds": 1600,
"text": "Hello, thank you for calling Contoso.",
"words": [
"text": "Hello,",
"offsetMilliseconds": 720,
"durationMilliseconds": 480
"text": "thank",
"offsetMilliseconds": 1200,
"durationMilliseconds": 200
"text": "you",
"offsetMilliseconds": 1400,
"durationMilliseconds": 80
"text": "for",
"offsetMilliseconds": 1480,
"durationMilliseconds": 120
"text": "calling",
"offsetMilliseconds": 1600,
"durationMilliseconds": 240
"text": "Contoso.",
"offsetMilliseconds": 1840,
"durationMilliseconds": 480
"locale": "en-US",
"confidence": 0.93265927
"offsetMilliseconds": 2320,
"durationMilliseconds": 1120,
"text": "Who am I speaking with today?",
"words": [
"text": "Who",
"offsetMilliseconds": 2320,
"durationMilliseconds": 160
"text": "am",
"offsetMilliseconds": 2480,
"durationMilliseconds": 80
"text": "I",
"offsetMilliseconds": 2560,
"durationMilliseconds": 80
"text": "speaking",
"offsetMilliseconds": 2640,
"durationMilliseconds": 320
"text": "with",
"offsetMilliseconds": 2960,
"durationMilliseconds": 160
"text": "today?",
"offsetMilliseconds": 3120,
"durationMilliseconds": 320
"locale": "en-US",
"confidence": 0.93265927
"offsetMilliseconds": 4480,
"durationMilliseconds": 1600,
"text": "Hi, my name is Mary Rondo.",
"words": [
"text": "Hi,",
"offsetMilliseconds": 4480,
"durationMilliseconds": 400
"text": "my",
"offsetMilliseconds": 4880,
"durationMilliseconds": 120
"text": "name",
"offsetMilliseconds": 5000,
"durationMilliseconds": 120
"text": "is",
"offsetMilliseconds": 5120,
"durationMilliseconds": 160
"text": "Mary",
"offsetMilliseconds": 5280,
"durationMilliseconds": 240
"text": "Rondo.",
"offsetMilliseconds": 5520,
"durationMilliseconds": 560
"locale": "en-US",
"confidence": 0.93265927
"offsetMilliseconds": 6120,
"durationMilliseconds": 1800,
"text": "I'm trying to enroll myself with Contoso.",
"words": [
"text": "I'm",
"offsetMilliseconds": 6120,
"durationMilliseconds": 120
"text": "trying",
"offsetMilliseconds": 6240,
"durationMilliseconds": 200
"text": "to",
"offsetMilliseconds": 6440,
"durationMilliseconds": 80
"text": "enroll",
"offsetMilliseconds": 6520,
"durationMilliseconds": 200
"text": "myself",
"offsetMilliseconds": 6720,
"durationMilliseconds": 360
"text": "with",
"offsetMilliseconds": 7080,
"durationMilliseconds": 120
"text": "Contoso.",
"offsetMilliseconds": 7200,
"durationMilliseconds": 720
"locale": "en-US",
"confidence": 0.93265927
// More transcription results...
// Redacted for brevity
"offsetMilliseconds": 181520,
"durationMilliseconds": 720,
"text": "You're very welcome.",
"words": [
"text": "You're",
"offsetMilliseconds": 181520,
"durationMilliseconds": 160
"text": "very",
"offsetMilliseconds": 181680,
"durationMilliseconds": 200
"text": "welcome.",
"offsetMilliseconds": 181880,
"durationMilliseconds": 360
"locale": "en-US",
"confidence": 0.90571773
"offsetMilliseconds": 182320,
"durationMilliseconds": 1840,
"text": "Thank you for calling Contoso and have a great day.",
"words": [
"text": "Thank",
"offsetMilliseconds": 182320,
"durationMilliseconds": 200
"text": "you",
"offsetMilliseconds": 182520,
"durationMilliseconds": 80
"text": "for",
"offsetMilliseconds": 182600,
"durationMilliseconds": 120
"text": "calling",
"offsetMilliseconds": 182720,
"durationMilliseconds": 280
"text": "Contoso",
"offsetMilliseconds": 183000,
"durationMilliseconds": 520
"text": "and",
"offsetMilliseconds": 183520,
"durationMilliseconds": 160
"text": "have",
"offsetMilliseconds": 183680,
"durationMilliseconds": 120
"text": "a",
"offsetMilliseconds": 183800,
"durationMilliseconds": 40
"text": "great",
"offsetMilliseconds": 183840,
"durationMilliseconds": 200
"text": "day.",
"offsetMilliseconds": 184040,
"durationMilliseconds": 120
"locale": "en-US",
"confidence": 0.90571773
The following example shows how to transcribe an audio file with diarization enabled. Diarization distinguishes between the different speakers in a conversation. The Speech service provides information about which speaker said a particular part of the transcribed speech.
- Replace YourSubscriptionKey with your Speech resource key.
- Replace YourServiceRegion with your Speech resource region.
- Replace YourAudioFile with the path to your audio file.
curl --location 'https://YourServiceRegion.api.cognitive.microsoft.com/speechtotext/transcriptions:transcribe?api-version=2024-11-15' \
--header 'Content-Type: multipart/form-data' \
--header 'Ocp-Apim-Subscription-Key: YourSubscriptionKey' \
--form 'audio=@"YourAudioFile"' \
--form 'definition="{
    "locales":["en-US"],
    "diarization": {"maxSpeakers": 2,"enabled": true}}"'
Construct the form definition according to the following instructions:
- Set the optional (but recommended) locales property, which should match the expected locale of the audio data to transcribe. In this example, the locale is set to en-US. The supported locales are: de-DE, en-GB, en-IN, en-US, es-ES, es-MX, fr-FR, hi-IN, it-IT, ja-JP, ko-KR, pt-BR, and zh-CN.
- Set the diarization property to recognize and separate multiple speakers in one audio channel. For example, specify "diarization": {"maxSpeakers": 2, "enabled": true}. The transcription file then contains speaker entries for each transcribed phrase.
For more information about locales, diarization, and other properties of the fast transcription API, see the request configuration options section later in this guide.
The response includes durationMilliseconds, offsetMilliseconds, and more. In this example, diarization is enabled, so the response includes speaker information for each transcribed phrase. The combinedPhrases property contains the full transcriptions for all speakers on a single channel.
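Because every phrase carries a speaker value, a diarized response can be folded into conversational turns. The following sketch merges consecutive phrases from the same speaker; it assumes phrases is a list of objects with speaker and text properties, as in the sample that follows, and speaker_turns is a hypothetical helper name.

```python
def speaker_turns(phrases):
    """Merge consecutive phrases spoken by the same speaker into one turn."""
    turns = []
    for phrase in phrases:
        speaker = phrase.get("speaker")
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous phrase: extend the current turn.
            turns[-1][1].append(phrase["text"])
        else:
            turns.append((speaker, [phrase["text"]]))
    return [(speaker, " ".join(texts)) for speaker, texts in turns]


# Shortened phrases in the shape of the sample response.
sample_phrases = [
    {"channel": 0, "speaker": 1, "offsetMilliseconds": 960, "text": "Good afternoon."},
    {"channel": 0, "speaker": 1, "offsetMilliseconds": 1600, "text": "This is Sam."},
    {"channel": 0, "speaker": 0, "offsetMilliseconds": 5040, "text": "Hi there."},
]

for speaker, text in speaker_turns(sample_phrases):
    print(f"Speaker {speaker}: {text}")
```

The speaker values only distinguish voices within one file; mapping them to names such as agent or customer is left to the caller.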
"durationMilliseconds": 182439,
"combinedPhrases": [
"channel": 0,
"text": "Good afternoon. This is Sam. Thank you for calling Contoso. How can I help? Hi there. My name is Mary. I'm currently living in Los Angeles, but I'm planning to move to Las Vegas. I would like to apply for a loan. Okay. I see you're currently living in California. Let me make sure I understand you correctly. Uh You'd like to apply for a loan even though you'll be moving soon. Is that right? Yes, exactly. So I'm planning to relocate soon, but I would like to apply for the loan first so that I can purchase a new home once I move there. And are you planning to sell your current home? Yes, I will be listing it on the market soon and hopefully it'll sell quickly. That's why I'm applying for a loan now, so that I can purchase a new house in Nevada and close on it quickly as well once my current home sells. I see. Would you mind holding for a moment while I take your information down? Yeah, no problem. Thank you for your help. Mm-hmm. Just one moment. All right. Thank you for your patience, ma'am. May I have your first and last name, please? Yes, my name is Mary Smith. Thank you, Ms. Smith. May I have your current address, please? Yes. So my address is 123 Main Street in Los Angeles, California, and the zip code is 90923. Sorry, that was a 90 what? 90923. 90923 on Main Street. Got it. Thank you. May I have your phone number as well, please? Uh. Yes, my phone number is 504-529-2351 and then yeah. 2351. Got it. And do you have an e-mail address we I can associate with this application? Uh Yes, so my e-mail address is mary.a.sm78@gmail.com. Mary.a, was that a S-N as in November or M as in Mike? M as in Mike. Mike78, got it. Thank you. Ms. Smith, do you currently have any other loans? Uh Yes, so I currently have two other loans through Contoso. So my first one is my car loan and then my other is my student loan. They total about 1400 per month combined and my interest rate is 8%. I see. And. You're currently paying those loans off monthly, is that right? 
Yes, of course I do. OK, thank you. Here's what I suggest we do. Let me place you on a brief hold again so that I can talk with one of our loan officers and get this started for you immediately. In the meantime, it would be great if you could take a few minutes and complete the remainder of the secure application online at www.contosoloans.com. Yeah, that sounds good. I can go ahead and get started. Thank you for your help. Thank you."
"phrases": [
"channel": 0,
"speaker": 1,
"offsetMilliseconds": 960,
"durationMilliseconds": 640,
"text": "Good afternoon.",
"words": [
"text": "Good",
"offsetMilliseconds": 960,
"durationMilliseconds": 240
"text": "afternoon.",
"offsetMilliseconds": 1200,
"durationMilliseconds": 400
"locale": "en-US",
"confidence": 0.93616915
"channel": 0,
"speaker": 1,
"offsetMilliseconds": 1600,
"durationMilliseconds": 640,
"text": "This is Sam.",
"words": [
"text": "This",
"offsetMilliseconds": 1600,
"durationMilliseconds": 240
"text": "is",
"offsetMilliseconds": 1840,
"durationMilliseconds": 120
"text": "Sam.",
"offsetMilliseconds": 1960,
"durationMilliseconds": 280
"locale": "en-US",
"confidence": 0.93616915
"channel": 0,
"speaker": 1,
"offsetMilliseconds": 2240,
"durationMilliseconds": 1040,
"text": "Thank you for calling Contoso.",
"words": [
"text": "Thank",
"offsetMilliseconds": 2240,
"durationMilliseconds": 200
"text": "you",
"offsetMilliseconds": 2440,
"durationMilliseconds": 80
"text": "for",
"offsetMilliseconds": 2520,
"durationMilliseconds": 120
"text": "calling",
"offsetMilliseconds": 2640,
"durationMilliseconds": 200
"text": "Contoso.",
"offsetMilliseconds": 2840,
"durationMilliseconds": 440
"locale": "en-US",
"confidence": 0.93616915
"channel": 0,
"speaker": 1,
"offsetMilliseconds": 3280,
"durationMilliseconds": 640,
"text": "How can I help?",
"words": [
"text": "How",
"offsetMilliseconds": 3280,
"durationMilliseconds": 120
"text": "can",
"offsetMilliseconds": 3440,
"durationMilliseconds": 120
"text": "I",
"offsetMilliseconds": 3560,
"durationMilliseconds": 40
"text": "help?",
"offsetMilliseconds": 3600,
"durationMilliseconds": 320
"locale": "en-US",
"confidence": 0.93616915
"channel": 0,
"speaker": 0,
"offsetMilliseconds": 5040,
"durationMilliseconds": 400,
"text": "Hi there.",
"words": [
"text": "Hi",
"offsetMilliseconds": 5040,
"durationMilliseconds": 240
"text": "there.",
"offsetMilliseconds": 5280,
"durationMilliseconds": 160
"locale": "en-US",
"confidence": 0.93616915
"channel": 0,
"speaker": 0,
"offsetMilliseconds": 5440,
"durationMilliseconds": 800,
"text": "My name is Mary.",
"words": [
"text": "My",
"offsetMilliseconds": 5440,
"durationMilliseconds": 80
"text": "name",
"offsetMilliseconds": 5520,
"durationMilliseconds": 120
"text": "is",
"offsetMilliseconds": 5640,
"durationMilliseconds": 80
"text": "Mary.",
"offsetMilliseconds": 5720,
"durationMilliseconds": 520
"locale": "en-US",
"confidence": 0.93616915
// More transcription results...
// Redacted for brevity
"channel": 0,
"speaker": 0,
"offsetMilliseconds": 180320,
"durationMilliseconds": 680,
"text": "Thank you for your help.",
"words": [
"text": "Thank",
"offsetMilliseconds": 180320,
"durationMilliseconds": 160
"text": "you",
"offsetMilliseconds": 180480,
"durationMilliseconds": 80
"text": "for",
"offsetMilliseconds": 180560,
"durationMilliseconds": 120
"text": "your",
"offsetMilliseconds": 180680,
"durationMilliseconds": 120
"text": "help.",
"offsetMilliseconds": 180800,
"durationMilliseconds": 200
"locale": "en-US",
"confidence": 0.9314801
"channel": 0,
"speaker": 1,
"offsetMilliseconds": 181960,
"durationMilliseconds": 280,
"text": "Thank you.",
"words": [
"text": "Thank",
"offsetMilliseconds": 181960,
"durationMilliseconds": 200
"text": "you.",
"offsetMilliseconds": 182160,
"durationMilliseconds": 80
"locale": "en-US",
"confidence": 0.9314801
The following example shows how to transcribe an audio file that has one or two channels. Multi-channel transcriptions are useful for audio files with multiple channels, such as audio files with multiple speakers or audio files with background noise. By default, the fast transcription API merges all input channels into a single channel and then performs the transcription. If that isn't desirable, the channels can be transcribed independently without merging.
- Replace YourSubscriptionKey with your Speech resource key.
- Replace YourServiceRegion with your Speech resource region.
- Replace YourAudioFile with the path to your audio file.
curl --location 'https://YourServiceRegion.api.cognitive.microsoft.com/speechtotext/transcriptions:transcribe?api-version=2024-11-15' \
--header 'Content-Type: multipart/form-data' \
--header 'Ocp-Apim-Subscription-Key: YourSubscriptionKey' \
--form 'audio=@"YourAudioFile"' \
--form 'definition="{
    "locales":["en-US"],
    "channels": [0,1]}"'
Construct the form definition according to the following instructions:
- Set the optional (but recommended) locales property, which should match the expected locale of the audio data to transcribe. In this example, the locale is set to en-US. The supported locales are: de-DE, en-GB, en-IN, en-US, es-ES, es-MX, fr-FR, hi-IN, it-IT, ja-JP, ko-KR, pt-BR, and zh-CN.
- Set the channels property to the zero-based indices of the channels to transcribe separately. Up to two channels are supported unless diarization is enabled. In this example, channels 0 and 1 are specified.
For more information about locales, channels, and other properties of the fast transcription API, see the request configuration options section later in this guide.
The response includes durationMilliseconds, offsetMilliseconds, and more. The channel property identifies the channel when the audio file contains multiple channels. The combinedPhrases property contains a separate full transcription for each audio channel. Look for "channel": 0,"text" and "channel": 1,"text" to identify the full transcription for each channel.
"durationMilliseconds": 185079,
"combinedPhrases": [
"channel": 0,
"text": "Hello. Thank you for calling Contoso. Who am I speaking with today? Hi, Mary. Are you calling because you need health insurance? Great. If you can answer a few questions, we can get you signed up in the Jiffy. So what's your full name? Got it. And what's the best callback number in case we get disconnected? Yep, that'll be fine. Got it. So to confirm, it's 234-554-9312. Excellent. Let's get some additional information for your application. Do you have a job? OK, so then you have a Social Security number as well. OK, and what is your Social Security number please? Sorry, what was that, a 25 or a 225? You cut out for a bit. Alright, thank you so much. And could I have your e-mail address please? Great. Uh That is the last question. So let me take your information and I'll be able to get you signed up right away. Thank you for calling Contoso and I'll be able to get you signed up immediately. One of our agents will call you back in about 24 hours or so to confirm your application. Absolutely. If you need anything else, please give us a call at 1-800-555-5564, extension 123. Thank you very much for calling Contoso. Uh Yes, of course. So the default is a digital membership card, but we can send you a physical card if you prefer. Uh, yeah. Absolutely. I've made a note on your file. You're very welcome. Thank you for calling Contoso and have a great day."
"channel": 1,
"text": "Hi, my name is Mary Rondo. I'm trying to enroll myself with Contuso. Yes, yeah, I'm calling to sign up for insurance. Okay. So Mary Beth Rondo, last name is R like Romeo, O like Ocean, N like Nancy D, D like Dog, and O like Ocean again. Rondo. I only have a cell phone so I can give you that. Sure, so it's 234-554 and then 9312. Yep, that's right. Uh Yes, I am self-employed. Yes, I do. Uh Sure, so it's 412256789. It's double two, so 412, then another two, then five. Yeah, it's maryrondo@gmail.com. So my first and last name at gmail.com. No periods, no dashes. That was quick. Thank you. Actually, so I have one more question. I'm curious, will I be getting a physical card as proof of coverage? uh Yes. Could you please mail it to me when it's ready? I'd like to have it shipped to, are you ready for my address? So it's 2660 Unit A on Maple Avenue SE, Lansing, and then zip code is 48823. Awesome. Thanks so much."
"phrases": [
"channel": 0,
"offsetMilliseconds": 720,
"durationMilliseconds": 480,
"text": "Hello.",
"words": [
"text": "Hello.",
"offsetMilliseconds": 720,
"durationMilliseconds": 480
"locale": "en-US",
"confidence": 0.9177142
"channel": 0,
"offsetMilliseconds": 1200,
"durationMilliseconds": 1120,
"text": "Thank you for calling Contoso.",
"words": [
"text": "Thank",
"offsetMilliseconds": 1200,
"durationMilliseconds": 200
"text": "you",
"offsetMilliseconds": 1400,
"durationMilliseconds": 80
"text": "for",
"offsetMilliseconds": 1480,
"durationMilliseconds": 120
"text": "calling",
"offsetMilliseconds": 1600,
"durationMilliseconds": 240
"text": "Contoso.",
"offsetMilliseconds": 1840,
"durationMilliseconds": 480
"locale": "en-US",
"confidence": 0.9177142
"channel": 0,
"offsetMilliseconds": 2320,
"durationMilliseconds": 1120,
"text": "Who am I speaking with today?",
"words": [
"text": "Who",
"offsetMilliseconds": 2320,
"durationMilliseconds": 160
"text": "am",
"offsetMilliseconds": 2480,
"durationMilliseconds": 80
"text": "I",
"offsetMilliseconds": 2560,
"durationMilliseconds": 80
"text": "speaking",
"offsetMilliseconds": 2640,
"durationMilliseconds": 320
"text": "with",
"offsetMilliseconds": 2960,
"durationMilliseconds": 160
"text": "today?",
"offsetMilliseconds": 3120,
"durationMilliseconds": 320
"locale": "en-US",
"confidence": 0.9177142
"channel": 0,
"offsetMilliseconds": 9520,
"durationMilliseconds": 400,
"text": "Hi, Mary.",
"words": [
"text": "Hi,",
"offsetMilliseconds": 9520,
"durationMilliseconds": 80
"text": "Mary.",
"offsetMilliseconds": 9600,
"durationMilliseconds": 320
"locale": "en-US",
"confidence": 0.9177142
// More transcription results...
// Redacted for brevity
"channel": 1,
"offsetMilliseconds": 4480,
"durationMilliseconds": 1600,
"text": "Hi, my name is Mary Rondo.",
"words": [
"text": "Hi,",
"offsetMilliseconds": 4480,
"durationMilliseconds": 400
"text": "my",
"offsetMilliseconds": 4880,
"durationMilliseconds": 120
"text": "name",
"offsetMilliseconds": 5000,
"durationMilliseconds": 120
"text": "is",
"offsetMilliseconds": 5120,
"durationMilliseconds": 160
"text": "Mary",
"offsetMilliseconds": 5280,
"durationMilliseconds": 240
"text": "Rondo.",
"offsetMilliseconds": 5520,
"durationMilliseconds": 560
"locale": "en-US",
"confidence": 0.8989456
"channel": 1,
"offsetMilliseconds": 6080,
"durationMilliseconds": 1920,
"text": "I'm trying to enroll myself with Contuso.",
"words": [
"text": "I'm",
"offsetMilliseconds": 6080,
"durationMilliseconds": 160
"text": "trying",
"offsetMilliseconds": 6240,
"durationMilliseconds": 200
"text": "to",
"offsetMilliseconds": 6440,
"durationMilliseconds": 80
"text": "enroll",
"offsetMilliseconds": 6520,
"durationMilliseconds": 200
"text": "myself",
"offsetMilliseconds": 6720,
"durationMilliseconds": 360
"text": "with",
"offsetMilliseconds": 7080,
"durationMilliseconds": 120
"text": "Contuso.",
"offsetMilliseconds": 7200,
"durationMilliseconds": 800
"locale": "en-US",
"confidence": 0.8989456
// More transcription results...
// Redacted for brevity
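In the multi-channel response, the phrases for channel 0 are listed before those for channel 1, so reading the conversation in order requires sorting by offset. The sketch below assumes combinedPhrases and phrases are lists of objects carrying channel, text, and offsetMilliseconds properties, as in the sample above; the helper names are hypothetical.

```python
def transcript_by_channel(response):
    """Map each channel index to its full transcription text."""
    return {
        item["channel"]: item["text"]
        for item in response.get("combinedPhrases", [])
    }


def chronological(phrases):
    """Interleave phrases from all channels by start offset."""
    return sorted(phrases, key=lambda p: (p["offsetMilliseconds"], p["channel"]))


# Shortened phrases in the shape of the sample response.
sample_phrases = [
    {"channel": 0, "offsetMilliseconds": 720, "text": "Hello."},
    {"channel": 0, "offsetMilliseconds": 2320, "text": "Who am I speaking with today?"},
    {"channel": 0, "offsetMilliseconds": 9520, "text": "Hi, Mary."},
    {"channel": 1, "offsetMilliseconds": 4480, "text": "Hi, my name is Mary Rondo."},
]

for phrase in chronological(sample_phrases):
    print(f"[channel {phrase['channel']}] {phrase['text']}")
```

Sorting on the offset restores the back-and-forth of the call, with the channel index acting as a stand-in for the speaker when each party is recorded on its own channel.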
Here are some property options to configure a transcription when you call the Transcriptions - Transcribe operation.