Transcriptions - Transcribe

Synchronous transcription of an audio file.

POST {endpoint}/speechtotext/transcriptions:transcribe?api-version=2024-11-15

URI Parameters

Name         In        Required  Type           Description
audio        formData  True      file (binary)  The content of the audio file to be transcribed. The audio must be shorter than 2 hours and the file smaller than 250 MB.
definition   formData  False     string         Metadata for a transcription request. This field contains a JSON-serialized object of type TranscribeDefinition.
endpoint     path      True      string         Supported Cognitive Services endpoints (protocol and hostname, for example: https://westus.api.cognitive.microsoft.com).
api-version  query     True      string         The requested API version.
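
As a rough illustration of how the path and query parameters above fit together, the following Python sketch builds the request URL and checks a file size against the documented 250 MB limit before upload. The helper names are hypothetical, and reading "250 MB" as 250 * 1024 * 1024 bytes is an assumption; the service may count the limit differently.

```python
from urllib.parse import urlencode

API_VERSION = "2024-11-15"
# Assumption: "250 MB" is interpreted as 250 * 1024 * 1024 bytes.
MAX_AUDIO_BYTES = 250 * 1024 * 1024


def build_transcribe_url(endpoint: str, api_version: str = API_VERSION) -> str:
    """Join the endpoint (protocol and hostname) with the operation path and query."""
    base = endpoint.rstrip("/")
    query = urlencode({"api-version": api_version})
    return f"{base}/speechtotext/transcriptions:transcribe?{query}"


def is_within_size_limit(size_in_bytes: int) -> bool:
    """Return True if the file is non-empty and smaller than the assumed limit."""
    return 0 < size_in_bytes < MAX_AUDIO_BYTES


url = build_transcribe_url("https://westus.api.cognitive.microsoft.com/")
print(url)
```

The 2-hour duration limit is not checked here because it would require decoding the audio.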

Request Header

Media Types: "multipart/form-data"

Name                       Required  Type    Description
Ocp-Apim-Subscription-Key  True      string  Provide your Cognitive Services account key here.
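
One possible way to assemble the multipart/form-data request with the subscription key header, using only the Python standard library, is sketched below. The helper name `build_multipart_request`, the default file name `audio.wav`, and the `locales` property shown in the usage comment are illustrative assumptions; the TranscribeDefinition schema is not described in this reference.

```python
import json
import uuid
import urllib.request


def build_multipart_request(url, key, audio_bytes, definition, filename="audio.wav"):
    """Return an unsent POST request carrying the audio and definition form fields."""
    boundary = uuid.uuid4().hex
    delimiter = f"--{boundary}".encode("ascii")
    body = b"\r\n".join([
        delimiter,
        b'Content-Disposition: form-data; name="definition"',
        b"",
        json.dumps(definition).encode("utf-8"),
        delimiter,
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"'.encode("utf-8"),
        b"Content-Type: application/octet-stream",
        b"",
        audio_bytes,
        delimiter + b"--",
        b"",
    ])
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Sending is left commented out because it needs a real endpoint, key, and file:
# request = build_multipart_request(url, key, audio_bytes, {"locales": ["en-US"]})
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Building the request separately from sending it keeps the encoding step easy to inspect and test without network access.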

Responses

Name                Type              Description
200 OK              TranscribeResult  OK
Other Status Codes  Error             An error occurred.
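
The two response shapes above could be handled along the following lines. This is a minimal sketch: `TranscriptionError` and `parse_response` are hypothetical names, and the code assumes that a non-200 body is an Error object serialized as JSON at the top level.

```python
import json


class TranscriptionError(Exception):
    """Hypothetical exception wrapping the documented Error payload."""

    def __init__(self, status, error):
        self.status = status
        self.code = error.get("code")
        self.message = error.get("message")
        super().__init__(f"HTTP {status}: {self.code}: {self.message}")


def parse_response(status: int, body: bytes) -> dict:
    """Return the TranscribeResult for 200 OK, otherwise raise TranscriptionError."""
    text = body.decode("utf-8", errors="replace")
    try:
        payload = json.loads(text) if text else {}
    except ValueError:
        # Assumption: a non-JSON body is kept as the error message.
        payload = {"message": text}
    if status == 200:
        return payload
    raise TranscriptionError(status, payload)
```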

Security

Ocp-Apim-Subscription-Key

Provide your Cognitive Services account key here.

Type: apiKey
In: header

Examples

Transcribe an audio file

Sample request

POST {endpoint}/speechtotext/transcriptions:transcribe?api-version=2024-11-15

Sample response

{
  "durationMilliseconds": 2000,
  "combinedPhrases": [
    {
      "text": "Weather"
    }
  ],
  "phrases": [
    {
      "offsetMilliseconds": 40,
      "durationMilliseconds": 320,
      "text": "Weather",
      "words": [
        {
          "text": "weather",
          "offsetMilliseconds": 40,
          "durationMilliseconds": 320
        }
      ],
      "locale": "en-US",
      "confidence": 0.78983736
    }
  ]
}
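
To show how the sample response above might be consumed, the following Python sketch loads it and prints the combined transcript and each timed phrase. The variable names are illustrative only.

```python
import json

# The sample response from this reference, embedded as a string.
SAMPLE = """
{
  "durationMilliseconds": 2000,
  "combinedPhrases": [{"text": "Weather"}],
  "phrases": [
    {
      "offsetMilliseconds": 40,
      "durationMilliseconds": 320,
      "text": "Weather",
      "words": [
        {"text": "weather", "offsetMilliseconds": 40, "durationMilliseconds": 320}
      ],
      "locale": "en-US",
      "confidence": 0.78983736
    }
  ]
}
"""

result = json.loads(SAMPLE)

# combinedPhrases carries the full transcript; phrases carries timed segments.
full_text = " ".join(entry["text"] for entry in result["combinedPhrases"])
print(full_text)

for phrase in result["phrases"]:
    start = phrase["offsetMilliseconds"]
    end = start + phrase["durationMilliseconds"]
    print(f"[{start}-{end} ms] {phrase['text']} "
          f"({phrase['locale']}, confidence {phrase['confidence']:.2f})")
```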

Definitions

Name                    Description
ChannelCombinedPhrases  The full transcript per channel.
DetailedErrorCode       Detailed error code enum.
Error                   Describes an error returned by the service.
ErrorCode               High-level error codes.
InnerError              Inner error with a detailed error code; can be nested.
Phrase                  A transcribed phrase.
TranscribeResult        The result of the transcribe operation.
Word                    Time-stamped word in the display form.

ChannelCombinedPhrases

The full transcript per channel.

Name     Type     Description
channel  integer  The 0-based channel index. Only present if channel separation is enabled.
text     string   The transcribed text.
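
Because `channel` is only present when channel separation is enabled, client code may want to tolerate its absence. A small sketch, with the hypothetical helper name `transcript_by_channel`:

```python
def transcript_by_channel(combined_phrases):
    """Map each channel index to its full transcript text.

    Entries without a "channel" property are stored under the key None.
    """
    return {entry.get("channel"): entry["text"] for entry in combined_phrases}


print(transcript_by_channel([{"text": "Weather"}]))
```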

DetailedErrorCode

Detailed error code enum.

Value                              Description
AudioLengthLimitExceeded           The audio file is longer than the maximum allowed duration.
BadChannelConfiguration            There is a mismatch between audio channels in the data, in the configuration, or the requirements of the application.
DataImportFailed                   Data import failed.
DeleteNotAllowed                   Delete not allowed.
DeployNotAllowed                   Deploy not allowed.
DeployingFailedModel               Deploying failed model.
EmptyAudioFile                     The audio file is empty.
EmptyRequest                       Empty request.
EndpointCannotBeDefault            Endpoint cannot be default.
EndpointLoggingNotSupported        Endpoint logging not supported.
EndpointNotUpdatable               Endpoint not updatable.
EndpointWithoutLogging             Endpoint without logging.
ExceededNumberOfRecordingsUris     Exceeded number of recordings URIs.
FailedDataset                      Failed dataset.
Forbidden                          Forbidden.
InUseViolation                     In use violation.
InaccessibleCustomerStorage        Inaccessible customer storage.
InvalidAdaptationMapping           Invalid adaptation mapping.
InvalidAudioFormat                 The format of input audio is not supported.
InvalidBaseModel                   Invalid base model.
InvalidCallbackUri                 Invalid callback URI.
InvalidChannelSpecification        The selection of channels in the transcription request is not supported (for example, neither 0 nor 1 has been selected).
InvalidChannels                    Invalid channels.
InvalidCollection                  Invalid collection.
InvalidDataset                     Invalid dataset.
InvalidDocument                    Invalid document.
InvalidDocumentBatch               Invalid document batch.
InvalidLocale                      Invalid locale.
InvalidLogDate                     Invalid log date.
InvalidLogEndTime                  Invalid log end time.
InvalidLogId                       Invalid log ID.
InvalidLogStartTime                Invalid log start time.
InvalidModel                       Invalid model.
InvalidModelUri                    Invalid model URI.
InvalidParameter                   Invalid parameter.
InvalidParameterValue              Invalid parameter value.
InvalidPayload                     Invalid payload.
InvalidPermissions                 Invalid permissions.
InvalidPrerequisite                Invalid prerequisite.
InvalidProductId                   Invalid product ID.
InvalidProject                     Invalid project.
InvalidProjectKind                 Invalid project kind.
InvalidRecordingsUri               Invalid recordings URI.
InvalidRequestBodyFormat           Invalid request body format.
InvalidSasValidityDuration         Invalid SAS validity duration.
InvalidSkipTokenForLogs            Invalid skip token for logs.
InvalidSourceAzureResourceId       Invalid source Azure resource ID.
InvalidSubscription                Invalid subscription.
InvalidTest                        Invalid test.
InvalidTimeToLive                  Invalid time to live.
InvalidTopForLogs                  Invalid top for logs.
InvalidTranscription               Invalid transcription.
InvalidWebHookEventKind            Invalid web hook event kind.
MissingInputRecords                Missing input records.
ModelCopyAuthorizationExpired      Expired ModelCopyAuthorization.
ModelDeploymentNotCompleteState    Model deployment not complete state.
ModelDeprecated                    Model deprecated.
ModelExists                        Model exists.
ModelMismatch                      Model mismatch.
ModelNotDeployable                 Model not deployable.
ModelVersionIncorrect              Model version incorrect.
MultipleLanguagesIdentified        Language identification recognized multiple languages. No dominant language could be determined.
NoLanguageIdentified               Language identification did not recognize any language.
NoUtf8WithBom                      No UTF-8 with BOM.
OnlyOneOfUrlsOrContainerOrDataset  Only one of URLs, container, or dataset.
ProjectGenderMismatch              Project gender mismatch.
QuotaViolation                     Quota violation.
SingleDefaultEndpoint              Single default endpoint.
SkuLimitsExist                     SKU limits exist.
SubscriptionNotFound               Subscription not found.
UnexpectedError                    Unexpected error.
UnsupportedClassBasedAdaptation    Unsupported class based adaptation.
UnsupportedDelta                   Unsupported delta.
UnsupportedDynamicConfiguration    Unsupported dynamic configuration.
UnsupportedFilter                  Unsupported filter.
UnsupportedLanguageCode            Unsupported language code.
UnsupportedOrderBy                 Unsupported order by.
UnsupportedPagination              Unsupported pagination.
UnsupportedTimeRange               Unsupported time range.

Error

Describes an error returned by the service.

Name        Type        Description
code        ErrorCode   High-level error codes.
details     Error[]     Additional supportive details regarding the error and/or expected policies.
innerError  InnerError  The inner error format, which conforms to the Cognitive Services API Guidelines (available at https://microsoft.sharepoint.com/%3Aw%3A/t/CognitiveServicesPMO/EUoytcrjuJdKpeOKIK_QRC8BPtUYQpKBi8JsWyeDMRsWlQ?e=CPq8ow). It contains the required properties ErrorCode and message, and the optional properties target, details (key-value pairs), and inner error (which can be nested).
message     string      High-level error message.
target      string      The source of the error. For example, "documents" or "document id" in the case of an invalid document.

ErrorCode

High-level error codes.

Value                        Description
Conflict                     Represents the conflict error code.
Forbidden                    Represents the forbidden error code.
InternalCommunicationFailed  Represents the internal communication failed error code.
InternalServerError          Represents the internal server error code.
InvalidArgument              Represents the invalid argument error code.
InvalidRequest               Represents the invalid request error code.
NotAllowed                   Represents the not allowed error code.
NotFound                     Represents the not found error code.
PipelineError                Represents the pipeline error code.
ServiceUnavailable           Represents the service unavailable error code.
TooManyRequests              Represents the too many requests error code.
Unauthorized                 Represents the unauthorized error code.
UnprocessableEntity          Represents the unprocessable entity error code.
UnsupportedMediaType         Represents the unsupported media type error code.
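
A client might use these high-level codes to decide whether a failed request is worth retrying. The sketch below is only one possible policy: the choice of retryable codes and the backoff values are assumptions, since this reference does not define retry behavior, and the helper names are hypothetical.

```python
# Assumption: these codes usually signal a transient condition.
# The reference itself does not state which errors are safe to retry.
RETRYABLE_CODES = frozenset({
    "TooManyRequests",
    "ServiceUnavailable",
    "InternalServerError",
    "InternalCommunicationFailed",
})


def is_retryable(error: dict) -> bool:
    """Return True if the Error payload carries a code treated as transient."""
    return error.get("code") in RETRYABLE_CODES


def backoff_delays(attempts: int, base_seconds: float = 1.0, cap_seconds: float = 30.0):
    """Return exponentially growing delays in seconds, capped at cap_seconds."""
    return [min(cap_seconds, base_seconds * 2 ** i) for i in range(attempts)]


print(is_retryable({"code": "TooManyRequests", "message": "Slow down."}))
print(backoff_delays(4))
```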

InnerError

Inner error with a detailed error code; can be nested.

Name        Type               Description
code        DetailedErrorCode  Detailed error code enum.
details     object             Additional supportive details regarding the error and/or expected policies.
innerError  InnerError         The inner error format, which conforms to the Cognitive Services API Guidelines (available at https://microsoft.sharepoint.com/%3Aw%3A/t/CognitiveServicesPMO/EUoytcrjuJdKpeOKIK_QRC8BPtUYQpKBi8JsWyeDMRsWlQ?e=CPq8ow). It contains the required properties ErrorCode and message, and the optional properties target, details (key-value pairs), and inner error (which can be nested).
message     string             High-level error message.
target      string             The source of the error. For example, "documents" or "document id" in the case of an invalid document.
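
Since `innerError` can be nested, the most specific code may sit several levels deep. The following sketch walks the chain from the outer Error down to the innermost InnerError. The helper names and the example payload are hypothetical, not taken from a real service response.

```python
def error_code_chain(error: dict) -> list:
    """Collect the code at each nesting level, outermost first."""
    codes = []
    node = error
    while isinstance(node, dict):
        if "code" in node:
            codes.append(node["code"])
        node = node.get("innerError")
    return codes


def most_specific_code(error: dict):
    """Return the innermost code, or None if no level carries one."""
    chain = error_code_chain(error)
    return chain[-1] if chain else None


# Hypothetical payload for illustration only.
example = {
    "code": "InvalidRequest",
    "message": "The request is invalid.",
    "innerError": {
        "code": "InvalidAudioFormat",
        "message": "The format of input audio is not supported.",
    },
}
print(error_code_chain(example))
```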

Phrase

A transcribed phrase.

Name                  Type     Description
channel               integer  The 0-based channel index. Only present if channel separation is enabled.
confidence            number   The confidence value for the phrase.
durationMilliseconds  integer  The duration of the phrase in milliseconds.
locale                string   The locale of the phrase.
offsetMilliseconds    integer  The start offset of the phrase in milliseconds.
speaker               integer  A unique integer assigned to each speaker detected in the audio, in no particular order. Only present if speaker diarization is enabled.
text                  string   The transcribed text of the phrase.
words                 Word[]   The words that make up the phrase. Only present if word-level timestamps are enabled.
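
When speaker diarization is enabled, phrases can be grouped by the `speaker` value. A minimal sketch follows; the helper name `text_by_speaker` and the example phrases are hypothetical, and phrases without a `speaker` property are grouped under None.

```python
from collections import defaultdict


def text_by_speaker(phrases):
    """Join phrase texts per speaker, ordered by start offset."""
    grouped = defaultdict(list)
    for phrase in sorted(phrases, key=lambda p: p["offsetMilliseconds"]):
        grouped[phrase.get("speaker")].append(phrase["text"])
    return {speaker: " ".join(texts) for speaker, texts in grouped.items()}


# Hypothetical phrases for illustration only.
example_phrases = [
    {"offsetMilliseconds": 500, "text": "Fine.", "speaker": 2},
    {"offsetMilliseconds": 0, "text": "How are you?", "speaker": 1},
    {"offsetMilliseconds": 900, "text": "Thanks.", "speaker": 2},
]
print(text_by_speaker(example_phrases))
```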

TranscribeResult

The result of the transcribe operation.

Name                  Type                      Description
combinedPhrases       ChannelCombinedPhrases[]  The full transcript for each channel.
durationMilliseconds  integer                   The duration of the audio in milliseconds.
phrases               Phrase[]                  The transcription results segmented into phrases.
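
For typed access, the result could be mirrored in a few dataclasses. This is a sketch under the assumption that optional properties may simply be absent from the JSON; the Python class, field, and function names are illustrative and not part of the API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Word:
    text: str
    offset_ms: int
    duration_ms: int


@dataclass
class Phrase:
    text: str
    offset_ms: int
    duration_ms: int
    locale: Optional[str] = None
    confidence: Optional[float] = None
    channel: Optional[int] = None
    speaker: Optional[int] = None
    words: List[Word] = field(default_factory=list)


@dataclass
class TranscribeResult:
    duration_ms: int
    combined_texts: List[str]
    phrases: List[Phrase]


def parse_result(payload: dict) -> TranscribeResult:
    """Convert a decoded TranscribeResult JSON object into dataclasses."""
    phrases = []
    for item in payload.get("phrases", []):
        words = [
            Word(w["text"], w["offsetMilliseconds"], w["durationMilliseconds"])
            for w in item.get("words", [])
        ]
        phrases.append(Phrase(
            text=item["text"],
            offset_ms=item["offsetMilliseconds"],
            duration_ms=item["durationMilliseconds"],
            locale=item.get("locale"),
            confidence=item.get("confidence"),
            channel=item.get("channel"),
            speaker=item.get("speaker"),
            words=words,
        ))
    return TranscribeResult(
        duration_ms=payload["durationMilliseconds"],
        combined_texts=[c["text"] for c in payload.get("combinedPhrases", [])],
        phrases=phrases,
    )
```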

Word

Time-stamped word in the display form.

Name                  Type     Description
durationMilliseconds  integer  The duration of the word in milliseconds.
offsetMilliseconds    integer  The start offset of the word in milliseconds.
text                  string   The recognized word, including punctuation.
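
Word offsets and durations are given in milliseconds, so a caption-style timeline needs a conversion step. The sketch below formats each word as a start and end timestamp; the helper names and the output layout are assumptions chosen for illustration.

```python
def format_timestamp(milliseconds: int) -> str:
    """Format a millisecond offset as HH:MM:SS.mmm."""
    hours, rest = divmod(milliseconds, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def word_timeline(words):
    """Return one line per word with its start and end timestamps."""
    lines = []
    for word in words:
        start = word["offsetMilliseconds"]
        end = start + word["durationMilliseconds"]
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}  {word['text']}")
    return lines


for line in word_timeline([{"text": "weather", "offsetMilliseconds": 40, "durationMilliseconds": 320}]):
    print(line)
```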