Datasets - Create

Creates a new dataset, either by fetching the data from a specified URL or by waiting for data blocks to be uploaded.

POST {endpoint}/speechtotext/datasets?api-version=2024-11-15

URI Parameters

| Name | In | Required | Type | Description |
| --- | --- | --- | --- | --- |
| endpoint | path | True | string | Supported Cognitive Services endpoints (protocol and hostname, for example: https://westus.api.cognitive.microsoft.com). |
| api-version | query | True | string | The requested API version. |

Request Header

| Name | Required | Type | Description |
| --- | --- | --- | --- |
| Ocp-Apim-Subscription-Key | True | string | Provide your Cognitive Services account key here. |

Request Body

| Name | Required | Type | Description |
| --- | --- | --- | --- |
| displayName | True | string | The display name of the object. |
| kind | True | DatasetKind | DatasetKind. Type of data import. |
| locale | True | string | The locale of the contained data. |
| contentUrl | | string | The URL of the data for the dataset. |
| customProperties | | object | The custom properties of this entity. The maximum allowed key length is 64 characters, the maximum allowed value length is 256 characters, and the maximum number of entries is 10. |
| description | | string | The description of the object. |
| properties | | DatasetProperties | DatasetProperties |
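
The limits stated for customProperties can be checked on the client before the request is sent. The following is a minimal sketch, not part of the service API: the function name `validate_custom_properties` is hypothetical, values are assumed to be strings, and lengths are counted as Python string lengths, which may differ from how the service counts characters.

```python
# Limits taken from the customProperties description.
MAX_ENTRIES = 10
MAX_KEY_LENGTH = 64
MAX_VALUE_LENGTH = 256


def validate_custom_properties(props: dict) -> list:
    """Return a list of problems found; an empty list means the limits are met."""
    problems = []
    if len(props) > MAX_ENTRIES:
        problems.append(f"too many entries: {len(props)} > {MAX_ENTRIES}")
    for key, value in props.items():
        if len(key) > MAX_KEY_LENGTH:
            problems.append(f"key longer than {MAX_KEY_LENGTH} characters: {key[:16]}...")
        if len(value) > MAX_VALUE_LENGTH:
            problems.append(f"value longer than {MAX_VALUE_LENGTH} characters for key {key!r}")
    return problems


# Example: one short entry passes, an oversized key is reported.
ok = validate_custom_properties({"project": "demo"})
bad = validate_custom_properties({"k" * 65: "v"})
```

Returning a list of problems rather than raising on the first one lets a caller report every violation at once.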

Responses

| Name | Type | Description |
| --- | --- | --- |
| 201 Created | Dataset | The response contains information about the entity as payload and its location as header. Headers: Location: string |
| Other Status Codes | Error | An error occurred. |
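
A 201 response carries the URL of the new dataset in the Location header. As a sketch, the dataset ID can be taken from that URL with the standard library. The helper name `dataset_id_from_location` is hypothetical, and the code assumes the ID is the path segment that follows `datasets`, as in the sample responses on this page.

```python
from urllib.parse import urlsplit


def dataset_id_from_location(location: str) -> str:
    """Return the path segment that follows 'datasets' in a dataset URL."""
    path = urlsplit(location).path  # the ?api-version=... query is not part of the path
    segments = [segment for segment in path.split("/") if segment]
    if len(segments) < 2 or segments[-2] != "datasets":
        raise ValueError(f"not a dataset URL: {location!r}")
    return segments[-1]


# Location value copied from the sample response on this page.
dataset_id = dataset_id_from_location(
    "https://westus.api.cognitive.microsoft.com/speechtotext/datasets/"
    "9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1?api-version=2024-11-15"
)
```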

Security

Ocp-Apim-Subscription-Key

Provide your Cognitive Services account key here.

Type: apiKey
In: header

Examples

Create a dataset with content url

Sample request

POST {endpoint}/speechtotext/datasets?api-version=2024-11-15


{
  "displayName": "My speech dataset name",
  "description": "My speech dataset description",
  "locale": "en-US",
  "kind": "Acoustic",
  "contentUrl": "https://contoso.com/location"
}

Sample response

Location: https://westus.api.cognitive.microsoft.com/speechtotext/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1?api-version=2024-11-15
{
  "self": "https://westus.api.cognitive.microsoft.com/speechtotext/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1?api-version=2024-11-15",
  "displayName": "Acoustic dataset",
  "locale": "en-US",
  "createdDateTime": "2019-01-07T11:34:12Z",
  "lastActionDateTime": "2019-01-07T11:36:07Z",
  "kind": "Acoustic",
  "links": {
    "files": "https://westus.api.cognitive.microsoft.com/speechtotext/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1/files?api-version=2024-11-15"
  },
  "properties": {
    "acceptedLineCount": 11,
    "rejectedLineCount": 2,
    "durationMilliseconds": 252000,
    "textNormalizationKind": "Default"
  },
  "contentUrl": "https://www.contoso.com/acousticdata/sourcelocation",
  "status": "Succeeded"
}
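
The sample request above can be assembled with the Python standard library. This is a sketch under stated assumptions, not an official client: the function name `build_create_dataset_request` is hypothetical, the `Content-Type: application/json` header is assumed because the body is JSON, and the key value is a placeholder. The request is only built here, not sent.

```python
import json
import urllib.request

API_VERSION = "2024-11-15"


def build_create_dataset_request(endpoint, subscription_key, display_name,
                                 locale, kind, content_url=None, description=None):
    """Build (but do not send) the POST request that creates a dataset."""
    body = {"displayName": display_name, "locale": locale, "kind": kind}
    if description is not None:
        body["description"] = description
    if content_url is not None:
        # Omitting contentUrl corresponds to the "data blocks" variant of the call.
        body["contentUrl"] = content_url
    url = f"{endpoint.rstrip('/')}/speechtotext/datasets?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": "application/json",  # assumption, not stated on this page
        },
        method="POST",
    )


request = build_create_dataset_request(
    endpoint="https://westus.api.cognitive.microsoft.com",
    subscription_key="YOUR_KEY",  # placeholder
    display_name="My speech dataset name",
    description="My speech dataset description",
    locale="en-US",
    kind="Acoustic",
    content_url="https://contoso.com/location",
)
```

Passing `request` to `urllib.request.urlopen` would perform the call; that step is left out so the sketch can be run without credentials.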

Create dataset from data blocks

Sample request

POST {endpoint}/speechtotext/datasets?api-version=2024-11-15


{
  "displayName": "My speech dataset name",
  "description": "My speech dataset description",
  "locale": "en-US",
  "kind": "Acoustic"
}

Sample response

{
  "self": "https://westus.api.cognitive.microsoft.com/speechtotext/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1?api-version=2024-11-15",
  "displayName": "Acoustic dataset",
  "locale": "en-US",
  "createdDateTime": "2019-01-07T11:34:12Z",
  "lastActionDateTime": "2019-01-07T11:36:07Z",
  "kind": "Acoustic",
  "links": {
    "files": "https://westus.api.cognitive.microsoft.com/speechtotext/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1/files?api-version=2024-11-15",
    "commitBlocks": "https://westus.api.cognitive.microsoft.com/speechtotext/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1/blocks:commit?api-version=2024-11-15",
    "listBlocks": "https://westus.api.cognitive.microsoft.com/speechtotext/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1/blocks?api-version=2024-11-15",
    "uploadBlocks": "https://westus.api.cognitive.microsoft.com/speechtotext/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1/blocks?api-version=2024-11-15"
  },
  "status": "NotStarted"
}
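
In the two sample responses, only the dataset created without a contentUrl carries the commitBlocks, listBlocks, and uploadBlocks links. A client can read those links from the response instead of building the URLs itself. The sketch below assumes the response body has already been parsed into a dictionary; the function name `block_upload_links` and the example URLs are hypothetical, and whether the service always omits these links for URL-based datasets is an assumption drawn from the samples.

```python
BLOCK_LINK_NAMES = ("uploadBlocks", "listBlocks", "commitBlocks")


def block_upload_links(dataset: dict) -> dict:
    """Return the block-related links of a dataset, or raise if any is absent."""
    links = dataset.get("links", {})
    missing = [name for name in BLOCK_LINK_NAMES if name not in links]
    if missing:
        raise KeyError(f"dataset has no block links: {missing}")
    return {name: links[name] for name in BLOCK_LINK_NAMES}


# Hypothetical parsed response for a dataset that waits for data blocks.
waiting_dataset = {
    "status": "NotStarted",
    "links": {
        "files": "https://example.invalid/datasets/1/files",
        "uploadBlocks": "https://example.invalid/datasets/1/blocks",
        "listBlocks": "https://example.invalid/datasets/1/blocks",
        "commitBlocks": "https://example.invalid/datasets/1/blocks:commit",
    },
}
links = block_upload_links(waiting_dataset)
```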

Definitions

| Name | Description |
| --- | --- |
| Dataset | Dataset |
| DatasetKind | DatasetKind |
| DatasetLinks | DatasetLinks |
| DatasetProperties | DatasetProperties |
| DetailedErrorCode | DetailedErrorCode |
| EntityError | EntityError |
| Error | Error |
| ErrorCode | ErrorCode |
| InnerError | InnerError |
| Status | Status |
| TextNormalizationKind | TextNormalizationKind |

Dataset

Dataset

| Name | Type | Description |
| --- | --- | --- |
| contentUrl | string | The URL of the data for the dataset. |
| createdDateTime | string | The timestamp when the object was created, encoded in ISO 8601 date and time format ("YYYY-MM-DDThh:mm:ssZ", see https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations). |
| customProperties | object | The custom properties of this entity. The maximum allowed key length is 64 characters, the maximum allowed value length is 256 characters, and the maximum number of entries is 10. |
| description | string | The description of the object. |
| displayName | string | The display name of the object. |
| kind | DatasetKind | DatasetKind. Type of data import. |
| lastActionDateTime | string | The timestamp when the current status was entered, encoded in ISO 8601 date and time format ("YYYY-MM-DDThh:mm:ssZ", see https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations). |
| links | DatasetLinks | DatasetLinks |
| locale | string | The locale of the contained data. |
| properties | DatasetProperties | DatasetProperties |
| self | string | The location of this entity. |
| status | Status | Status. Describes the current state of the API. |
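
The createdDateTime and lastActionDateTime strings use the "YYYY-MM-DDThh:mm:ssZ" layout, where the trailing Z denotes UTC. A minimal parsing sketch follows. The helper name `parse_timestamp` is hypothetical, and the code assumes the service never adds fractional seconds, which this page does not state.

```python
from datetime import datetime, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_timestamp(value: str) -> datetime:
    """Parse a 'YYYY-MM-DDThh:mm:ssZ' string into a timezone-aware UTC datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


# Values copied from the sample responses on this page.
created = parse_timestamp("2019-01-07T11:34:12Z")
last_action = parse_timestamp("2019-01-07T11:36:07Z")
elapsed = last_action - created  # time between creation and the latest status change
```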

DatasetKind

DatasetKind

| Name | Type | Description |
| --- | --- | --- |
| Acoustic | string | An acoustic dataset. |
| AudioFiles | string | An audio files dataset. |
| Language | string | A language dataset. |
| LanguageMarkdown | string | A language markdown dataset. |
| OutputFormatting | string | A dataset that contains rules to customize inverse text normalization, capitalization, reformulation, and profanity, and that also defines tests for dataset validation. |
| Pronunciation | string | A pronunciation dataset. |
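
On the client side, the kind values can be modeled as an enumeration so that a misspelled kind is caught before a request is built. This is only an illustrative sketch: the class below is not part of any SDK, and its member names are chosen here for readability.

```python
from enum import Enum


class DatasetKind(str, Enum):
    """The dataset kinds listed in the table above."""
    ACOUSTIC = "Acoustic"
    AUDIO_FILES = "AudioFiles"
    LANGUAGE = "Language"
    LANGUAGE_MARKDOWN = "LanguageMarkdown"
    OUTPUT_FORMATTING = "OutputFormatting"
    PRONUNCIATION = "Pronunciation"


# Looking up by value validates a string taken from configuration or user input.
kind = DatasetKind("Acoustic")
body_fragment = {"kind": kind.value}
```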

DatasetLinks

| Name | Type | Description |
| --- | --- | --- |
| commitBlocks | string | The location to commit the list of blocks when uploading a dataset using blocks. See operation "Datasets_CommitBlocks" for more details. |
| files | string | The location to get all files of this entity. See operation "Datasets_ListFiles" for more details. |
| listBlocks | string | The location to list the already uploaded blocks of this entity when uploading a dataset using blocks. See operation "Datasets_GetBlocks" for more details. |
| uploadBlocks | string | The location to upload blocks to when uploading a dataset using blocks. See operation "Datasets_UploadBlock" for more details. |

DatasetProperties

DatasetProperties

| Name | Type | Default value | Description |
| --- | --- | --- | --- |
| acceptedLineCount | integer | | The number of lines accepted for this dataset. |
| durationMilliseconds | integer | 0 | The total duration of the dataset in milliseconds, if it contains audio files. Durations larger than 2^53-1 are not supported, to ensure compatibility with JavaScript integers. |
| error | EntityError | | EntityError |
| rejectedLineCount | integer | | The number of lines rejected for this dataset. |
| textNormalizationKind | TextNormalizationKind | | TextNormalizationKind. The kind of text normalization. |
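
The 2^53-1 bound on durationMilliseconds is the largest integer that JavaScript numbers represent exactly. The sketch below converts a duration into a `timedelta` and rejects values outside that range. The helper name `duration_from_milliseconds` is hypothetical, and rejecting negative values is an assumption made here rather than a documented rule.

```python
from datetime import timedelta

MAX_SAFE_INTEGER = 2**53 - 1  # 9007199254740991, the bound named in the description


def duration_from_milliseconds(milliseconds: int) -> timedelta:
    """Convert durationMilliseconds to a timedelta after a range check."""
    if not 0 <= milliseconds <= MAX_SAFE_INTEGER:
        raise ValueError(f"duration out of supported range: {milliseconds}")
    return timedelta(milliseconds=milliseconds)


# 252000 is the durationMilliseconds value from the sample response.
duration = duration_from_milliseconds(252000)
```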

DetailedErrorCode

DetailedErrorCode

| Name | Type | Description |
| --- | --- | --- |
| AudioLengthLimitExceeded | string | The audio file is longer than the maximum allowed duration. |
| BadChannelConfiguration | string | There is a mismatch between audio channels in the data, in the configuration, or the requirements of the application. |
| DataImportFailed | string | Data import failed. |
| DeleteNotAllowed | string | Delete not allowed. |
| DeployNotAllowed | string | Deploy not allowed. |
| DeployingFailedModel | string | Deploying failed model. |
| EmptyAudioFile | string | The audio file is empty. |
| EmptyRequest | string | Empty request. |
| EndpointCannotBeDefault | string | Endpoint cannot be default. |
| EndpointLoggingNotSupported | string | Endpoint logging not supported. |
| EndpointNotUpdatable | string | Endpoint not updatable. |
| EndpointWithoutLogging | string | Endpoint without logging. |
| ExceededNumberOfRecordingsUris | string | Exceeded number of recordings URIs. |
| FailedDataset | string | Failed dataset. |
| Forbidden | string | Forbidden. |
| InUseViolation | string | In use violation. |
| InaccessibleCustomerStorage | string | Inaccessible customer storage. |
| InvalidAdaptationMapping | string | Invalid adaptation mapping. |
| InvalidAudioFormat | string | The format of input audio is not supported. |
| InvalidBaseModel | string | Invalid base model. |
| InvalidCallbackUri | string | Invalid callback URI. |
| InvalidChannelSpecification | string | The selection of channels in the transcription request is not supported (for example, neither 0 nor 1 has been selected). |
| InvalidChannels | string | Invalid channels. |
| InvalidCollection | string | Invalid collection. |
| InvalidDataset | string | Invalid dataset. |
| InvalidDocument | string | Invalid document. |
| InvalidDocumentBatch | string | Invalid document batch. |
| InvalidLocale | string | Invalid locale. |
| InvalidLogDate | string | Invalid log date. |
| InvalidLogEndTime | string | Invalid log end time. |
| InvalidLogId | string | Invalid log ID. |
| InvalidLogStartTime | string | Invalid log start time. |
| InvalidModel | string | Invalid model. |
| InvalidModelUri | string | Invalid model URI. |
| InvalidParameter | string | Invalid parameter. |
| InvalidParameterValue | string | Invalid parameter value. |
| InvalidPayload | string | Invalid payload. |
| InvalidPermissions | string | Invalid permissions. |
| InvalidPrerequisite | string | Invalid prerequisite. |
| InvalidProductId | string | Invalid product ID. |
| InvalidProject | string | Invalid project. |
| InvalidProjectKind | string | Invalid project kind. |
| InvalidRecordingsUri | string | Invalid recordings URI. |
| InvalidRequestBodyFormat | string | Invalid request body format. |
| InvalidSasValidityDuration | string | Invalid SAS validity duration. |
| InvalidSkipTokenForLogs | string | Invalid skip token for logs. |
| InvalidSourceAzureResourceId | string | Invalid source Azure resource ID. |
| InvalidSubscription | string | Invalid subscription. |
| InvalidTest | string | Invalid test. |
| InvalidTimeToLive | string | Invalid time to live. |
| InvalidTopForLogs | string | Invalid top for logs. |
| InvalidTranscription | string | Invalid transcription. |
| InvalidWebHookEventKind | string | Invalid web hook event kind. |
| MissingInputRecords | string | Missing input records. |
| ModelCopyAuthorizationExpired | string | Expired ModelCopyAuthorization. |
| ModelDeploymentNotCompleteState | string | Model deployment not complete state. |
| ModelDeprecated | string | Model deprecated. |
| ModelExists | string | Model exists. |
| ModelMismatch | string | Model mismatch. |
| ModelNotDeployable | string | Model not deployable. |
| ModelVersionIncorrect | string | Model version incorrect. |
| MultipleLanguagesIdentified | string | Language Identification recognized multiple languages. No dominant language could be determined. |
| NoLanguageIdentified | string | Language Identification did not recognize any language. |
| NoUtf8WithBom | string | No UTF-8 with BOM. |
| OnlyOneOfUrlsOrContainerOrDataset | string | Only one of URLs or container or dataset. |
| ProjectGenderMismatch | string | Project gender mismatch. |
| QuotaViolation | string | Quota violation. |
| SingleDefaultEndpoint | string | Single default endpoint. |
| SkuLimitsExist | string | SKU limits exist. |
| SubscriptionNotFound | string | Subscription not found. |
| UnexpectedError | string | Unexpected error. |
| UnsupportedClassBasedAdaptation | string | Unsupported class based adaptation. |
| UnsupportedDelta | string | Unsupported delta. |
| UnsupportedDynamicConfiguration | string | Unsupported dynamic configuration. |
| UnsupportedFilter | string | Unsupported filter. |
| UnsupportedLanguageCode | string | Unsupported language code. |
| UnsupportedOrderBy | string | Unsupported order by. |
| UnsupportedPagination | string | Unsupported pagination. |
| UnsupportedTimeRange | string | Unsupported time range. |

EntityError

EntityError

| Name | Type | Description |
| --- | --- | --- |
| code | string | The code of this error. |
| message | string | The message for this error. |

Error

Error

| Name | Type | Description |
| --- | --- | --- |
| code | ErrorCode | ErrorCode. High level error codes. |
| details | Error[] | Additional supportive details regarding the error and/or expected policies. |
| innerError | InnerError | InnerError. New inner error format that conforms to the Cognitive Services API Guidelines, available at https://microsoft.sharepoint.com/%3Aw%3A/t/CognitiveServicesPMO/EUoytcrjuJdKpeOKIK_QRC8BPtUYQpKBi8JsWyeDMRsWlQ?e=CPq8ow. It contains the required properties ErrorCode and message, and the optional properties target, details (key-value pairs), and inner error (which can be nested). |
| message | string | High level error message. |
| target | string | The source of the error. For example, it would be "documents" or "document id" in the case of an invalid document. |

ErrorCode

ErrorCode

| Name | Type | Description |
| --- | --- | --- |
| Conflict | string | Representing the conflict error code. |
| Forbidden | string | Representing the forbidden error code. |
| InternalCommunicationFailed | string | Representing the internal communication failed error code. |
| InternalServerError | string | Representing the internal server error error code. |
| InvalidArgument | string | Representing the invalid argument error code. |
| InvalidRequest | string | Representing the invalid request error code. |
| NotAllowed | string | Representing the not allowed error code. |
| NotFound | string | Representing the not found error code. |
| PipelineError | string | Representing the pipeline error error code. |
| ServiceUnavailable | string | Representing the service unavailable error code. |
| TooManyRequests | string | Representing the too many requests error code. |
| Unauthorized | string | Representing the unauthorized error code. |
| UnprocessableEntity | string | Representing the unprocessable entity error code. |
| UnsupportedMediaType | string | Representing the unsupported media type error code. |

InnerError

InnerError

| Name | Type | Description |
| --- | --- | --- |
| code | DetailedErrorCode | DetailedErrorCode. Detailed error code enum. |
| details | object | Additional supportive details regarding the error and/or expected policies. |
| innerError | InnerError | InnerError. New inner error format that conforms to the Cognitive Services API Guidelines, available at https://microsoft.sharepoint.com/%3Aw%3A/t/CognitiveServicesPMO/EUoytcrjuJdKpeOKIK_QRC8BPtUYQpKBi8JsWyeDMRsWlQ?e=CPq8ow. It contains the required properties ErrorCode and message, and the optional properties target, details (key-value pairs), and inner error (which can be nested). |
| message | string | High level error message. |
| target | string | The source of the error. For example, it would be "documents" or "document id" in the case of an invalid document. |
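
Because innerError can be nested, the most specific code may sit several levels below the high-level ErrorCode. The following sketch walks that chain. The function name `error_code_chain` and the example payload are hypothetical, and the function expects the Error object itself; whether the service wraps it in another envelope is not stated on this page.

```python
def error_code_chain(error: dict) -> list:
    """Collect the top-level code followed by the code of every nested innerError."""
    codes = []
    node = error
    while isinstance(node, dict):
        code = node.get("code")
        if code is not None:
            codes.append(code)
        node = node.get("innerError")  # None ends the loop at the deepest level
    return codes


# Hypothetical error payload, shaped after the Error and InnerError tables.
example_error = {
    "code": "InvalidRequest",
    "message": "The request is invalid.",
    "innerError": {
        "code": "InvalidLocale",
        "message": "The locale is not supported.",
    },
}
codes = error_code_chain(example_error)
```

The last element of the returned list is the most detailed code available, which is usually the most useful one to log.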

Status

Status

| Name | Type | Description |
| --- | --- | --- |
| Failed | string | The long running operation has failed. |
| NotStarted | string | The long running operation has not yet started. |
| Running | string | The long running operation is currently processing. |
| Succeeded | string | The long running operation has successfully completed. |
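
Since Failed and Succeeded are the two end states of the long running operation, a client typically polls the dataset until one of them is reached. The sketch below keeps the HTTP call out of the loop: `fetch_dataset` is a hypothetical caller-supplied function that returns the parsed Dataset object, and the interval and attempt limit are arbitrary defaults, not service recommendations.

```python
import time

TERMINAL_STATUSES = {"Succeeded", "Failed"}


def wait_for_dataset(fetch_dataset, interval_seconds=5.0, max_attempts=60, sleep=time.sleep):
    """Call fetch_dataset() until the dataset status is Succeeded or Failed."""
    for attempt in range(max_attempts):
        dataset = fetch_dataset()
        if dataset["status"] in TERMINAL_STATUSES:
            return dataset
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # wait before asking again
    raise TimeoutError(f"dataset still not finished after {max_attempts} attempts")


# Usage with a stand-in for the real HTTP call: three polls, ending in Succeeded.
fake_responses = iter([{"status": "NotStarted"}, {"status": "Running"}, {"status": "Succeeded"}])
result = wait_for_dataset(lambda: next(fake_responses), sleep=lambda seconds: None)
```

Injecting `sleep` keeps the loop testable without real delays.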

TextNormalizationKind

TextNormalizationKind

| Name | Type | Description |
| --- | --- | --- |
| Default | string | Default text normalization (e.g. '2 to 3' is replaced by 'two to three' in en-US). |
| None | string | No text normalization will be applied to the input text. This is an override option that should only be used when text is normalized before the upload. |