Datasets - Create
Creates a new dataset, either by importing the data from a specified URL or by waiting for data blocks to be uploaded.
POST {endpoint}/speechtotext/datasets?api-version=2024-11-15
URI Parameters
Name | In | Required | Type | Description |
---|---|---|---|---|
endpoint | path | True | string | Supported Cognitive Services endpoints (protocol and hostname, for example: https://westus.api.cognitive.microsoft.com). |
api-version | query | True | string | The requested API version. |
Request Header
Name | Required | Type | Description |
---|---|---|---|
Ocp-Apim-Subscription-Key | True | string | Provide your Cognitive Services account key here. |
Request Body
Name | Required | Type | Description |
---|---|---|---|
displayName | True | string | The display name of the object. |
kind | True | DatasetKind | The kind of data the dataset contains. |
locale | True | string | The locale of the contained data. |
contentUrl | | string | The URL of the data for the dataset. If omitted, the service waits for data blocks to be uploaded. |
customProperties | | object | The custom properties of this entity. Keys can be at most 64 characters long, values at most 256 characters long, and at most 10 entries are allowed. |
description | | string | The description of the object. |
properties | | DatasetProperties | Additional properties of the dataset. |
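The request body rules above (required fields, allowed kinds, custom property limits) can be checked on the client before sending. The following is a minimal Python sketch; `build_dataset_body` and `DATASET_KINDS` are hypothetical names, and it assumes custom property values are strings, since this reference only states a 256-character value limit. The service still performs its own validation.

```python
from typing import Dict, Optional

# Kinds listed under DatasetKind in this reference.
DATASET_KINDS = {
    "Acoustic",
    "AudioFiles",
    "Language",
    "LanguageMarkdown",
    "OutputFormatting",
    "Pronunciation",
}


def build_dataset_body(
    display_name: str,
    kind: str,
    locale: str,
    content_url: Optional[str] = None,
    description: Optional[str] = None,
    custom_properties: Optional[Dict[str, str]] = None,
) -> dict:
    """Build a Datasets - Create body and check the documented limits (hypothetical helper)."""
    if kind not in DATASET_KINDS:
        raise ValueError(f"unknown dataset kind: {kind!r}")
    body = {"displayName": display_name, "kind": kind, "locale": locale}
    if content_url is not None:
        # Optional: without contentUrl the service waits for data blocks instead.
        body["contentUrl"] = content_url
    if description is not None:
        body["description"] = description
    if custom_properties:
        if len(custom_properties) > 10:
            raise ValueError("at most 10 custom properties are allowed")
        for key, value in custom_properties.items():
            if len(key) > 64:
                raise ValueError(f"custom property key longer than 64 characters: {key!r}")
            # Assumption: values are strings, limited to 256 characters.
            if len(value) > 256:
                raise ValueError(f"custom property value longer than 256 characters for {key!r}")
        body["customProperties"] = dict(custom_properties)
    return body
```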
Responses
Name | Type | Description |
---|---|---|
201 Created | Dataset | The response body contains the created entity, and the Location header contains its URL. Headers: Location (string). |
Other Status Codes | Error | An error occurred. |
Security
Ocp-Apim-Subscription-Key
Provide your Cognitive Services account key here.
Type: apiKey
In: header
Examples
Create a dataset with a content URL
Sample request
POST {endpoint}/speechtotext/datasets?api-version=2024-11-15
{
"displayName": "My speech dataset name",
"description": "My speech dataset description",
"locale": "en-US",
"kind": "Acoustic",
"contentUrl": "https://contoso.com/location"
}
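A minimal sketch of preparing and sending this request with Python's standard library (`urllib.request`). `make_create_request` and `create_dataset` are hypothetical helper names, and the `Content-Type: application/json` header is an assumption, since this reference does not list it.

```python
import json
import urllib.request

API_VERSION = "2024-11-15"


def make_create_request(endpoint: str, key: str, body: dict) -> urllib.request.Request:
    """Prepare, without sending, a POST to the Datasets - Create operation."""
    url = f"{endpoint.rstrip('/')}/speechtotext/datasets?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            # Assumption: the JSON body needs an explicit content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_dataset(request: urllib.request.Request) -> tuple:
    """Send the request and return (Location header, parsed Dataset body).

    urlopen raises urllib.error.HTTPError for non-2xx responses, such as
    the error statuses listed under Responses.
    """
    with urllib.request.urlopen(request) as response:
        return response.headers["Location"], json.load(response)
```

Separating request construction from sending keeps the URL and headers easy to inspect before any network call is made.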
Sample response
Location: https://westus.api.cognitive.microsoft.com/speechtotext/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1?api-version=2024-11-15
{
"self": "https://westus.api.cognitive.microsoft.com/speechtotext/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1?api-version=2024-11-15",
"displayName": "Acoustic dataset",
"locale": "en-US",
"createdDateTime": "2019-01-07T11:34:12Z",
"lastActionDateTime": "2019-01-07T11:36:07Z",
"kind": "Acoustic",
"links": {
"files": "https://westus.api.cognitive.microsoft.com/speechtotext/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1/files?api-version=2024-11-15"
},
"properties": {
"acceptedLineCount": 11,
"rejectedLineCount": 2,
"durationMilliseconds": 252000,
"textNormalizationKind": "Default"
},
"contentUrl": "https://www.contoso.com/acousticdata/sourcelocation",
"status": "Succeeded"
}
Create a dataset from data blocks
Sample request
POST {endpoint}/speechtotext/datasets?api-version=2024-11-15
{
"displayName": "My speech dataset name",
"description": "My speech dataset description",
"locale": "en-US",
"kind": "Acoustic"
}
Sample response
{
"self": "https://westus.api.cognitive.microsoft.com/speechtotext/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1?api-version=2024-11-15",
"displayName": "Acoustic dataset",
"locale": "en-US",
"createdDateTime": "2019-01-07T11:34:12Z",
"lastActionDateTime": "2019-01-07T11:36:07Z",
"kind": "Acoustic",
"links": {
"files": "https://westus.api.cognitive.microsoft.com/speechtotext/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1/files?api-version=2024-11-15",
"commitBlocks": "https://westus.api.cognitive.microsoft.com/speechtotext/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1/blocks:commit?api-version=2024-11-15",
"listBlocks": "https://westus.api.cognitive.microsoft.com/speechtotext/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1/blocks?api-version=2024-11-15",
"uploadBlocks": "https://westus.api.cognitive.microsoft.com/speechtotext/datasets/9d5f4100-5f8e-4dd6-bd83-9bbbf50d57f1/blocks?api-version=2024-11-15"
},
"status": "NotStarted"
}
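The `files` link in the content-URL example shows the layout of related operation URLs: a path segment appended to the dataset's own URL, with `api-version` kept as the trailing query string. The Python sketch below assumes the block links follow the same layout. `sub_resource_url` appends a segment such as `files` or `blocks:commit` before the query, and `uses_block_upload` treats the presence of an `uploadBlocks` link as a sign that the dataset is waiting for blocks. Both helpers are hypothetical, and that heuristic is an assumption based on these samples.

```python
from urllib.parse import urlsplit, urlunsplit


def sub_resource_url(self_url: str, suffix: str) -> str:
    """Append a path segment to an entity URL, keeping its query string last."""
    parts = urlsplit(self_url)
    path = parts.path.rstrip("/") + "/" + suffix
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def uses_block_upload(dataset: dict) -> bool:
    """Assumption: an uploadBlocks link means the service is waiting for blocks."""
    return "uploadBlocks" in dataset.get("links", {})
```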
Definitions
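Name | Description |
---|---|
Dataset | A dataset entity. |
DatasetKind | The kind of data a dataset contains. |
DatasetLinks | Links to related operations on a dataset (files and block uploads). |
DatasetProperties | Additional properties of a dataset, such as line counts and audio duration. |
DetailedErrorCode | Detailed error codes reported in InnerError. |
EntityError | The error reported for an entity. |
Error | The top-level error returned when a request fails. |
ErrorCode | High-level error codes. |
InnerError | A nested error with a detailed error code. |
Status | The status of a long-running operation. |
TextNormalizationKind | The text normalization applied to input text. |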
Dataset
Name | Type | Description |
---|---|---|
contentUrl | string | The URL of the data for the dataset. |
createdDateTime | string | The time stamp when the object was created, encoded in ISO 8601 date and time format ("YYYY-MM-DDThh:mm:ssZ", see https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations). |
customProperties | object | The custom properties of this entity. Keys can be at most 64 characters long, values at most 256 characters long, and at most 10 entries are allowed. |
description | string | The description of the object. |
displayName | string | The display name of the object. |
kind | DatasetKind | The kind of data the dataset contains. |
lastActionDateTime | string | The time stamp when the current status was entered, encoded in ISO 8601 date and time format ("YYYY-MM-DDThh:mm:ssZ", see https://en.wikipedia.org/wiki/ISO_8601#Combined_date_and_time_representations). |
links | DatasetLinks | Links to related operations on the dataset. |
locale | string | The locale of the contained data. |
properties | DatasetProperties | Additional properties of the dataset. |
self | string | The location of this entity. |
status | Status | The current status of the dataset. |
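The `createdDateTime` and `lastActionDateTime` values use the documented "YYYY-MM-DDThh:mm:ssZ" format. A minimal Python sketch for parsing them, assuming timestamps arrive exactly in that form (UTC, no fractional seconds); `parse_timestamp` and `seconds_since_creation` are hypothetical helpers.

```python
from datetime import datetime, timezone

# The documented format "YYYY-MM-DDThh:mm:ssZ"; the trailing Z means UTC.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_timestamp(value: str) -> datetime:
    """Parse a Dataset time stamp into a timezone-aware UTC datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


def seconds_since_creation(dataset: dict) -> float:
    """Seconds between creation and the moment the current status was entered."""
    created = parse_timestamp(dataset["createdDateTime"])
    last_action = parse_timestamp(dataset["lastActionDateTime"])
    return (last_action - created).total_seconds()
```

For the sample responses in this reference (created at 11:34:12, last action at 11:36:07), this returns 115.0 seconds.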
DatasetKind
Name | Type | Description |
---|---|---|
Acoustic | string | An acoustic dataset. |
AudioFiles | string | An audio files dataset. |
Language | string | A language dataset. |
LanguageMarkdown | string | A language markdown dataset. |
OutputFormatting | string | A dataset that contains rules to customize inverse text normalization, capitalization, reformulation, and profanity handling, and that also defines tests for dataset validation. |
Pronunciation | string | A pronunciation dataset. |
DatasetLinks
Name | Type | Description |
---|---|---|
commitBlocks | string | The location to commit the list of blocks when uploading a dataset using blocks. See operation "Datasets_CommitBlocks" for more details. |
files | string | The location to get all files of this entity. See operation "Datasets_ListFiles" for more details. |
listBlocks | string | The location to list the already uploaded blocks of this entity when uploading a dataset using blocks. See operation "Datasets_GetBlocks" for more details. |
uploadBlocks | string | The location to upload blocks to when uploading a dataset using blocks. See operation "Datasets_UploadBlock" for more details. |
DatasetProperties
Name | Type | Default value | Description |
---|---|---|---|
acceptedLineCount | integer | | The number of lines accepted for this dataset. |
durationMilliseconds | integer | 0 | The total duration of the dataset in milliseconds, if it contains audio files. Durations larger than 2^53-1 are not supported, to ensure compatibility with JavaScript integers. |
error | EntityError | | The error reported for this dataset, if any. |
rejectedLineCount | integer | | The number of lines rejected for this dataset. |
textNormalizationKind | TextNormalizationKind | | The text normalization applied to the input text. |
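The line counts and duration can be summarized on the client. This Python sketch assumes that missing line counts mean zero (only `durationMilliseconds` has a documented default of 0), and `summarize_properties` is a hypothetical helper. It rejects durations above 2^53-1, mirroring the JavaScript compatibility limit stated above.

```python
from typing import Optional

# Largest integer a JavaScript number represents exactly (Number.MAX_SAFE_INTEGER).
MAX_SAFE_INTEGER = 2**53 - 1


def summarize_properties(props: dict) -> dict:
    """Summarize DatasetProperties as an accepted-line fraction and a duration in seconds."""
    accepted = props.get("acceptedLineCount", 0)  # assumption: missing means 0
    rejected = props.get("rejectedLineCount", 0)  # assumption: missing means 0
    total = accepted + rejected
    duration_ms = props.get("durationMilliseconds", 0)  # documented default
    if duration_ms > MAX_SAFE_INTEGER:
        raise ValueError("durationMilliseconds exceeds 2^53-1")
    accepted_fraction: Optional[float] = accepted / total if total else None
    return {"acceptedFraction": accepted_fraction, "durationSeconds": duration_ms / 1000}
```

For the first sample response (11 accepted lines, 2 rejected, 252000 ms), this gives an accepted fraction of 11/13 and a duration of 252 seconds.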
DetailedErrorCode
Name | Type | Description |
---|---|---|
AudioLengthLimitExceeded | string | The audio file is longer than the maximum allowed duration. |
BadChannelConfiguration | string | The audio channels in the data, the configuration, or the requirements of the application do not match. |
DataImportFailed | string | Data import failed. |
DeleteNotAllowed | string | Delete not allowed. |
DeployNotAllowed | string | Deploy not allowed. |
DeployingFailedModel | string | Deploying failed model. |
EmptyAudioFile | string | The audio file is empty. |
EmptyRequest | string | Empty request. |
EndpointCannotBeDefault | string | Endpoint cannot be default. |
EndpointLoggingNotSupported | string | Endpoint logging not supported. |
EndpointNotUpdatable | string | Endpoint not updatable. |
EndpointWithoutLogging | string | Endpoint without logging. |
ExceededNumberOfRecordingsUris | string | Exceeded number of recordings URIs. |
FailedDataset | string | Failed dataset. |
Forbidden | string | Forbidden. |
InUseViolation | string | In use violation. |
InaccessibleCustomerStorage | string | Inaccessible customer storage. |
InvalidAdaptationMapping | string | Invalid adaptation mapping. |
InvalidAudioFormat | string | The input audio format is not supported. |
InvalidBaseModel | string | Invalid base model. |
InvalidCallbackUri | string | Invalid callback URI. |
InvalidChannelSpecification | string | The selection of channels in the transcription request is not supported (for example, neither channel 0 nor channel 1 has been selected). |
InvalidChannels | string | Invalid channels. |
InvalidCollection | string | Invalid collection. |
InvalidDataset | string | Invalid dataset. |
InvalidDocument | string | Invalid document. |
InvalidDocumentBatch | string | Invalid document batch. |
InvalidLocale | string | Invalid locale. |
InvalidLogDate | string | Invalid log date. |
InvalidLogEndTime | string | Invalid log end time. |
InvalidLogId | string | Invalid log ID. |
InvalidLogStartTime | string | Invalid log start time. |
InvalidModel | string | Invalid model. |
InvalidModelUri | string | Invalid model URI. |
InvalidParameter | string | Invalid parameter. |
InvalidParameterValue | string | Invalid parameter value. |
InvalidPayload | string | Invalid payload. |
InvalidPermissions | string | Invalid permissions. |
InvalidPrerequisite | string | Invalid prerequisite. |
InvalidProductId | string | Invalid product ID. |
InvalidProject | string | Invalid project. |
InvalidProjectKind | string | Invalid project kind. |
InvalidRecordingsUri | string | Invalid recordings URI. |
InvalidRequestBodyFormat | string | Invalid request body format. |
InvalidSasValidityDuration | string | Invalid SAS validity duration. |
InvalidSkipTokenForLogs | string | Invalid skip token for logs. |
InvalidSourceAzureResourceId | string | Invalid source Azure resource ID. |
InvalidSubscription | string | Invalid subscription. |
InvalidTest | string | Invalid test. |
InvalidTimeToLive | string | Invalid time to live. |
InvalidTopForLogs | string | Invalid top for logs. |
InvalidTranscription | string | Invalid transcription. |
InvalidWebHookEventKind | string | Invalid webhook event kind. |
MissingInputRecords | string | Missing input records. |
ModelCopyAuthorizationExpired | string | Expired ModelCopyAuthorization. |
ModelDeploymentNotCompleteState | string | Model deployment is not in a complete state. |
ModelDeprecated | string | Model deprecated. |
ModelExists | string | Model exists. |
ModelMismatch | string | Model mismatch. |
ModelNotDeployable | string | Model not deployable. |
ModelVersionIncorrect | string | Model version incorrect. |
MultipleLanguagesIdentified | string | Language identification recognized multiple languages; no dominant language could be determined. |
NoLanguageIdentified | string | Language identification did not recognize any language. |
NoUtf8WithBom | string | No UTF-8 with BOM. |
OnlyOneOfUrlsOrContainerOrDataset | string | Only one of URLs, container, or dataset. |
ProjectGenderMismatch | string | Project gender mismatch. |
QuotaViolation | string | Quota violation. |
SingleDefaultEndpoint | string | Single default endpoint. |
SkuLimitsExist | string | SKU limits exist. |
SubscriptionNotFound | string | Subscription not found. |
UnexpectedError | string | Unexpected error. |
UnsupportedClassBasedAdaptation | string | Unsupported class-based adaptation. |
UnsupportedDelta | string | Unsupported delta. |
UnsupportedDynamicConfiguration | string | Unsupported dynamic configuration. |
UnsupportedFilter | string | Unsupported filter. |
UnsupportedLanguageCode | string | Unsupported language code. |
UnsupportedOrderBy | string | Unsupported order by. |
UnsupportedPagination | string | Unsupported pagination. |
UnsupportedTimeRange | string | Unsupported time range. |
EntityError
Name | Type | Description |
---|---|---|
code | string | The code of this error. |
message | string | The message for this error. |
Error
Name | Type | Description |
---|---|---|
code | ErrorCode | The high-level error code. |
details | Error[] | Additional supporting details about the error and/or expected policies. |
innerError | InnerError | A nested error with more detailed information. |
message | string | High-level error message. |
target | string | The source of the error, for example "documents" or "document id" for an invalid document. |
ErrorCode
Name | Type | Description |
---|---|---|
Conflict | string | The conflict error code. |
Forbidden | string | The forbidden error code. |
InternalCommunicationFailed | string | The internal communication failed error code. |
InternalServerError | string | The internal server error code. |
InvalidArgument | string | The invalid argument error code. |
InvalidRequest | string | The invalid request error code. |
NotAllowed | string | The not allowed error code. |
NotFound | string | The not found error code. |
PipelineError | string | The pipeline error code. |
ServiceUnavailable | string | The service unavailable error code. |
TooManyRequests | string | The too many requests error code. |
Unauthorized | string | The unauthorized error code. |
UnprocessableEntity | string | The unprocessable entity error code. |
UnsupportedMediaType | string | The unsupported media type error code. |
InnerError
Name | Type | Description |
---|---|---|
code | DetailedErrorCode | The detailed error code. |
details | object | Additional supporting details about the error and/or expected policies. |
innerError | InnerError | A further nested error, if any. |
message | string | High-level error message. |
target | string | The source of the error, for example "documents" or "document id" for an invalid document. |
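Because `innerError` can nest, the most specific `DetailedErrorCode` is likely found at the deepest level. A minimal Python sketch, assuming the error response body has already been parsed into a dict shaped like the Error definition; `most_detailed_code` is a hypothetical helper that falls back to the nearest enclosing code when a nested level has none.

```python
from typing import Optional


def most_detailed_code(error: dict) -> Optional[str]:
    """Follow innerError links and return the deepest available error code."""
    code = error.get("code")  # top-level ErrorCode
    inner = error.get("innerError")
    while inner:
        # Keep the enclosing code if this level has no code of its own.
        code = inner.get("code", code)
        inner = inner.get("innerError")
    return code
```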
Status
Name | Type | Description |
---|---|---|
Failed | string | The long running operation has failed. |
NotStarted | string | The long running operation has not yet started. |
Running | string | The long running operation is currently processing. |
Succeeded | string | The long running operation has successfully completed. |
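Dataset creation is a long-running operation, so a client typically polls until `status` reaches `Succeeded` or `Failed`. A minimal Python sketch: `wait_for_dataset` is hypothetical, and `fetch` stands for any function that returns the current dataset as a dict (for example, a GET on the dataset's `self` URL, which is an assumption here). The interval and timeout defaults are arbitrary, and `sleep` is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable

# Statuses after which the operation will not change any further.
TERMINAL_STATUSES = {"Succeeded", "Failed"}


def wait_for_dataset(
    fetch: Callable[[], dict],
    interval_seconds: float = 10.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch() until the dataset reaches a terminal status or the timeout passes."""
    waited = 0.0
    while True:
        dataset = fetch()
        if dataset.get("status") in TERMINAL_STATUSES:
            return dataset
        if waited >= timeout_seconds:
            raise TimeoutError(f"dataset still {dataset.get('status')!r} after {waited}s")
        sleep(interval_seconds)
        waited += interval_seconds
```

A `Failed` result is returned rather than raised, so the caller can inspect `properties.error` on the dataset.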
TextNormalizationKind
Name | Type | Description |
---|---|---|
Default | string | Default text normalization (for example, '2 to 3' is replaced with 'two to three' in en-US). |
None | string | No text normalization is applied to the input text. Use this override only when the text has already been normalized before upload. |