AssemblyAI (Preview)

Transcribe and extract data from audio using AssemblyAI's Speech AI.

This connector is available in the following products and regions:

Service Class Regions
Logic Apps Standard All Logic Apps regions except the following:
     -   Azure Government regions
     -   Azure China regions
     -   US Department of Defense (DoD)
Power Automate Premium All Power Automate regions except the following:
     -   US Government (GCC)
     -   US Government (GCC High)
     -   China Cloud operated by 21Vianet
     -   US Department of Defense (DoD)
Power Apps Premium All Power Apps regions except the following:
     -   US Government (GCC)
     -   US Government (GCC High)
     -   China Cloud operated by 21Vianet
     -   US Department of Defense (DoD)
Contact
Name Support
URL https://www.assemblyai.com/docs/
Email support@assemblyai.com
Connector Metadata
Publisher AssemblyAI
Website https://www.assemblyai.com
Privacy policy https://www.assemblyai.com/legal/privacy-policy
Categories AI

With the AssemblyAI Connector, you can use AssemblyAI's models to process audio data by transcribing it with speech recognition models, analyzing it with audio intelligence models, and building generative features on top of it with LLMs.

  • Speech-To-Text including many configurable features, such as speaker diarization, custom spelling, custom vocabulary, etc.
  • Audio Intelligence Models are additional AI models available and configured through the transcription configuration.
  • LeMUR lets you apply various LLM models to your transcripts without the need to build your own RAG infrastructure for very large transcripts.

Prerequisites

You will need an AssemblyAI API key to proceed.

How to get credentials

You can get an AssemblyAI API key for free by signing up for an account and copying the API key from the dashboard.

Get started with your connector

Follow these steps to transcribe audio using the AssemblyAI connector.

Upload a File

To transcribe an audio file using AssemblyAI, the file needs to be accessible to AssemblyAI. If your audio file is already accessible via a URL, you can use your existing URL.

Otherwise, you can use the Upload a File action to upload a file to AssemblyAI. You will get back a URL for your file which can only be used to transcribe using your API key. Once you transcribe the file, the file will be removed from AssemblyAI's servers.
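Outside the connector, the same upload step can be sketched in Python. This is a minimal illustration, not the connector's implementation: it assumes AssemblyAI's public REST upload endpoint (`https://api.assemblyai.com/v2/upload`) and that the response contains an `upload_url` field. The helper name `build_upload_request` is hypothetical. The request is only built, not sent, so the sketch runs without a key or network access.

```python
import urllib.request

# Assumption: AssemblyAI's public REST upload endpoint, not part of this connector page.
UPLOAD_URL = "https://api.assemblyai.com/v2/upload"


def build_upload_request(api_key: str, file_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) the HTTP request that uploads raw media bytes."""
    return urllib.request.Request(
        UPLOAD_URL,
        data=file_bytes,
        headers={
            "Authorization": api_key,  # the API key is sent as-is
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )


# Sending the request needs a real key and network access, for example:
#   with urllib.request.urlopen(build_upload_request(key, data)) as resp:
#       upload_url = json.load(resp)["upload_url"]  # assumed response field
```

The returned URL would then be passed as the Audio URL of the Transcribe Audio action.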

Transcribe Audio

To transcribe your audio, configure the Audio URL parameter using your audio file URL. Then, configure the additional parameters to enable more Speech Recognition features and Audio Intelligence models.
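To make the shape of these parameters concrete, here is a small Python sketch that assembles a request body from the Audio URL plus optional settings. The keys (`audio_url`, `speaker_labels`, `auto_chapters`, `webhook_url`) are taken from the Transcribe Audio parameter list on this page; the function name `build_transcript_request` and the example URL are placeholders.

```python
import json


def build_transcript_request(audio_url: str, **options) -> dict:
    """Return the JSON body for a transcription request.

    Options set to None are left out so the service defaults apply.
    """
    if not audio_url:
        raise ValueError("audio_url is required")
    body = {"audio_url": audio_url}
    body.update({key: value for key, value in options.items() if value is not None})
    return body


request_body = build_transcript_request(
    "https://example.com/meeting.mp3",  # placeholder URL
    speaker_labels=True,
    auto_chapters=True,
    webhook_url=None,  # not set, so it is omitted from the body
)
print(json.dumps(request_body, indent=2))
```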

The result of the Transcribe Audio action is a queued transcript which will start being processed immediately. To get the completed transcript, you have two options:

  1. Handle the Transcript Ready Webhook
  2. Poll the Transcript Status

Handle the Transcript Ready Webhook

If you don't want to handle the webhook using Logic Apps or Power Automate, configure the Webhook URL parameter in your Transcribe Audio action, and implement your webhook following AssemblyAI's webhook documentation.

To handle the webhook using Logic Apps or Power Automate, follow these steps:

  1. Create a separate Logic App or Power Automate Flow

  2. Configure When an HTTP request is received as the trigger:

    • Set Who Can Trigger The Flow? to Anyone
    • Set Request Body JSON Schema to:
      {
        "type": "object",
        "properties": {
          "transcript_id": {
            "type": "string"
          },
          "status": {
            "type": "string"
          }
        }
      }
      
    • Set Method to POST
  3. Add an AssemblyAI Get Transcript action, passing in the transcript_id from the trigger to the Transcript ID parameter.

  4. Before doing anything else, you should check whether the Status is completed or error. Add a Condition action that checks if the Status from the Get Transcript output is error:

    • In the True branch, add a Terminate action
      • Set the Status to Failed
      • Set the Code to Transcript Error
      • Pass the Error from the Get Transcript output to the Message parameter.
    • You can leave the False branch empty.

    Now you can add any action after the Condition knowing the transcript status is completed, and you can retrieve any of the output properties of the Get Transcript action.

  5. Save your Logic App or Flow. The HTTP URL will be generated for the When an HTTP request is received trigger. Copy the HTTP URL and head back to your original Logic App or Flow.

  6. In your original Logic App or Flow, update the Transcribe Audio action. Paste the HTTP URL you copied previously into the Webhook URL parameter, and save.

When the transcript status becomes completed or error, AssemblyAI will send an HTTP POST request to the webhook URL, which will be handled by your other Logic App or Flow.
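If you implement the webhook yourself instead, the same logic as the steps above might look like the following Python sketch. It assumes the POST body matches the JSON schema shown earlier (`transcript_id` and `status`). The `get_transcript` callable is a hypothetical stand-in for whatever you use to fetch the transcript, and the error text mirrors the Terminate action rather than any official message.

```python
import json
from typing import Callable, Tuple


def parse_webhook(body: bytes) -> Tuple[str, str]:
    """Extract transcript_id and status from the webhook POST body."""
    payload = json.loads(body)
    transcript_id = payload.get("transcript_id")
    status = payload.get("status")
    if not isinstance(transcript_id, str) or not isinstance(status, str):
        raise ValueError("webhook body must contain string transcript_id and status")
    return transcript_id, status


def handle_webhook(body: bytes, get_transcript: Callable[[str], dict]) -> dict:
    """Fetch the transcript named in the webhook and fail if it ended in error."""
    transcript_id, _ = parse_webhook(body)
    transcript = get_transcript(transcript_id)  # equivalent of the Get Transcript action
    if transcript["status"] == "error":
        # Equivalent of the Terminate action in the True branch.
        raise RuntimeError(f"Transcript Error: {transcript.get('error')}")
    return transcript  # status is completed from here on
```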

As an alternative to using the webhook, you can poll the transcript status as explained in the next section.

Poll the Transcript Status

You can poll the transcript status using the following steps:

  • Add an Initialize variable action

    • Set Name to transcript_status
    • Set Type to String
    • Store the Status from the Transcribe Audio output into the Value parameter
  • Add a Do until action

    • Configure the Loop Until parameter with the following Fx code:
      or(equals(variables('transcript_status'), 'completed'), equals(variables('transcript_status'), 'error'))
      
      This code checks whether the transcript_status variable is completed or error.
    • Configure the Count parameter to 86400
    • Configure the Timeout parameter to PT24H

    Inside the Do until action, add the following actions:

    • Add a Delay action that waits for one second
    • Add a Get Transcript action and pass the ID from the Transcribe Audio output to the Transcript ID parameter.
    • Add a Set variable action
      • Set Name to transcript_status
      • Pass the Status of the Get Transcript output to the Value parameter

    The Do until loop will continue until the transcript is completed or an error occurs.

  • Add another Get Transcript action, like before, but add it after the Do until loop so its output becomes available outside the scope of the Do until action.

Before doing anything else, you should check whether the transcript Status is completed or error. Add a Condition action that checks if the transcript_status is error:

  • In the True branch, add a Terminate action
    • Set Status to Failed
    • Set Code to Transcript Error
    • Pass the Error from the Get Transcript output to the Message parameter.
  • You can leave the False branch empty.

Now you can add any action after the Condition knowing the transcript status is completed, and you can retrieve any of the output properties of the Get Transcript action.
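For readers who prefer code, the polling flow above can be approximated in Python as follows. This is a sketch under assumptions: `get_transcript` is a hypothetical callable that returns the transcript as a dictionary with `status` and `error` keys, and the defaults mirror the one-second Delay and the Count of 86400 from the steps above.

```python
import time
from typing import Callable


def wait_for_transcript(
    transcript_id: str,
    get_transcript: Callable[[str], dict],
    delay_seconds: float = 1.0,
    max_polls: int = 86400,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the transcript status is completed or error."""
    for _ in range(max_polls):
        sleep(delay_seconds)                        # Delay action
        transcript = get_transcript(transcript_id)  # Get Transcript action
        if transcript["status"] in ("completed", "error"):
            break                                   # Loop Until condition is met
    else:
        raise TimeoutError(f"transcript {transcript_id} not ready after {max_polls} polls")

    if transcript["status"] == "error":
        # Equivalent of the Terminate action in the True branch of the Condition.
        raise RuntimeError(f"Transcript Error: {transcript.get('error')}")
    return transcript
```

Passing `sleep` and `get_transcript` as arguments keeps the loop easy to test without real delays or network calls.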

Add more actions

Now that you have a completed transcript, you can pass its ID to many other actions, such as:

  • Get Sentences of Transcript
  • Get Paragraphs of Transcript
  • Get Subtitles of Transcript
  • Get Redacted Audio
  • Search Transcript for Words
  • Run a Task using LeMUR

Known issues and limitations

There are no known issues currently. Streaming Speech-To-Text (real-time) is not supported because it is not possible using custom connectors.

Common errors and remedies

You can find more information about errors in the AssemblyAI documentation.

FAQ

You can find frequently asked questions in our documentation.

Creating a connection

The connector supports the following authentication types:

Default Parameters for creating connection. All regions Not shareable

Default

Applicable: All regions

Parameters for creating connection.

This is not a shareable connection. If the power app is shared with another user, that user will be prompted to create a new connection explicitly.

Name Type Description Required
AssemblyAI API Key securestring The AssemblyAI API Key to authenticate the AssemblyAI API. True

Throttling Limits

Name Calls Renewal Period
API calls per connection 100 60 seconds
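A caller that wants to stay under this limit could track its own calls before invoking the connector. The sketch below uses a sliding window of 100 calls per 60 seconds. The window algorithm is an assumption for illustration: the page does not say how the service measures the renewal period. The class name `SlidingWindowLimiter` is hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any trailing window of `period` seconds."""

    def __init__(self, max_calls: int = 100, period: float = 60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self._calls = deque()  # timestamps of accepted calls, oldest first

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits in the window, else return False."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            self._calls.append(now)
            return True
        return False
```

When `try_acquire()` returns False, the caller can wait and retry rather than sending a request that is likely to be throttled.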

Actions

Delete Transcript

Delete the transcript. Deleting does not delete the resource itself, but removes the data from the resource and marks it as deleted.

Get Paragraphs in Transcript

Get the transcript split by paragraphs. The API will attempt to semantically segment your transcript into paragraphs to create more reader-friendly transcripts.

Get Redacted Audio

Retrieve the redacted audio object containing the status and URL to the redacted audio.

Get Sentences in Transcript

Get the transcript split by sentences. The API will attempt to semantically segment the transcript into sentences to create more reader-friendly transcripts.

Get Subtitles for Transcript

Export your transcript in SRT or VTT format to use with a video player for subtitles and closed captions.

Get Transcript

Get the transcript resource. The transcript is ready when the "status" is "completed".

List Transcripts

Retrieve a list of transcripts you created. Transcripts are sorted from newest to oldest. The previous URL always points to a page with older transcripts.

Purge LeMUR Request Data

Delete the data for a previously submitted LeMUR request. The LLM response data, as well as any context provided in the original request will be removed.

Retrieve LeMUR Response

Retrieve a LeMUR response that was previously generated.

Run a Task Using LeMUR

Use the LeMUR task endpoint to input your own LLM prompt.

Search Words in Transcript

Search through the transcript for keywords. You can search for individual words, numbers, or phrases containing up to five words or numbers.

Transcribe Audio

Create a transcript from a media file that is accessible via a URL.

Upload a Media File

Upload a media file to AssemblyAI's servers.

Delete Transcript

Delete the transcript. Deleting does not delete the resource itself, but removes the data from the resource and marks it as deleted.

Parameters

Name Key Required Type Description
Transcript ID
transcript_id True string

ID of the transcript

Returns

A transcript object

Body
Transcript

Get Paragraphs in Transcript

Get the transcript split by paragraphs. The API will attempt to semantically segment your transcript into paragraphs to create more reader-friendly transcripts.

Parameters

Name Key Required Type Description
Transcript ID
transcript_id True string

ID of the transcript

Returns

Get Redacted Audio

Retrieve the redacted audio object containing the status and URL to the redacted audio.

Parameters

Name Key Required Type Description
Transcript ID
transcript_id True string

ID of the transcript

Returns

Get Sentences in Transcript

Get the transcript split by sentences. The API will attempt to semantically segment the transcript into sentences to create more reader-friendly transcripts.

Parameters

Name Key Required Type Description
Transcript ID
transcript_id True string

ID of the transcript

Returns

Get Subtitles for Transcript

Export your transcript in SRT or VTT format to use with a video player for subtitles and closed captions.

Parameters

Name Key Required Type Description
Transcript ID
transcript_id True string

ID of the transcript

Subtitle Format
subtitle_format True string

Format of the subtitles

Number of Characters per Caption
chars_per_caption integer

The maximum number of characters per caption

Returns

response
string

Get Transcript

Get the transcript resource. The transcript is ready when the "status" is "completed".

Parameters

Name Key Required Type Description
Transcript ID
transcript_id True string

ID of the transcript

Returns

A transcript object

Body
Transcript

List Transcripts

Retrieve a list of transcripts you created. Transcripts are sorted from newest to oldest. The previous URL always points to a page with older transcripts.

Parameters

Name Key Required Type Description
Limit
limit integer

Maximum number of transcripts to retrieve

Status
status string

The status of your transcript. Possible values are queued, processing, completed, or error.

Created On
created_on date

Only get transcripts created on this date

Before ID
before_id uuid

Get transcripts that were created before this transcript ID

After ID
after_id uuid

Get transcripts that were created after this transcript ID

Throttled Only
throttled_only boolean

Only get throttled transcripts, overrides the status filter

Returns

A list of transcripts. Transcripts are sorted from newest to oldest. The previous URL always points to a page with older transcripts.

Purge LeMUR Request Data

Delete the data for a previously submitted LeMUR request. The LLM response data, as well as any context provided in the original request will be removed.

Parameters

Name Key Required Type Description
LeMUR Request ID
request_id True string

The ID of the LeMUR request whose data you want to delete. This would be found in the response of the original request.

Returns

Retrieve LeMUR Response

Retrieve a LeMUR response that was previously generated.

Parameters

Name Key Required Type Description
LeMUR Request ID
request_id True string

The ID of the LeMUR request you previously made. This would be found in the response of the original request.

Returns

Run a Task Using LeMUR

Use the LeMUR task endpoint to input your own LLM prompt.

Parameters

Name Key Required Type Description
Prompt
prompt True string

Your text to prompt the model to produce a desired output, including any context you want to pass into the model.

Transcript IDs
transcript_ids array of uuid

A list of completed transcripts with text. Up to a maximum of 100 files or 100 hours, whichever is lower. Use either transcript_ids or input_text as input into LeMUR.

Input Text
input_text string

Custom formatted transcript data. Maximum size is the context limit of the selected model, which defaults to 100000. Use either transcript_ids or input_text as input into LeMUR.

Context
context string

Context to provide the model. This can be a string or a free-form JSON value.

Final Model
final_model string

The model that is used for the final prompt after compression is performed.

Maximum Output Size
max_output_size integer

Max output size in tokens, up to 4000

Temperature
temperature float

The temperature to use for the model. Higher values result in answers that are more creative, lower values are more conservative. Can be any value between 0.0 and 1.0 inclusive.
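The constraints in this parameter list can be checked before a request is made. The Python sketch below is illustrative only: the function name `build_lemur_task` is hypothetical, and it reads "use either transcript_ids or input_text" as "exactly one of the two", which is an assumption. The limit of 100 transcript IDs, the 4000-token output cap, and the temperature range come from the descriptions above.

```python
from typing import List, Optional


def build_lemur_task(
    prompt: str,
    transcript_ids: Optional[List[str]] = None,
    input_text: Optional[str] = None,
    max_output_size: Optional[int] = None,
    temperature: Optional[float] = None,
) -> dict:
    """Validate LeMUR task parameters and return the request body."""
    if not prompt:
        raise ValueError("prompt is required")
    # Assumption: exactly one input source must be supplied.
    if (transcript_ids is None) == (input_text is None):
        raise ValueError("provide exactly one of transcript_ids or input_text")
    if transcript_ids is not None and not 1 <= len(transcript_ids) <= 100:
        raise ValueError("transcript_ids must contain between 1 and 100 IDs")
    if max_output_size is not None and not 0 < max_output_size <= 4000:
        raise ValueError("max_output_size must be at most 4000 tokens")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0 inclusive")

    body = {"prompt": prompt}
    optional = {
        "transcript_ids": transcript_ids,
        "input_text": input_text,
        "max_output_size": max_output_size,
        "temperature": temperature,
    }
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```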

Returns

Search Words in Transcript

Search through the transcript for keywords. You can search for individual words, numbers, or phrases containing up to five words or numbers.

Parameters

Name Key Required Type Description
Transcript ID
transcript_id True string

ID of the transcript

Words
words True array

Keywords to search for

Returns

Transcribe Audio

Create a transcript from a media file that is accessible via a URL.

Parameters

Name Key Required Type Description
Audio URL
audio_url True string

The URL of the audio or video file to transcribe.

Language Code
language_code string

The language of your audio file. Possible values are found in Supported Languages. The default value is 'en_us'.

Language Detection
language_detection boolean

Enable Automatic language detection, either true or false.

Speech Model
speech_model string

The speech model to use for the transcription.

Punctuate
punctuate boolean

Enable Automatic Punctuation, can be true or false

Format Text
format_text boolean

Enable Text Formatting, can be true or false

Disfluencies
disfluencies boolean

Transcribe Filler Words, like "umm", in your media file; can be true or false

Dual Channel
dual_channel boolean

Enable Dual Channel transcription, can be true or false.

Webhook URL
webhook_url string

The URL to which we send webhook requests. We send two different types of webhook requests: one request when a transcript is completed or failed, and one request when the redacted audio is ready if redact_pii_audio is enabled.

Webhook Auth Header Name
webhook_auth_header_name string

The header name to be sent with the transcript completed or failed webhook requests

Webhook Auth Header Value
webhook_auth_header_value string

The header value to send back with the transcript completed or failed webhook requests for added security

Key Phrases
auto_highlights boolean

Enable Key Phrases, either true or false

Audio Start From
audio_start_from integer

The point in time, in milliseconds, to begin transcribing in your media file

Audio End At
audio_end_at integer

The point in time, in milliseconds, to stop transcribing in your media file

Word Boost
word_boost array of string

The list of custom vocabulary to boost transcription probability for

Word Boost Level
boost_param string

How much to boost specified words

Filter Profanity
filter_profanity boolean

Filter profanity from the transcribed text, can be true or false

Redact PII
redact_pii boolean

Redact PII from the transcribed text using the Redact PII model, can be true or false

Redact PII Audio
redact_pii_audio boolean

Generate a copy of the original media file with spoken PII "beeped" out, can be true or false. See PII redaction for more details.

Redact PII Audio Quality
redact_pii_audio_quality string

Controls the filetype of the audio created by redact_pii_audio. Currently supports mp3 (default) and wav. See PII redaction for more details.

Redact PII Policies
redact_pii_policies array of string

The list of PII Redaction policies to enable. See PII redaction for more details.

Redact PII Substitution
redact_pii_sub string

The replacement logic for detected PII, can be "entity_name" or "hash". See PII redaction for more details.

Speaker Labels
speaker_labels boolean

Enable Speaker diarization, can be true or false

Speakers Expected
speakers_expected integer

Tells the speaker label model how many speakers it should attempt to identify, up to 10. See Speaker diarization for more details.

Content Moderation
content_safety boolean

Enable Content Moderation, can be true or false

Content Moderation Confidence
content_safety_confidence integer

The confidence threshold for the Content Moderation model. Values must be between 25 and 100.

Topic Detection
iab_categories boolean

Enable Topic Detection, can be true or false

From
from True array of string

Words or phrases to replace

To
to True string

Word or phrase to replace with

Sentiment Analysis
sentiment_analysis boolean

Enable Sentiment Analysis, can be true or false

Auto Chapters
auto_chapters boolean

Enable Auto Chapters, can be true or false

Entity Detection
entity_detection boolean

Enable Entity Detection, can be true or false

Speech Threshold
speech_threshold float

Reject audio files that contain less than this fraction of speech. Valid values are in the range [0, 1] inclusive.

Enable Summarization
summarization boolean

Enable Summarization, can be true or false

Summary Model
summary_model string

The model to summarize the transcript

Summary Type
summary_type string

The type of summary

Enable Custom Topics
custom_topics boolean

Enable custom topics, either true or false

Custom Topics
topics array of string

The list of custom topics
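Several of the parameters above have documented ranges or allowed values. As a rough sketch, they could be checked up front like this. The function name `validate_transcribe_options` is hypothetical, and the lower bound of 1 for `speakers_expected` is an assumption, since the page only states "up to 10".

```python
def validate_transcribe_options(options: dict) -> list:
    """Return a list of problems found; an empty list means no issue was detected."""
    problems = []

    def check_range(key, low, high):
        value = options.get(key)
        if value is not None and not low <= value <= high:
            problems.append(f"{key} must be between {low} and {high}, got {value}")

    def check_choice(key, allowed):
        value = options.get(key)
        if value is not None and value not in allowed:
            problems.append(f"{key} must be one of {sorted(allowed)}, got {value!r}")

    check_range("speakers_expected", 1, 10)  # lower bound of 1 is an assumption
    check_range("content_safety_confidence", 25, 100)
    check_range("speech_threshold", 0, 1)
    check_choice("redact_pii_audio_quality", {"mp3", "wav"})
    check_choice("redact_pii_sub", {"entity_name", "hash"})
    return problems


print(validate_transcribe_options({"speakers_expected": 12, "redact_pii_sub": "mask"}))
```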

Returns

A transcript object

Body
Transcript

Upload a Media File

Upload a media file to AssemblyAI's servers.

Parameters

Name Key Required Type Description
File Content
file True binary

The file to upload.

Returns

Definitions

RedactedAudioResponse

Name Path Type Description
Status
status string

The status of the redacted audio

Redacted Audio URL
redacted_audio_url string

The URL of the redacted audio file

WordSearchResponse

Name Path Type Description
Transcript ID
id uuid

The ID of the transcript

Total Count of Matches
total_count integer

The total count of all matched instances. For example, if word 1 matched 2 times and word 2 matched 3 times, total_count will equal 5.

Matches
matches array of object

The matches of the search

Text
matches.text string

The matched word

Count
matches.count integer

The total number of times the word appears in the transcript

Timestamps
matches.timestamps array of array

An array of timestamps

Timestamp
matches.timestamps array of integer

An array of timestamps structured as [start_time, end_time] in milliseconds

Indexes
matches.indexes array of integer

An array of all index locations for that word within the words array of the completed transcript
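To show how these fields relate, the following Python sketch works on a hand-written response shaped like the WordSearchResponse definition above. The sample values, including the transcript ID, are invented for illustration. The helper names are hypothetical.

```python
# Invented sample data following the WordSearchResponse fields.
example_response = {
    "id": "hypothetical-transcript-id",
    "total_count": 5,
    "matches": [
        {"text": "word 1", "count": 2,
         "timestamps": [[100, 400], [900, 1200]], "indexes": [0, 7]},
        {"text": "word 2", "count": 3,
         "timestamps": [[500, 800], [1500, 1800], [2100, 2400]], "indexes": [2, 5, 9]},
    ],
}


def total_matches(response: dict) -> int:
    """Sum the per-word counts; this should agree with total_count."""
    return sum(match["count"] for match in response["matches"])


def first_occurrences(response: dict) -> dict:
    """Map each matched word to the start time (ms) of its first occurrence."""
    return {
        match["text"]: match["timestamps"][0][0]  # each timestamp is [start_time, end_time]
        for match in response["matches"]
        if match["timestamps"]
    }


print(total_matches(example_response))
print(first_occurrences(example_response))
```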

Transcript

A transcript object

Name Path Type Description
ID
id uuid

The unique identifier of your transcript

Audio URL
audio_url string

The URL of the media that was transcribed

Status
status string

The status of your transcript. Possible values are queued, processing, completed, or error.

Language Code
language_code string

The language of your audio file. Possible values are found in Supported Languages. The default value is 'en_us'.

Language Detection
language_detection boolean

Whether Automatic language detection is enabled, either true or false

Speech Model
speech_model string

The speech model to use for the transcription.

Text
text string

The textual transcript of your media file

Words
words array of object

An array of temporally-sequential word objects, one for each word in the transcript. See Speech recognition for more information.

Confidence
words.confidence double
Start
words.start integer
End
words.end integer
Text
words.text string
Speaker
words.speaker string

The speaker of the sentence if Speaker Diarization is enabled, else null

Utterances
utterances array of object

When dual_channel or speaker_labels is enabled, a list of turn-by-turn utterance objects. See Speaker diarization for more information.

Confidence
utterances.confidence double

The confidence score for the transcript of this utterance

Start
utterances.start integer

The starting time, in milliseconds, of the utterance in the audio file

End
utterances.end integer

The ending time, in milliseconds, of the utterance in the audio file

Text
utterances.text string

The text for this utterance

Words
utterances.words array of object

The words in the utterance.

Confidence
utterances.words.confidence double
Start
utterances.words.start integer
End
utterances.words.end integer
Text
utterances.words.text string
Speaker
utterances.words.speaker string

The speaker of the sentence if Speaker Diarization is enabled, else null

Speaker
utterances.speaker string

The speaker of this utterance, where each speaker is assigned a sequential capital letter - e.g. "A" for Speaker A, "B" for Speaker B, etc.

Confidence
confidence double

The confidence score for the transcript, between 0.0 (low confidence) and 1.0 (high confidence)

Audio Duration
audio_duration integer

The duration of this transcript object's media file, in seconds

Punctuate
punctuate boolean

Whether Automatic Punctuation is enabled, either true or false

Format Text
format_text boolean

Whether Text Formatting is enabled, either true or false

Disfluencies
disfluencies boolean

Transcribe Filler Words, like "umm", in your media file; can be true or false

Dual Channel
dual_channel boolean

Whether Dual channel transcription was enabled in the transcription request, either true or false

Webhook URL
webhook_url string

The URL to which we send webhook requests. We send two different types of webhook requests: one request when a transcript is completed or failed, and one request when the redacted audio is ready if redact_pii_audio is enabled.

Webhook HTTP Status Code
webhook_status_code integer

The status code we received from your server when delivering the transcript completed or failed webhook request, if a webhook URL was provided

Webhook Auth Enabled
webhook_auth boolean

Whether webhook authentication details were provided

Webhook Auth Header Name
webhook_auth_header_name string

The header name to be sent with the transcript completed or failed webhook requests

Speed Boost
speed_boost boolean

Whether speed boost is enabled

Key Phrases
auto_highlights boolean

Whether Key Phrases is enabled, either true or false

Status
auto_highlights_result.status string

Either success, or unavailable in the rare case that the model failed

Results
auto_highlights_result.results array of object

A temporally-sequential array of Key Phrases

Count
auto_highlights_result.results.count integer

The total number of times the key phrase appears in the audio file

Rank
auto_highlights_result.results.rank float

The total relevancy to the overall audio file of this key phrase - a greater number means more relevant

Text
auto_highlights_result.results.text string

The text itself of the key phrase

Timestamps
auto_highlights_result.results.timestamps array of object

The timestamp of the key phrase

Start
auto_highlights_result.results.timestamps.start integer

The start time in milliseconds

End
auto_highlights_result.results.timestamps.end integer

The end time in milliseconds

Audio Start From
audio_start_from integer

The point in time, in milliseconds, in the file at which the transcription was started

Audio End At
audio_end_at integer

The point in time, in milliseconds, in the file at which the transcription was terminated

Word Boost
word_boost array of string

The list of custom vocabulary to boost transcription probability for

Boost
boost_param string

The word boost parameter value

Filter Profanity
filter_profanity boolean

Whether Profanity Filtering is enabled, either true or false

Redact PII
redact_pii boolean

Whether PII Redaction is enabled, either true or false

Redact PII Audio
redact_pii_audio boolean

Whether a redacted version of the audio file was generated, either true or false. See PII redaction for more information.

Redact PII Audio Quality
redact_pii_audio_quality string

Controls the filetype of the audio created by redact_pii_audio. Currently supports mp3 (default) and wav. See PII redaction for more details.

Redact PII Policies
redact_pii_policies array of string

The list of PII Redaction policies that were enabled, if PII Redaction is enabled. See PII redaction for more information.

Redact PII Substitution
redact_pii_sub string

The replacement logic for detected PII, can be "entity_name" or "hash". See PII redaction for more details.

Speaker Labels
speaker_labels boolean

Whether Speaker diarization is enabled, can be true or false

Speakers Expected
speakers_expected integer

Tells the speaker label model how many speakers it should attempt to identify, up to 10. See Speaker diarization for more details.

Content Moderation
content_safety boolean

Whether Content Moderation is enabled, can be true or false

Status
content_safety_labels.status string

Either success, or unavailable in the rare case that the model failed

Results
content_safety_labels.results array of object
Text
content_safety_labels.results.text string

The transcript of the section flagged by the Content Moderation model

Labels
content_safety_labels.results.labels array of object

An array of safety labels, one per sensitive topic that was detected in the section

Label
content_safety_labels.results.labels.label string

The label of the sensitive topic

Confidence
content_safety_labels.results.labels.confidence double

The confidence score for the topic being discussed, from 0 to 1

Severity
content_safety_labels.results.labels.severity double

How severely the topic is discussed in the section, from 0 to 1

Sentence Index Start
content_safety_labels.results.sentences_idx_start integer

The sentence index at which the section begins

Sentence Index End
content_safety_labels.results.sentences_idx_end integer

The sentence index at which the section ends

Start
content_safety_labels.results.timestamp.start integer

The start time in milliseconds

End
content_safety_labels.results.timestamp.end integer

The end time in milliseconds

Summary
content_safety_labels.summary object

A summary of the Content Moderation confidence results for the entire audio file

Severity Score Summary
content_safety_labels.severity_score_summary object

A summary of the Content Moderation severity results for the entire audio file

Topic Detection
iab_categories boolean

Whether Topic Detection is enabled, can be true or false

Status
iab_categories_result.status string

Either success, or unavailable in the rare case that the model failed

Results
iab_categories_result.results array of object

An array of results for the Topic Detection model

Text
iab_categories_result.results.text string

The text in the transcript in which a detected topic occurs

Labels
iab_categories_result.results.labels array of object
Relevance
iab_categories_result.results.labels.relevance double

How relevant the detected topic is

Label
iab_categories_result.results.labels.label string

The IAB taxonomical label of the detected topic, where > denotes a supertopic/subtopic relationship

Start
iab_categories_result.results.timestamp.start integer

The start time in milliseconds

End
iab_categories_result.results.timestamp.end integer

The end time in milliseconds

Summary
iab_categories_result.summary object

The overall relevance of each topic to the entire audio file

Custom Spellings
custom_spelling array of object

Customize how words are spelled and formatted using to and from values

From
custom_spelling.from array of string

Words or phrases to replace

To
custom_spelling.to string

Word or phrase to replace with

Auto Chapters Enabled
auto_chapters boolean

Whether Auto Chapters is enabled, can be true or false

Chapters
chapters array of object

An array of temporally sequential chapters for the audio file

Gist
chapters.gist string

An ultra-short summary (just a few words) of the content spoken in the chapter

Headline
chapters.headline string

A single sentence summary of the content spoken during the chapter

Summary
chapters.summary string

A one paragraph summary of the content spoken during the chapter

Start
chapters.start integer

The starting time, in milliseconds, for the chapter

End
chapters.end integer

The ending time, in milliseconds, for the chapter

Summarization Enabled
summarization boolean

Whether Summarization is enabled, either true or false

Summary Type
summary_type string

The type of summary generated, if Summarization is enabled

Summary Model
summary_model string

The Summarization model used to generate the summary, if Summarization is enabled

Summary
summary string

The generated summary of the media file, if Summarization is enabled

Custom Topics Enabled
custom_topics boolean

Whether custom topics is enabled, either true or false

Topics
topics array of string

The list of custom topics provided if custom topics is enabled

Sentiment Analysis
sentiment_analysis boolean

Whether Sentiment Analysis is enabled, can be true or false

Sentiment Analysis Results
sentiment_analysis_results array of object

An array of results for the Sentiment Analysis model, if it is enabled. See Sentiment Analysis for more information.

Text
sentiment_analysis_results.text string

The transcript of the sentence

Start
sentiment_analysis_results.start integer

The starting time, in milliseconds, of the sentence

End
sentiment_analysis_results.end integer

The ending time, in milliseconds, of the sentence

Sentiment
sentiment_analysis_results.sentiment

The detected sentiment for the sentence, one of POSITIVE, NEUTRAL, NEGATIVE

Confidence
sentiment_analysis_results.confidence double

The confidence score for the detected sentiment of the sentence, from 0 to 1

Speaker
sentiment_analysis_results.speaker string

The speaker of the sentence if Speaker Diarization is enabled, else null

Entity Detection
entity_detection boolean

Whether Entity Detection is enabled, can be true or false

Entities
entities array of object

An array of results for the Entity Detection model, if it is enabled. See Entity detection for more information.

Entity Type
entities.entity_type string

The type of entity for the detected entity

Text
entities.text string

The text for the detected entity

Start
entities.start integer

The starting time, in milliseconds, at which the detected entity appears in the audio file

End
entities.end integer

The ending time, in milliseconds, for the detected entity in the audio file

Speech Threshold
speech_threshold float

Defaults to null. Reject audio files that contain less than this fraction of speech. Valid values are in the range [0, 1] inclusive.

Throttled
throttled boolean

True while a request is throttled and false when a request is no longer throttled

Error
error string

Error message explaining why the transcript failed

Language Model
language_model string

The language model that was used for the transcript

Acoustic Model
acoustic_model string

The acoustic model that was used for the transcript
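
As an illustration of how the Sentiment Analysis fields above might be consumed, the following minimal Python sketch counts sentiments per speaker. It assumes the transcript has already been parsed into a dictionary whose keys match the paths in the table; the function name `sentiment_counts_by_speaker` and the sample data are hypothetical and not part of the connector.

```python
from collections import Counter, defaultdict


def sentiment_counts_by_speaker(transcript: dict) -> dict:
    """Count POSITIVE / NEUTRAL / NEGATIVE sentences for each speaker."""
    counts = defaultdict(Counter)
    # The results array may be missing or null when Sentiment Analysis is disabled.
    for result in transcript.get("sentiment_analysis_results") or []:
        # speaker is null unless Speaker Diarization is enabled.
        speaker = result.get("speaker") or "unknown"
        counts[speaker][result["sentiment"]] += 1
    return {speaker: dict(counter) for speaker, counter in counts.items()}


# Hypothetical sample shaped like the fields documented above.
sample = {
    "sentiment_analysis": True,
    "sentiment_analysis_results": [
        {"text": "I love this.", "start": 0, "end": 900,
         "sentiment": "POSITIVE", "confidence": 0.97, "speaker": "A"},
        {"text": "It is fine.", "start": 1000, "end": 1800,
         "sentiment": "NEUTRAL", "confidence": 0.71, "speaker": "B"},
        {"text": "Great work.", "start": 2000, "end": 2700,
         "sentiment": "POSITIVE", "confidence": 0.93, "speaker": "A"},
    ],
}

print(sentiment_counts_by_speaker(sample))
```

Falling back to "unknown" keeps the function usable when Speaker Diarization is off and every `speaker` value is null.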

SentencesResponse

Name Path Type Description
Transcript ID
id uuid
Confidence
confidence double
Audio Duration
audio_duration number
Sentences
sentences array of object
Text
sentences.text string
Start
sentences.start integer
End
sentences.end integer
Confidence
sentences.confidence double
Words
sentences.words array of object
Confidence
sentences.words.confidence double
Start
sentences.words.start integer
End
sentences.words.end integer
Text
sentences.words.text string
Speaker
sentences.words.speaker string

The speaker of the word if Speaker Diarization is enabled, else null

Speaker
sentences.speaker string

The speaker of the sentence if Speaker Diarization is enabled, else null
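
The `start` and `end` values in a SentencesResponse are given in milliseconds, so they usually need converting before display. The sketch below shows one possible way to render sentences as timestamped lines, assuming the response has been parsed into a dictionary with the paths listed above. The helper names `format_ms` and `sentence_lines` and the sample data are hypothetical.

```python
def format_ms(ms: int) -> str:
    """Render a millisecond offset as MM:SS.mmm."""
    minutes, remainder = divmod(ms, 60_000)
    seconds, millis = divmod(remainder, 1000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


def sentence_lines(response: dict) -> list:
    """Build one display line per sentence, prefixed with its time range."""
    lines = []
    for sentence in response.get("sentences") or []:
        # speaker is null unless Speaker Diarization is enabled.
        speaker = sentence.get("speaker")
        prefix = f"Speaker {speaker}: " if speaker else ""
        time_range = f"[{format_ms(sentence['start'])} - {format_ms(sentence['end'])}]"
        lines.append(f"{time_range} {prefix}{sentence['text']}")
    return lines


# Hypothetical sample shaped like a SentencesResponse.
sample_response = {
    "id": "00000000-0000-0000-0000-000000000000",
    "confidence": 0.95,
    "audio_duration": 63,
    "sentences": [
        {"text": "Hello there.", "start": 250, "end": 1400,
         "confidence": 0.98, "words": [], "speaker": "A"},
        {"text": "Thanks for calling.", "start": 61500, "end": 62900,
         "confidence": 0.92, "words": [], "speaker": None},
    ],
}

for line in sentence_lines(sample_response):
    print(line)
```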

ParagraphsResponse

Name Path Type Description
Transcript ID
id uuid
Confidence
confidence double
Audio Duration
audio_duration number
Paragraphs
paragraphs array of object
Text
paragraphs.text string
Start
paragraphs.start integer
End
paragraphs.end integer
Confidence
paragraphs.confidence double
Words
paragraphs.words array of object
Confidence
paragraphs.words.confidence double
Start
paragraphs.words.start integer
End
paragraphs.words.end integer
Text
paragraphs.words.text string
Speaker
paragraphs.words.speaker string

The speaker of the word if Speaker Diarization is enabled, else null

Speaker
paragraphs.speaker string

The speaker of the paragraph if Speaker Diarization is enabled, else null

TranscriptList

A list of transcripts. Transcripts are sorted from newest to oldest. The previous URL always points to a page with older transcripts.

Name Path Type Description
Limit
page_details.limit integer

The number of results this page is limited to

Result Count
page_details.result_count integer

The actual number of results in the page

Current URL
page_details.current_url string

The URL used to retrieve the current page of transcripts

Previous URL
page_details.prev_url string

The URL to the previous page of transcripts. The previous URL always points to a page with older transcripts.

Next URL
page_details.next_url string

The URL to the next page of transcripts. The next URL always points to a page with newer transcripts.

Transcripts
transcripts array of object
ID
transcripts.id uuid
Resource URL
transcripts.resource_url string
Status
transcripts.status string

The status of your transcript. Possible values are queued, processing, completed, or error.

Created
transcripts.created string
Completed
transcripts.completed string
Audio URL
transcripts.audio_url string
Error
transcripts.error string

Error message explaining why the transcript failed
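
Because transcripts are sorted from newest to oldest and `page_details.prev_url` points to older results, walking the full list means following `prev_url` until it is empty. The following Python sketch shows that loop under stated assumptions: `fetch_page` is a hypothetical callable that takes a URL and returns the parsed TranscriptList for that page (for example, an authenticated HTTP GET), and the URLs and sample pages are placeholders.

```python
from typing import Callable, Iterator, Optional


def iter_transcripts(first_page: dict,
                     fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every transcript, newest first, by following prev_url."""
    page: Optional[dict] = first_page
    while page is not None:
        yield from page.get("transcripts") or []
        # prev_url leads to older transcripts; stop when it is null or absent.
        prev_url = page.get("page_details", {}).get("prev_url")
        page = fetch_page(prev_url) if prev_url else None


# Hypothetical in-memory pages standing in for real API responses.
older_pages = {
    "https://example.invalid/page2": {
        "page_details": {"limit": 2, "result_count": 1,
                         "current_url": "https://example.invalid/page2",
                         "prev_url": None,
                         "next_url": "https://example.invalid/page1"},
        "transcripts": [{"id": "t3", "status": "completed"}],
    },
}
first = {
    "page_details": {"limit": 2, "result_count": 2,
                     "current_url": "https://example.invalid/page1",
                     "prev_url": "https://example.invalid/page2",
                     "next_url": None},
    "transcripts": [{"id": "t1", "status": "queued"},
                    {"id": "t2", "status": "error"}],
}

ids = [t["id"] for t in iter_transcripts(first, older_pages.__getitem__)]
print(ids)
```

Passing the fetch function in as a parameter keeps the paging logic independent of any particular HTTP client or authentication scheme.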

UploadedFile

Name Path Type Description
Uploaded File URL
upload_url string

A URL that points to your audio file, accessible only by AssemblyAI's servers

PurgeLemurRequestDataResponse

Name Path Type Description
Purge Request ID
request_id uuid

The ID of the deletion request of the LeMUR request

LeMUR Request ID to Purge
request_id_to_purge uuid

The ID of the LeMUR request to purge the data for

Deleted
deleted boolean

Whether the request data was deleted

LemurTaskResponse

Name Path Type Description
Response
response string

The response generated by LeMUR.

LeMUR Request ID
request_id uuid

The ID of the LeMUR request

Input Tokens
usage.input_tokens integer

The number of input tokens used by the model

Output Tokens
usage.output_tokens integer

The number of output tokens generated by the model

LemurResponse

Name Path Type Description
Response
response string

The response generated by LeMUR.

LeMUR Request ID
request_id uuid

The ID of the LeMUR request

Input Tokens
usage.input_tokens integer

The number of input tokens used by the model

Output Tokens
usage.output_tokens integer

The number of output tokens generated by the model
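
The `usage.input_tokens` and `usage.output_tokens` fields can be aggregated to track consumption across several LeMUR calls. This is a minimal sketch, assuming each response has been parsed into a dictionary with a nested `usage` object as the paths above suggest; the function name `total_lemur_tokens` and the sample values are hypothetical.

```python
def total_lemur_tokens(responses: list) -> dict:
    """Sum input and output token counts over a list of LeMUR responses."""
    totals = {"input_tokens": 0, "output_tokens": 0}
    for response in responses:
        # Treat a missing or null usage object as zero tokens.
        usage = response.get("usage") or {}
        totals["input_tokens"] += usage.get("input_tokens", 0)
        totals["output_tokens"] += usage.get("output_tokens", 0)
    return totals


# Hypothetical responses shaped like LemurTaskResponse / LemurResponse.
sample_responses = [
    {"response": "Summary text.", "request_id": "r1",
     "usage": {"input_tokens": 1200, "output_tokens": 150}},
    {"response": "Action items.", "request_id": "r2",
     "usage": {"input_tokens": 800, "output_tokens": 90}},
]

print(total_lemur_tokens(sample_responses))
```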

string

This is the basic data type 'string'.