Hi Kelvin Shee,
It looks like you're encountering a UnicodeDecodeError while trying to process data in Azure OpenAI.
Here are some steps that may help:
- Verify that the data you are decoding is actually encoded in gb2312. A UnicodeDecodeError from the gb2312 codec means the input contains byte sequences that are not valid gb2312. This happens either because the text includes characters outside gb2312 or because the file was saved in a different encoding.
- If the data includes characters that gb2312 does not support, switch to GB18030, which is a superset of gb2312 and supports many additional characters. For example, in Python:
```python
# Read the file as GB18030, a superset of gb2312
with open('yourfile.txt', 'r', encoding='GB18030') as file:
    content = file.read()
```
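If you are not sure which encoding the file uses, the following minimal sketch tries gb2312 first and falls back to GB18030, printing the byte offset where gb2312 decoding failed. The function name `decode_chinese_text` is hypothetical. The sketch assumes you read the file in binary mode, for example `decode_chinese_text(open('yourfile.txt', 'rb').read())`.

```python
from typing import Tuple


def decode_chinese_text(data: bytes) -> Tuple[str, str]:
    """Hypothetical helper: decode bytes as gb2312, falling back to GB18030.

    Returns (decoded_text, encoding_used). If the bytes are not valid
    GB18030 either, the UnicodeDecodeError from that attempt propagates.
    """
    try:
        return data.decode("gb2312"), "gb2312"
    except UnicodeDecodeError as err:
        # err.start is the byte offset of the first sequence gb2312 could not decode
        print(f"gb2312 failed at byte {err.start}: {err.reason}")
    # GB18030 is a superset of gb2312, so it accepts everything gb2312 does and more
    return data.decode("gb18030"), "gb18030"
```

Trying gb2312 first tells you whether the file really is plain gb2312. If the fallback branch runs, the printed offset shows where the unsupported characters begin.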
Azure OpenAI On Your Data supports the following file types:
- .txt
- .md
- .html
- .docx
- .pptx
- .pdf
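Before uploading, you could filter out files whose extensions are not in that list. The following is a minimal sketch. `SUPPORTED_EXTENSIONS`, `is_supported`, and `split_supported` are hypothetical names, and the extension set is copied from the list above, so check the current Azure documentation in case it has changed.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Hypothetical constant: extensions listed above as supported by Azure OpenAI On Your Data
SUPPORTED_EXTENSIONS = {".txt", ".md", ".html", ".docx", ".pptx", ".pdf"}


def is_supported(path: str) -> bool:
    """Return True if the file's extension (case-insensitive) is in SUPPORTED_EXTENSIONS."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def split_supported(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition paths into (supported, unsupported) lists, preserving order."""
    supported: List[str] = []
    unsupported: List[str] = []
    for p in paths:
        (supported if is_supported(p) else unsupported).append(p)
    return supported, unsupported
```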
For preprocessing longer text or mixed data types (for example, cracking PDFs to text), refer to the following link: https://github.com/microsoft/sample-app-aoai-chatGPT/tree/main/scripts#optional-crack-pdfs-to-text
Thank you.