Uploading large files to SharePoint through the Python Office365 API results in a broken file

BrianC 40 Reputation points
2025-01-10T02:25:52.24+00:00

Hi all,

Recently I have been using the Python Office365 API to upload files to SharePoint. I have a single CSV file with a size of 1 GB.

Sometimes, a 503 Server Error occurs during the upload process. Other times, the upload seems successful without raising any errors, but when I try to read or download the file, it shows a file size of 0 kB. Therefore, I am wondering if there is something wrong with my code.

def upload_file_to_sharepoint(self, local_relative_path, subfolder, filename, chunk_size=500000):
	try:
		# Build the local path portably instead of concatenating with "\\"
		local_full_path = os.path.join(os.getcwd(), local_relative_path)
		relative_url = self.data_folder + '/' + subfolder
		folder = self.ctx.web.get_folder_by_server_relative_url(relative_url)
		with open(local_full_path, "rb") as file_to_upload:
			folder.files.create_upload_session(
				file=file_to_upload, chunk_size=chunk_size, file_name=filename
			).execute_query()
		self.logger.info(f"{filename} has been uploaded successfully!")
	except Exception as e:
		self.logger.error(e)

Any assistance or insights into resolving this problem would be greatly appreciated. Thanks in advance.

SharePoint

Accepted answer
    RaytheonXie_MSFT 38,036 Reputation points · Microsoft Vendor
    2025-01-10T06:34:33.9633333+00:00

    Hi @BrianC,

    You will need to implement a function that uploads the file in chunks, using the Office365-REST-Python-Client to form the basis for the connection. Please refer to the following code:

    import os
    import uuid
    from pathlib import Path

    from office365.runtime.auth.client_credential import ClientCredential
    from office365.sharepoint.client_context import ClientContext

    # CLIENT_ID and CLIENT_SECRET come from your app registration.
    URL = "https://myorg.sharepoint.com/sites/myapp"

    def sharepoint_upload_chunked(blob_path: Path, filename: str, sharepoint_folder: str, chunk_size: int):
        '''
            input:
            blob_path : path to the binary file to upload
            filename : the name to give the uploaded file
            sharepoint_folder : the name of the folder you want to upload to
            chunk_size : size of the chunks in bytes
        '''
        # log in
        ctx = ClientContext(URL).with_credentials(
                            ClientCredential(CLIENT_ID, CLIENT_SECRET))
        with open(blob_path, 'rb') as f:
            first_chunk = True
            offset = 0
            filesize = os.path.getsize(blob_path)
            # take the part of the URL after the host; you are already logged in to
            # myorg.sharepoint.com via ctx (the client context)
            sharepoint_folder_long = URL[29:] + f"/{sharepoint_folder}"
            file_url = sharepoint_folder_long + f"/{filename}"
            # each upload session needs a GUID; you reference it in every chunk request
            upload_id = uuid.uuid4()
            # consume the data in chunks
            while chunk := f.read(chunk_size):
                # see GitHub for the progress bar code; this is for large uploads, so it really helps
                progressbar(offset, filesize, 30, '■')
                # comparing the running offset against the file size detects the last
                # chunk even when the file size is an exact multiple of chunk_size
                is_last_chunk = offset + len(chunk) >= filesize
                # start upload
                if first_chunk:
                    # you need to initialize an empty file to upload into
                    print("adding empty file")
                    endpoint_url = f"{URL}/_api/web/getfolderbyserverrelativeurl('{sharepoint_folder_long}')/files/add(url='{filename}', overwrite=true)"
                    upload_data(ctx, endpoint_url, bytes())
                    endpoint_url = f"{URL}/_api/web/getfilebyserverrelativeurl('{file_url}')/startupload(uploadID=guid'{upload_id}')"
                    response = upload_data(ctx, endpoint_url, chunk)
                    first_chunk = False
                # finish upload with the final chunk
                elif is_last_chunk:
                    endpoint_url = f"{URL}/_api/web/getfilebyserverrelativeurl('{file_url}')/finishupload(uploadID=guid'{upload_id}',fileOffset={offset})"
                    progressbar(filesize, filesize, 30, '■')
                    response = upload_data(ctx, endpoint_url, chunk)
                    print(response)
                # continue upload
                else:
                    # continue to consume the chunks and upload
                    endpoint_url = f"{URL}/_api/web/getfilebyserverrelativeurl('{file_url}')/continueupload(uploadID=guid'{upload_id}',fileOffset={offset})"
                    response = upload_data(ctx, endpoint_url, chunk)
                # len(chunk) is in bytes, since the file is opened in binary mode
                offset += len(chunk)

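    One way to detect the final chunk reliably, including when the file size is an exact multiple of `chunk_size` (where "the last chunk is smaller than the previous one" fails), is to compare the running offset against the file size. The sketch below exercises that bookkeeping locally without SharePoint; the `iter_chunks` helper name is illustrative, not part of the library:

```python
import os
import tempfile

def iter_chunks(path, chunk_size):
    """Yield (chunk, offset, is_last) for a file, mirroring the upload loop."""
    filesize = os.path.getsize(path)
    offset = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            # the last chunk is the one that brings the offset up to the file size
            yield chunk, offset, offset + len(chunk) >= filesize
            offset += len(chunk)

# Exercise the logic on a 10-byte file with a chunk size that divides it evenly,
# exactly the case a size-comparison heuristic would miss.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
path = tmp.name
flags = [(offset, last) for _, offset, last in iter_chunks(path, 5)]
os.unlink(path)
print(flags)  # [(0, False), (5, True)]
```

    The finish branch fires exactly once, on the chunk whose end reaches the file size, so `finishupload` is always called and the upload is never left half-committed at 0 kB.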

    You can get the full code, including the `upload_data` and `progressbar` helper functions, in the following document:

    https://github.com/SteveScott/office-365-python-rest-client-chunked-upload-example/blob/main/sharepoint_upload.py
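    Regarding the intermittent 503 errors mentioned in the question: SharePoint Online throttles heavy traffic, so retrying each request with exponential backoff usually helps. The sketch below is illustrative and not part of the linked example; `with_retries` and the `flaky_upload` stub are hypothetical names standing in for any chunk-upload call:

```python
import time

def with_retries(func, *args, max_attempts=4, base_delay=1.0, **kwargs):
    """Call func, retrying on exception with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, propagate the error
            time.sleep(base_delay * 2 ** attempt)

# Demonstrate with a stub that fails twice before succeeding.
calls = {"n": 0}

def flaky_upload(data):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("503 Server Error")
    return f"uploaded {len(data)} bytes"

result = with_retries(flaky_upload, b"chunk", base_delay=0.01)
print(result, calls["n"])  # uploaded 5 bytes 3
```

    In production you would want to catch only transient HTTP errors (429/503) rather than every exception, and honor the `Retry-After` header when the server sends one.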


0 additional answers
