@Bao, Jeremy (Cognizant) Thanks for your patience on this. I have checked with the internal team and am sharing the findings below.
The skills in a skillset are not executed in the order they are numbered but in the order determined by their inputs and outputs. The embedding skill runs after the split skill because its inputs come from the split skill's outputs. More on how skillset execution works: Skillset concepts - Azure AI Search | Microsoft Learn.
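To illustrate the idea (this is a simplified sketch of the concept, not the actual Azure AI Search engine), you can think of the skills as nodes in a dependency graph: a skill becomes runnable only once every skill producing one of its inputs has run. The skill and field names below are made up for the example.

```python
def execution_order(skills):
    """Topologically sort skills so each runs after the skills
    that produce its inputs, regardless of list position."""
    # Map each output name to the skill that produces it.
    produced = {out: s["name"] for s in skills for out in s["outputs"]}
    # A skill depends on whichever skills produce its inputs.
    deps = {s["name"]: {produced[i] for i in s["inputs"] if i in produced}
            for s in skills}
    order, done = [], set()
    while len(order) < len(skills):
        ready = [n for n, d in deps.items() if n not in done and d <= done]
        if not ready:
            raise ValueError("cyclic skill dependencies")
        for n in ready:
            order.append(n)
            done.add(n)
    return order

# The embedding skill is listed first, but it consumes "pages",
# which the split skill produces - so split executes first.
skills = [
    {"name": "embedding", "inputs": ["pages"], "outputs": ["vectors"]},
    {"name": "split", "inputs": ["content"], "outputs": ["pages"]},
]
print(execution_order(skills))  # ['split', 'embedding']
```

This is why renumbering or reordering the skills array does not change the execution sequence: only the declared inputs and outputs do.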
Secondly, why is performance so bad when passing raw data files into this? I am working on applications where we need a chatbot using RAG on structured data. When I use a script I made to split the files based on their structure before uploading the individual chunks as separate files, I get decent results. When I simply shove the raw files into an index using this "Import and Vectorize" feature, I get terrible results.
Based on the shared information, we understand that you get good results when you split the data according to its structure, so for your use case structure is important. The Split skill wouldn't suffice here, since it performs fixed-size chunking. In that case, you should consider using a custom skill to split the data in the way you need.
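To make the difference concrete, here is a minimal sketch (my own illustration, not a Microsoft implementation) of structure-aware chunking for Markdown-like files: it breaks at heading boundaries so each chunk stays a coherent unit, instead of cutting every N characters the way fixed-size chunking does.

```python
import re

def split_by_headings(markdown_text):
    """Split a Markdown document into chunks at heading boundaries,
    keeping each heading together with the text that follows it."""
    chunks, current = [], []
    for line in markdown_text.splitlines():
        # Start a new chunk whenever a heading line (#, ##, ...) appears.
        if re.match(r"^#{1,6}\s", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]

doc = "# Intro\nOverview text.\n## Details\nSpecifics here.\n"
print(split_by_headings(doc))
# -> ['# Intro\nOverview text.', '## Details\nSpecifics here.']
```

With fixed-size chunking, the boundary between "Intro" and "Details" could fall mid-sentence, which is the kind of degradation you are seeing with raw files.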
Currently, we don't have a native chunking skill that preserves document structure. This is on our roadmap but won't be available in the next few months.
You could wrap your existing splitting script in a custom skill: Custom Web API skill in skillsets - Azure AI Search | Microsoft Learn, or
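The Custom Web API skill calls your endpoint with a JSON envelope of the form {"values": [{"recordId": ..., "data": {...}}]} and expects the same shape back. Below is a hedged sketch of a handler for that envelope; the "text" input and "chunks" output field names are assumptions and must match the inputs/outputs you declare in your skill definition, and split_fn stands in for your own structure-aware splitting script.

```python
def handle_custom_skill_request(body, split_fn):
    """Process a Custom Web API skill request envelope and return
    the matching response envelope, one result per input record.
    Field names "text"/"chunks" are example choices, not fixed names."""
    results = []
    for record in body["values"]:
        try:
            chunks = split_fn(record["data"]["text"])
            results.append({"recordId": record["recordId"],
                            "data": {"chunks": chunks},
                            "errors": [], "warnings": []})
        except Exception as exc:
            # Per-record errors are reported back instead of failing the batch.
            results.append({"recordId": record["recordId"],
                            "data": {},
                            "errors": [{"message": str(exc)}],
                            "warnings": []})
    return {"values": results}

request = {"values": [{"recordId": "r1",
                       "data": {"text": "# A\none\n# B\ntwo"}}]}
response = handle_custom_skill_request(request, lambda t: t.split("# ")[1:])
print(response)
```

In a real deployment this handler would sit behind an HTTPS endpoint (for example an Azure Function) that the skillset points at via its uri property.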
You could also consider Azure AI Document Intelligence, which preserves document structure as well: Build a Document Intelligence custom skill for Azure AI Search - Training | Microsoft Learn
Hope this helps, let me know if you have any questions on this.