How to fix ParquetJavaInvocationException when writing Parquet files to ADLS using ADF
Hi,
I want to copy tables from an on-premises SQL Server to my Azure Data Lake and write the files in Parquet format. I have installed the JRE on the machine that hosts the self-hosted IR.
When I create a dataset with Parquet as the format and set schema = none, I am able to write files in Parquet. However, when I try to view the contents of a Parquet file with a Parquet viewer, the file comes back corrupted. I have also tried taking the Parquet file as the source and converting it to CSV in a sink in ADF, but that ran into failures too.
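To rule out a truncated write, one thing I can check is whether the files are structurally complete: a valid Parquet file both starts and ends with the 4-byte magic PAR1 (the last 8 bytes are the footer length followed by PAR1). Below is a minimal sketch of such a check; ParquetMagicCheck is just a hypothetical helper name, not part of any ADF or Parquet tooling:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic: verify the "PAR1" magic bytes at both ends
// of a downloaded file. If the trailing magic is missing, the footer
// was never written and the file is truncated rather than merely
// unreadable by a particular viewer.
public class ParquetMagicCheck {
    public static void main(String[] args) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(args[0], "r")) {
            byte[] head = new byte[4];
            byte[] tail = new byte[4];
            f.readFully(head);          // first 4 bytes
            f.seek(f.length() - 4);
            f.readFully(tail);          // last 4 bytes
            String magic = "PAR1";
            System.out.println("header magic ok: "
                + magic.equals(new String(head, StandardCharsets.US_ASCII)));
            System.out.println("footer magic ok: "
                + magic.equals(new String(tail, StandardCharsets.US_ASCII)));
        }
    }
}
```

Running this against one of the downloaded files would at least tell whether the write was cut short or whether the viewer simply cannot read it.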
When I try to create a new dataset and import the schema from the connection/store, I get the following error message:
An error occurred when invoking java, message: java.io.IOException:Error reading summaries
total entry:6
org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallelUsingSummaryFiles(ParquetFileReader.java:190)
org.apache.parquet.hadoop.ParquetReader.<init>(ParquetReader.java:112)
org.apache.parquet.hadoop.ParquetReader.<init>(ParquetReader.java:45)
org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:202)
com.microsoft.datatransfer.bridge.parquet.ParquetBatchReaderBridge.open(ParquetBatchReaderBridge.java:62)
com.microsoft.datatransfer.bridge.parquet.ParquetFileBridge.createReader(ParquetFileBridge.java:22)
java.util.concurrent.ExecutionException:java.lang.ExceptionInInitializerError
total entry:9
java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
org.apache.parquet.hadoop.ParquetFileReader.runAllInParallel(ParquetFileReader.java:227)
org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallelUsingSummaryFiles(ParquetFileReader.java:185)
org.apache.parquet.hadoop.ParquetReader.<init>(ParquetReader.java:112)
org.apache.parquet.hadoop.ParquetReader.<init>(ParquetReader.java:45)
org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:202)
com.microsoft.datatransfer.bridge.parquet.ParquetBatchReaderBridge.open(ParquetBatchReaderBridge.java:62)
com.microsoft.datatransfer.bridge.parquet.ParquetFileBridge.createReader(ParquetFileBridge.java:22)
java.lang.ExceptionInInitializerError:null
total entry:24
org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:104)
org.apache.hadoop.security.Groups.<init>(Groups.java:86)
org.apache.hadoop.security.Groups.<init>(Groups.java:66)
org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280)
org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:271)
org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:248)
org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:763)
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:748)
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:621)
org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2753)
org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2745)
org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2611)
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:354)
org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
org.apache.parquet.hadoop.ParquetFileReader.readSummaryMetadata(ParquetFileReader.java:360)
org.apache.parquet.hadoop.ParquetFileReader$1.call(ParquetFileReader.java:158)
org.apache.parquet.hadoop.ParquetFileReader$1.call(ParquetFileReader.java:155)
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
java.base/java.lang.Thread.run(Thread.java:1583)
java.lang.StringIndexOutOfBoundsException:Range [0, 3) out of bounds for length 2
total entry:34
java.base/jdk.internal.util.Preconditions$1.apply(Preconditions.java:55)
java.base/jdk.internal.util.Preconditions$1.apply(Preconditions.java:52)
java.base/jdk.internal.util.Preconditions$4.apply(Preconditions.java:213)
java.base/jdk.internal.util.Preconditions$4.apply(Preconditions.java:210)
java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:98)
java.base/jdk.internal.util.Preconditions.outOfBoundsCheckFromToIndex(Preconditions.java:112)
java.base/jdk.internal.util.Preconditions.checkFromToIndex(Preconditions.java:349)
java.base/java.lang.String.checkBoundsBeginEnd(String.java:4861)
java.base/java.lang.String.substring(String.java:2830)
org.apache.hadoop.util.Shell.<clinit>(Shell.java:49)
org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:104)
org.apache.hadoop.security.Groups.<init>(Groups.java:86)
org.apache.hadoop.security.Groups.<init>(Groups.java:66)
org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280)
org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:271)
org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:248)
org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:763)
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:748)
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:621)
org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2753)
org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2745)
org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2611)
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:354)
org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
org.apache.parquet.hadoop.ParquetFileReader.readSummaryMetadata(ParquetFileReader.java:360)
org.apache.parquet.hadoop.ParquetFileReader$1.call(ParquetFileReader.java:158)
org.apache.parquet.hadoop.ParquetFileReader$1.call(ParquetFileReader.java:155)
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
java.base/java.lang.Thread.run(Thread.java:1583)
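For what it's worth, the innermost failure (the StringIndexOutOfBoundsException raised from org.apache.hadoop.util.Shell.&lt;clinit&gt;) looks like the known problem where older Hadoop code parses the java.version system property with substring(0, 3), which assumes a 1.x-style version string. On newer JREs that property is just e.g. "21", which has length 2, so the substring call fails with exactly the "Range [0, 3) out of bounds for length 2" message above. A minimal sketch of that parse, assuming this is roughly what the Shell static initializer does (an assumption based on the trace, not a copy of the actual Hadoop source):

```java
// Sketch of the version parsing I suspect is failing inside Hadoop's
// Shell static initializer.
public class VersionParseRepro {
    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        // On JRE 8 the property is "1.8.0_xxx" and this yields "1.8".
        // On JRE 21 the property is just "21" (length 2), so this throws
        // StringIndexOutOfBoundsException: Range [0, 3) out of bounds for length 2.
        System.out.println(version.substring(0, 3));
    }
}
```

If that reading is right, the self-hosted IR machine would need a 64-bit JRE 8 (which, if I remember correctly, is what the ADF documentation lists as the requirement for Parquet on a self-hosted IR) rather than a newer JRE.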
Does anyone know what is missing, or what I can update, to be able to write Parquet files properly?