个人博客地址:MapReduce 读取 Hive ORC ArrayIndexOutOfBoundsException: 1024 异常解决 | 一张假钞的真实世界
在MR处理ORC的时候遇到如下异常:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1024at org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector (RunLengthIntegerReaderV2.java:369)at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays (TreeReaderFactory.java:1231)at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays (TreeReaderFactory.java:1268)at org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector (TreeReaderFactory.java:1368)at org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector (TreeReaderFactory.java:1212)at org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector (TreeReaderFactory.java:1902)at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch (TreeReaderFactory.java:1737)at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89)
通过搜索发现这个Bug在Hive 2.1.1版本中已经修复。我使用的就是这个版本,检查对应的源代码发现代码是已经按照下面的Patch修复过得:[HIVE-14483] java.lang.ArrayIndexOutOfBoundsException org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays - ASF JIRA
通过反编译发现我最终打包后的代码中使用的是未修复Bug的代码版本。通过依赖包发现依赖的以下模块中也包含ORC的Jar:
<dependency><groupId>org.apache.orc</groupId><artifactId>orc-mapreduce</artifactId><version>1.1.0</version>
</dependency>
解决方法是将orc-mapreduce
包升级到1.1.2版本,依赖配置如下:
<dependency><groupId>org.apache.orc</groupId><artifactId>orc-mapreduce</artifactId><version>1.1.2</version>
</dependency>