
Spark: HiveStrategies

2024/10/24 17:25:04 Source: https://blog.csdn.net/zhixingheyi_tian/article/details/139441801
Code related to HiveTableRelation

HiveStrategies.scala
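When relation.tableMeta.stats.isEmpty is true, the catalog holds no statistics for a Hive table. In that case the analyzer rule DetermineTableStats calls hiveTableWithStats. That method estimates sizeInBytes and stores the estimate in the relation's tableStats field.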
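The dispatch in `apply` can be illustrated with a dependency-free toy plan tree. This is a sketch under assumptions: `Table`, `Relation`, `Insert`, `Project`, `transformUp` and `DetermineStatsSketch` are hypothetical names, not Spark APIs.

The point it makes is about how the rewrite walks the tree:

- A bottom-up rewrite only descends into `children`.
- A relation stored as an insert target is a field, not a child, so the traversal never reaches it on its own.
- It therefore has to be matched explicitly, which is why the real rule has a second case for `InsertIntoStatement`.

For simplicity, the fixed `defaultSize` stands in for the real size estimation.

```scala
// Hypothetical toy model of DetermineTableStats.apply; none of these types are Spark classes.
object DetermineStatsSketch {
  // catalogStats plays the role of tableMeta.stats; tableStats is the slot the rule fills in.
  final case class Table(name: String, catalogStats: Option[BigInt])

  sealed trait Plan { def children: Seq[Plan] }
  final case class Relation(table: Table, tableStats: Option[BigInt] = None) extends Plan {
    def children: Seq[Plan] = Nil
  }
  // Like InsertIntoStatement: the target relation is a field but NOT one of the children.
  final case class Insert(target: Relation, query: Plan) extends Plan {
    def children: Seq[Plan] = Seq(query)
  }
  final case class Project(child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
  }

  // Bottom-up rewrite that, like Catalyst's tree transforms, only descends into children.
  def transformUp(plan: Plan)(rule: PartialFunction[Plan, Plan]): Plan = {
    val withNewChildren = plan match {
      case r: Relation  => r
      case Insert(t, q) => Insert(t, transformUp(q)(rule)) // the target t is not visited
      case Project(c)   => Project(transformUp(c)(rule))
    }
    rule.applyOrElse(withNewChildren, (p: Plan) => p)
  }

  // Stand-in for hiveTableWithStats: just records a fixed size.
  private def withStats(r: Relation, defaultSize: BigInt): Relation =
    r.copy(tableStats = Some(defaultSize))

  def run(plan: Plan, defaultSize: BigInt): Plan = transformUp(plan) {
    case r: Relation if r.table.catalogStats.isEmpty => withStats(r, defaultSize)
    // Without this case the insert target would never receive stats.
    case i @ Insert(r, _) if r.table.catalogStats.isEmpty =>
      i.copy(target = withStats(r, defaultSize))
  }
}
```

Dropping the `Insert` case from `run` leaves `target.tableStats` empty, which mirrors the situation the comment in the real rule describes.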

```scala
package org.apache.spark.sql.hive

import java.io.IOException

import org.apache.hadoop.fs.{FileSystem, Path}

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.catalog.HiveTableRelation
import org.apache.spark.sql.catalyst.plans.logical.{InsertIntoStatement, LogicalPlan, Statistics}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.command.DDLUtils

// `conf` and `logWarning` are inherited from Rule (via SQLConfHelper and Logging).
class DetermineTableStats(session: SparkSession) extends Rule[LogicalPlan] {
  private def hiveTableWithStats(relation: HiveTableRelation): HiveTableRelation = {
    val table = relation.tableMeta
    val partitionCols = relation.partitionCols
    // For partitioned tables, the partition directory may be outside of the table directory.
    // Which is expensive to get table size. Please see how we implemented it in the AnalyzeTable.
    val sizeInBytes = if (conf.fallBackToHdfsForStatsEnabled && partitionCols.isEmpty) {
      try {
        val hadoopConf = session.sessionState.newHadoopConf()
        val tablePath = new Path(table.location)
        val fs: FileSystem = tablePath.getFileSystem(hadoopConf)
        fs.getContentSummary(tablePath).getLength
      } catch {
        case e: IOException =>
          logWarning("Failed to get table size from HDFS.", e)
          conf.defaultSizeInBytes
      }
    } else {
      conf.defaultSizeInBytes
    }
    val stats = Some(Statistics(sizeInBytes = BigInt(sizeInBytes)))
    relation.copy(tableStats = stats)
  }

  override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
    case relation: HiveTableRelation
        if DDLUtils.isHiveTable(relation.tableMeta) && relation.tableMeta.stats.isEmpty =>
      hiveTableWithStats(relation)

    // handles InsertIntoStatement specially as the table in InsertIntoStatement is not added in its
    // children, hence not matched directly by previous HiveTableRelation case.
    case i @ InsertIntoStatement(relation: HiveTableRelation, _, _, _, _, _)
        if DDLUtils.isHiveTable(relation.tableMeta) && relation.tableMeta.stats.isEmpty =>
      i.copy(table = hiveTableWithStats(relation))
  }
}
```
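The size estimate in `hiveTableWithStats` boils down to a three-way decision. The sketch below models only that branch and has no Spark dependency. `SizeEstimateSketch`, `StatsConf` and the `readLength` callback are hypothetical stand-ins: the first two replace `conf`, and `readLength` replaces the `fs.getContentSummary(tablePath).getLength` call.

In Spark itself, the two settings behind `conf` are:

- `spark.sql.statistics.fallBackToHdfs`, which is off by default.
- `spark.sql.defaultSizeInBytes`, which is `Long.MaxValue` by default, so a Hive table without statistics is not chosen for an automatic broadcast join.

```scala
import java.io.IOException

// Hypothetical, Spark-free model of the sizeInBytes branch in hiveTableWithStats.
object SizeEstimateSketch {
  // Stand-ins for conf.fallBackToHdfsForStatsEnabled and conf.defaultSizeInBytes.
  final case class StatsConf(fallBackToHdfs: Boolean, defaultSizeInBytes: Long = Long.MaxValue)

  def sizeInBytes(conf: StatsConf, partitioned: Boolean, readLength: () => Long): BigInt = {
    val size =
      if (conf.fallBackToHdfs && !partitioned) {
        // Only unpartitioned tables are measured: partition dirs may live outside the table dir.
        // On an I/O failure, fall back to the default (Spark also logs a warning here).
        try { readLength() } catch { case _: IOException => conf.defaultSizeInBytes }
      } else {
        conf.defaultSizeInBytes
      }
    BigInt(size)
  }
}
```

Some example calls:

- `SizeEstimateSketch.sizeInBytes(StatsConf(fallBackToHdfs = true), partitioned = false, () => 1024L)` yields 1024.
- The same call with `partitioned = true` yields `Long.MaxValue`.
- With `fallBackToHdfs = false`, the result is also `Long.MaxValue` regardless of what the file system reports.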
  • HiveTableRelation
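```scala
package org.apache.spark.sql.catalyst.catalog

import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.catalyst.plans.logical.Statistics

/**
 * A `LogicalPlan` that represents a hive table.
 *
 * TODO: remove this after we completely make hive as a data source.
 */
case class HiveTableRelation(
    tableMeta: CatalogTable,
    dataCols: Seq[AttributeReference],
    partitionCols: Seq[AttributeReference],
    tableStats: Option[Statistics] = None,
    @transient prunedPartitions: Option[Seq[CatalogTablePartition]] = None)
// Excerpt: the extends clause and the class body are omitted here.
```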
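The relation therefore carries two distinct statistics slots:

- `tableMeta.stats` comes from the catalog, for example after `ANALYZE TABLE`.
- `tableStats` is the slot that `DetermineTableStats` fills in through `copy`.

The sketch below uses hypothetical simplified types, `CatalogTableLite`, `RelationLite` and `RelationLiteDemo`, which are not Spark classes. It shows that `copy` replaces only the named field and leaves `tableMeta` and `prunedPartitions` untouched.

```scala
// Hypothetical, simplified stand-ins for CatalogTable and HiveTableRelation (not Spark classes).
final case class CatalogTableLite(name: String, stats: Option[BigInt])

final case class RelationLite(
    tableMeta: CatalogTableLite,
    dataCols: Seq[String],
    partitionCols: Seq[String],
    tableStats: Option[BigInt] = None,
    prunedPartitions: Option[Seq[String]] = None)

object RelationLiteDemo {
  // Mirrors the guard in DetermineTableStats: fill tableStats only when catalog stats are missing;
  // otherwise return the relation unchanged.
  def fillStats(r: RelationLite, size: BigInt): RelationLite =
    if (r.tableMeta.stats.isEmpty) r.copy(tableStats = Some(size)) else r
}
```

Because case classes are immutable, the rule builds a new node instead of mutating the existing one. This matches how `relation.copy(tableStats = stats)` is used above.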
