In the templates module of AlphaFold3, the function _build_query_to_hit_index_mapping maps indices in the original query sequence (original_query_sequence) to indices in the hit sequence (hit_sequence).
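In protein sequence alignment (for example with HHsearch), a hit is a region that matches part of the query sequence. Because the alignment contains gaps (-) and covers only part of the query, the function corrects the indices so that each aligned residue position in the original sequence maps to the correct position in the hit sequence.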
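The index correction described above can be illustrated on a toy alignment. The sketch below is a simplified illustration, not the AlphaFold implementation: the helper name toy_query_to_hit_mapping and the example sequences are hypothetical. It also works directly from the two gapped strings, whereas the real function takes precomputed index lists (indices_query, indices_hit). Hit indices here are counted from the start of the aligned hit fragment.

```python
from typing import Dict


def toy_query_to_hit_mapping(
    aligned_query: str, aligned_hit: str, original_query: str
) -> Dict[int, int]:
    """Hypothetical helper: maps original-query indices to hit indices."""
    if len(aligned_query) != len(aligned_hit):
        raise ValueError("Aligned sequences must have the same length.")

    # The alignment covers only part of the query, so locate the matched
    # fragment (gaps removed) inside the full query to get its offset.
    offset = original_query.find(aligned_query.replace("-", ""))
    if offset < 0:
        raise ValueError("Aligned query fragment not found in original query.")

    mapping: Dict[int, int] = {}
    query_pos = 0  # position within the ungapped query fragment
    hit_pos = 0    # position within the ungapped hit fragment
    for query_char, hit_char in zip(aligned_query, aligned_hit):
        # Only columns where both sequences have a residue yield a pair.
        if query_char != "-" and hit_char != "-":
            mapping[offset + query_pos] = hit_pos
        if query_char != "-":
            query_pos += 1
        if hit_char != "-":
            hit_pos += 1
    return mapping


# Made-up example: the fragment "TAYIAK" starts at index 2 of the query.
example = toy_query_to_hit_mapping("TAY-IAK", "TGYWI-K", "MKTAYIAKQR")
print(example)  # {2: 0, 3: 1, 4: 2, 5: 4, 7: 5}
```

Query position 6 (the A aligned to a gap in the hit) has no entry, and hit position 3 (the W aligned to a gap in the query) is never used as a value, which is why the mapping skips them.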
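Its input arguments come mainly from the TemplateHit data produced by the HHsearch/HMMsearch parsers, and the returned index mapping can be used to extract template features (it is passed as an argument to _extract_template_features).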
Source code:
def _build_query_to_hit_index_mapping(
    hit_query_sequence: str,
    hit_sequence: str,
    indices_hit: Sequence[int],
    indices_query: Sequence[int],
    original_query_sequence: str,
) -> Mapping[int, int]:
  """Gets mapping from indices in original query sequence to indices in the hit.

  hit_query_sequence and hit_sequence are two aligned sequences containing gap
  characters. hit_query_sequence contains only the part of the original query
  sequence that matched the hit. When interpreting the indices from the .hhr, we
  need to correct for this to recover a mapping from original query sequence to
  the hit sequence.

  Args:
    hit_query_sequence: The portion of the query sequence that is in the .hhr
      hit
    hit_sequence: The portion of the hit sequence that is in the .hhr