Learning the SAM Data Format, Part 2: Understanding FLAG
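© Copyright belongs to the author: this is an original work by 51CTO blog author KeepLearningAI. Please contact the author for permission to repost; otherwise legal liability will be pursued.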
SAM format:
1. FLAG description:
Each bit in the FLAG field is defined as:
0x0001 p the read is paired in sequencing
0x0002 P the read is mapped in a proper pair
0x0004 u the query sequence itself is unmapped
0x0008 U the mate is unmapped
0x0010 r strand of the query (1 for reverse)
0x0020 R strand of the mate
0x0040 1 the read is the first read in a pair
0x0080 2 the read is the second read in a pair
0x0100 s the alignment is not primary
0x0200 f the read fails platform/vendor quality checks
0x0400 d the read is either a PCR or an optical duplicate
0x0800 S the alignment is supplementary
where the second column gives the string representation of the FLAG field.
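2. Interpretation: the 0x prefix marks a hexadecimal number. Each value in the table has exactly one bit set, and each bit carries one specific meaning; a FLAG is the sum (bitwise OR) of all the bits that apply to the read.
3. Example: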
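As a minimal sketch of this bitwise reading, the Python snippet below splits a FLAG into the bits that are set. The bit values and meanings are copied from the table in section 1; the names `FLAG_BITS` and `decode_flag` are made up for this illustration and do not come from any SAM library. The values 99 and 147 are the FLAGs that appear in the example in section 3.

```python
# Bit values and meanings copied from the FLAG table in section 1.
FLAG_BITS = [
    (0x0001, "the read is paired in sequencing"),
    (0x0002, "the read is mapped in a proper pair"),
    (0x0004, "the query sequence itself is unmapped"),
    (0x0008, "the mate is unmapped"),
    (0x0010, "strand of the query (1 for reverse)"),
    (0x0020, "strand of the mate"),
    (0x0040, "the read is the first read in a pair"),
    (0x0080, "the read is the second read in a pair"),
    (0x0100, "the alignment is not primary"),
    (0x0200, "the read fails platform/vendor quality checks"),
    (0x0400, "the read is either a PCR or an optical duplicate"),
    (0x0800, "the alignment is supplementary"),
]


def decode_flag(flag):
    """Return (bit value, meaning) for every bit that is set in a SAM FLAG."""
    return [(bit, meaning) for bit, meaning in FLAG_BITS if flag & bit]


for flag in (99, 147):
    bits = decode_flag(flag)
    # Show the FLAG as a sum of its set bits, smallest bit first.
    print(flag, "=", " + ".join(str(bit) for bit, _ in bits))
    for bit, meaning in bits:
        print(f"  {bit:#06x} {meaning}")
```

Testing each bit with `flag & bit` rather than subtracting values by hand gives the same decomposition regardless of how many bits are set.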
read1:
@chrUn_KN707963v1_decoy_19393_19870_2:0:0_0:0:0_0/1
CCATTTGATTCCATTCCTTTGGATTCCATTCCATTGTATTGGATTGCATTGGATTCCATTCCATTCTATT
+
2222222222222222222222222222222222222222222222222222222222222222222222
read2:
@chrUn_KN707963v1_decoy_19393_19870_2:0:0_0:0:0_0/2
TCAAAGGGAATAGAATCGAATGAAATAGAATCTAATGGAATGGAATGGAATGGAATGGAATGGAATGGAA
+
2222222222222222222222222222222222222222222222222222222222222222222222
Aligned SAM output (partial):
@SQ SN:HLA-DRB1*15:03:01:01 LN:11567
@SQ SN:HLA-DRB1*15:03:01:02 LN:11569
@SQ SN:HLA-DRB1*16:02:01 LN:11005
@PG ID:bwa PN:bwa VN:0.7.13-r1126 CL:bwa sampe ../hs38DH.fa hs38DHPE1L100F1.sai hs38DHPE1L100F2.sai hs38DHPE1L100F1.fq hs38DHPE1L100F2.fq
chrUn_KN707963v1_decoy_19393_19870_2:0:0_0:0:0_0 99 chrUn_KN707963v1_decoy 19393 60 70M = 19801 478 CCATTTGATTCCATTCCTTTGGATTCCATTCCATTGTATTGGATTGCATTGGATTCCATTCCATTCTATT 2222222222222222222222222222222222222222222222222222222222222222222222 XT:A:U NM:i:2 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:2 XO:i:0 XG:i:0 MD:Z:40C4C24
chrUn_KN707963v1_decoy_19393_19870_2:0:0_0:0:0_0 147 chrUn_KN707963v1_decoy 19801 60 70M = 19393 -478 TTCCATTCCATTCCATTCCATTCCATTCCATTCCATTAGATTCTATTTCATTCGATTCTATTCCCTTTGA 2222222222222222222222222222222222222222222222222222222222222222222222 XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:70
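The FLAG is the second column of each alignment line. As a rough sketch, assuming the tab-delimited layout of the 11 mandatory SAM columns (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL), the snippet below pulls the FLAG and a few other fields out of the read1 record. `SamRecord` and `parse_alignment` are hypothetical helper names for this illustration, and only the first two optional tags of the record are kept for brevity.

```python
from collections import namedtuple

# The 11 mandatory SAM columns, in order.
SamRecord = namedtuple(
    "SamRecord",
    "qname flag rname pos mapq cigar rnext pnext tlen seq qual",
)


def parse_alignment(line):
    """Parse the mandatory columns of one tab-delimited SAM alignment line."""
    if line.startswith("@"):
        raise ValueError("header line, not an alignment record")
    f = line.rstrip("\n").split("\t")
    return SamRecord(
        f[0], int(f[1]), f[2], int(f[3]), int(f[4]),
        f[5], f[6], int(f[7]), int(f[8]), f[9], f[10],
    )


# The read1 record from the SAM output above (remaining optional tags omitted).
line = "\t".join([
    "chrUn_KN707963v1_decoy_19393_19870_2:0:0_0:0:0_0",
    "99",
    "chrUn_KN707963v1_decoy",
    "19393",
    "60",
    "70M",
    "=",
    "19801",
    "478",
    "CCATTTGATTCCATTCCTTTGGATTCCATTCCATTGTATTGGATTGCATTGGATTCCATTCCATTCTATT",
    "2222222222222222222222222222222222222222222222222222222222222222222222",
    "XT:A:U",
    "NM:i:2",
])

rec = parse_alignment(line)
print(rec.flag, rec.pos, rec.pnext, rec.tlen)
```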
Here 99 is the FLAG of read1: 99 = 64 + 32 + 2 + 1
64 (0x0040): the read is the first read in a pair
32 (0x0020): strand of the mate (the bit is set, so the mate maps to the reverse strand)
2 (0x0002): the read is mapped in a proper pair
1 (0x0001): the read is paired in sequencing
Here 147 is the FLAG of read2: 147 = 128 + 16 + 2 + 1
128 (0x0080): the read is the second read in a pair
16 (0x0010): strand of the query (1 for reverse), meaning the query sequence aligned to the reverse strand. The original sequence of read2 in the FASTQ file is:
TCAAAGGGAATAGAATCGAATGAAATAGAATCTAATGGAATGGAATGGAATGGAATGGAATGGAATGGAA
The sequence reported in the SAM record is:
TTCCATTCCATTCCATTCCATTCCATTCCATTCCATTAGATTCTATTTCATTCGATTCTATTCCCTTTGA
The two sequences are reverse complements of each other: the leading TCAAAGGG of the original read corresponds to the trailing CCCTTTGA of the SAM sequence (read backwards from the end, AGTTTCCC, it is the base-by-base complement of TCAAAGGG).
2 and 1 have the same meaning as for read1.
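The relationship between the two read2 sequences can be checked with a few lines of Python. This is only a sketch: `reverse_complement` is a helper written for this illustration, and it assumes the sequences contain only the bases A, C, G, T and N.

```python
# Translation table mapping each base to its complement (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Complement every base, then reverse the order of the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# read2 as it appears in the FASTQ file.
fastq_read2 = "TCAAAGGGAATAGAATCGAATGAAATAGAATCTAATGGAATGGAATGGAATGGAATGGAATGGAATGGAA"
# SEQ column of the SAM record with FLAG 147.
sam_seq = "TTCCATTCCATTCCATTCCATTCCATTCCATTCCATTAGATTCTATTTCATTCGATTCTATTCCCTTTGA"

# Compare the reverse complement of the FASTQ read with the SAM SEQ column.
print(reverse_complement(fastq_read2) == sam_seq)
# The first 8 bases of the read map onto the last 8 bases of the SAM sequence.
print(reverse_complement(fastq_read2[:8]), sam_seq[-8:])
```

When bit 0x0010 is set, the SAM record stores the read as its reverse complement, which is why the FASTQ sequence cannot be found verbatim in the SEQ column.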
References:
[1] The Sequence Alignment/Map format and SAMtools