SAM format:



1. FLAG description:

Each bit in the FLAG field is defined as:
0x0001 p the read is paired in sequencing
0x0002 P the read is mapped in a proper pair
0x0004 u the query sequence itself is unmapped
0x0008 U the mate is unmapped
0x0010 r strand of the query (1 for reverse)
0x0020 R strand of the mate
0x0040 1 the read is the first read in a pair
0x0080 2 the read is the second read in a pair
0x0100 s the alignment is not primary
0x0200 f the read fails platform/vendor quality checks
0x0400 d the read is either a PCR or an optical duplicate
0x0800 S the alignment is supplementary

where the second column gives the string representation of the FLAG field.


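2. Interpretation: the 0x prefix marks a hexadecimal number. Each value in the table sets exactly one bit, and each bit has one specific meaning; a FLAG is the sum of the bits that are set.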
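As a minimal sketch of this idea, the snippet below tests each bit of a FLAG with a bitwise AND and lists the ones that are set. The bit values and descriptions are copied from the table above; the names `FLAG_BITS` and `decode_flag` are made up for illustration and do not come from SAMtools.

```python
# Bit values and descriptions copied from the FLAG table above.
FLAG_BITS = [
    (0x0001, "the read is paired in sequencing"),
    (0x0002, "the read is mapped in a proper pair"),
    (0x0004, "the query sequence itself is unmapped"),
    (0x0008, "the mate is unmapped"),
    (0x0010, "strand of the query (1 for reverse)"),
    (0x0020, "strand of the mate"),
    (0x0040, "the read is the first read in a pair"),
    (0x0080, "the read is the second read in a pair"),
    (0x0100, "the alignment is not primary"),
    (0x0200, "the read fails platform/vendor quality checks"),
    (0x0400, "the read is either a PCR or an optical duplicate"),
    (0x0800, "the alignment is supplementary"),
]


def decode_flag(flag):
    """Return (bit value, description) for every bit that is set in flag."""
    return [(bit, text) for bit, text in FLAG_BITS if flag & bit]


# 99 is used here only as a sample input.
for bit, text in decode_flag(99):
    print(f"{bit:>4} (0x{bit:04x})  {text}")
```

Because every table entry is a distinct power of two, the set bits of any FLAG always add back up to the FLAG itself.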



3. Example:

read1:

@chrUn_KN707963v1_decoy_19393_19870_2:0:0_0:0:0_0/1
CCATTTGATTCCATTCCTTTGGATTCCATTCCATTGTATTGGATTGCATTGGATTCCATTCCATTCTATT
+
2222222222222222222222222222222222222222222222222222222222222222222222


read2:

@chrUn_KN707963v1_decoy_19393_19870_2:0:0_0:0:0_0/2
TCAAAGGGAATAGAATCGAATGAAATAGAATCTAATGGAATGGAATGGAATGGAATGGAATGGAATGGAA
+
2222222222222222222222222222222222222222222222222222222222222222222222


Aligned SAM output (excerpt):

@SQ SN:HLA-DRB1*15:03:01:01 LN:11567
@SQ SN:HLA-DRB1*15:03:01:02 LN:11569
@SQ SN:HLA-DRB1*16:02:01 LN:11005
@PG ID:bwa PN:bwa VN:0.7.13-r1126 CL:bwa sampe ../hs38DH.fa hs38DHPE1L100F1.sai hs38DHPE1L100F2.sai hs38DHPE1L100F1.fq hs38DHPE1L100F2.fq
chrUn_KN707963v1_decoy_19393_19870_2:0:0_0:0:0_0 99 chrUn_KN707963v1_decoy 19393 60 70M = 19801 478 CCATTTGATTCCATTCCTTTGGATTCCATTCCATTGTATTGGATTGCATTGGATTCCATTCCATTCTATT 2222222222222222222222222222222222222222222222222222222222222222222222 XT:A:U NM:i:2 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:2 XO:i:0 XG:i:0 MD:Z:40C4C24
chrUn_KN707963v1_decoy_19393_19870_2:0:0_0:0:0_0 147 chrUn_KN707963v1_decoy 19801 60 70M = 19393 -478 TTCCATTCCATTCCATTCCATTCCATTCCATTCCATTAGATTCTATTTCATTCGATTCTATTCCCTTTGA 2222222222222222222222222222222222222222222222222222222222222222222222 XT:A:U NM:i:0 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:0 XO:i:0 XG:i:0 MD:Z:70
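To make the columns of these records easier to read, here is a rough sketch that splits the first record into the eleven mandatory SAM fields and rechecks the template length. Real SAM files are tab-delimited; the excerpt above is shown with spaces, so the sketch splits on any whitespace. The helper names `parse_record` and `reference_span` are hypothetical, and the template-length check assumes a simple pair with no clipping, where the mate also aligns as 70M (as the second record shows).

```python
import re

# The eleven mandatory SAM columns, in order.
FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
          "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

# First alignment record from the excerpt above (QUAL is seventy '2' characters).
record = (
    "chrUn_KN707963v1_decoy_19393_19870_2:0:0_0:0:0_0 99 "
    "chrUn_KN707963v1_decoy 19393 60 70M = 19801 478 "
    "CCATTTGATTCCATTCCTTTGGATTCCATTCCATTGTATTGGATTGCATTGGATTCCATTCCATTCTATT "
    + "2" * 70 +
    " XT:A:U NM:i:2 SM:i:37 AM:i:37 X0:i:1 X1:i:0 XM:i:2 XO:i:0 XG:i:0 MD:Z:40C4C24"
)


def parse_record(line):
    """Split one alignment line into named mandatory fields plus optional tags."""
    cols = line.split()  # whitespace split; a real SAM line would use split("\t")
    rec = dict(zip(FIELDS, cols[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        rec[key] = int(rec[key])
    rec["TAGS"] = cols[11:]
    return rec


def reference_span(cigar):
    """Count reference bases covered by a CIGAR string (M, D, N, = and X consume reference)."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(n) for n, op in ops if op in "MDN=X")


rec = parse_record(record)
mate_span = reference_span("70M")  # assumption: the mate's CIGAR, taken from the second record
print(rec["FLAG"], rec["POS"], rec["PNEXT"], rec["TLEN"])
print(rec["PNEXT"] + mate_span - rec["POS"])  # 478, matching the TLEN column
```

The second record reports the same distance with a negative sign (-478) because it is the rightmost read of the pair.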


Here 99 is the FLAG of read1: 99 = 64 + 32 + 2 + 1

64: the read is the first read in a pair

32: strand of the mate (the bit is set, so the mate is on the reverse strand)

2: the read is mapped in a proper pair

1: the read is paired in sequencing
Here 147 is the FLAG of read2: 147 = 128 + 16 + 2 + 1

128: the read is the second read in a pair

16: strand of the query (1 for reverse), meaning the query sequence was aligned to the reverse strand.

The sequence originally produced by the sequencer is:

TCAAAGGGAATAGAATCGAATGAAATAGAATCTAATGGAATGGAATGGAATGGAATGGAATGGAATGGAA

The sequence reported in the SAM record is:

TTCCATTCCATTCCATTCCATTCCATTCCATTCCATTAGATTCTATTTCATTCGATTCTATTCCCTTTGA

The two sequences are reverse complements of each other: TCAAAGGG at the start of the original read pairs with CCCTTTGA at the end of the SAM sequence. Read backwards from the end, that is AGTTTCCC, the base-by-base complement of TCAAAGGG.




Bits 2 and 1 have the same meaning as for read1.
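The reverse-strand relationship for read2 can be checked directly. The sketch below reverse-complements the original FASTQ sequence when bit 16 (0x10) is set in the FLAG and compares the result with the SEQ column of the SAM record. Both sequences are copied from the example above; `reverse_complement` is an illustrative helper, not a SAMtools function.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


# read2 as it appears in the FASTQ file.
fastq_read2 = "TCAAAGGGAATAGAATCGAATGAAATAGAATCTAATGGAATGGAATGGAATGGAATGGAATGGAATGGAA"
# SEQ column of the read2 record in the SAM output.
sam_seq = "TTCCATTCCATTCCATTCCATTCCATTCCATTCCATTAGATTCTATTTCATTCGATTCTATTCCCTTTGA"
flag = 147

if flag & 0x10:
    # Bit 16 is set: SEQ is stored on the forward reference strand,
    # so it should equal the reverse complement of the original read.
    print(reverse_complement(fastq_read2) == sam_seq)  # True
else:
    print(fastq_read2 == sam_seq)
```

For read1 (FLAG 99) bit 16 is not set, so its SEQ column is identical to the FASTQ sequence.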


References:

[1] The Sequence Alignment/Map format and SAMtools