Converting between SAM, BAM, and CRAM with samtools

1. sam <=> bam

samtools view -h NA12878.bam >NA12878_2.sam  
samtools view -b -S NA12878.sam > NA12878_2.bam
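
The two commands above can be wrapped in small helpers. This is only a sketch: the function names swap_ext, sam_to_bam and bam_to_sam are made up for illustration. It assumes samtools 1.x, where the input format is detected automatically (so -S is not needed), and it writes the result with -o instead of shell redirection.

```shell
#!/bin/sh
# Replace the final extension of a path, e.g. NA12878.sam + bam -> NA12878.bam.
swap_ext() {
    printf '%s.%s\n' "${1%.*}" "$2"
}

# SAM -> BAM. -b selects BAM output, -o names the output file.
# Usage: sam_to_bam in.sam [out.bam]
sam_to_bam() {
    out=${2:-$(swap_ext "$1" bam)}
    samtools view -b -o "$out" "$1"
}

# BAM -> SAM. -h keeps the header in the output.
# Usage: bam_to_sam in.bam [out.sam]
bam_to_sam() {
    out=${2:-$(swap_ext "$1" sam)}
    samtools view -h -o "$out" "$1"
}
```

For example, sam_to_bam NA12878.sam NA12878_2.bam reproduces the second command above. Without a second argument the output name is derived from the input, which overwrites any existing file of that name.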

2. cram => bam

samtools view -bS artificial.sam >artificial.bam
samtools view -bS artificial.cram >artificial2.bam

Problem encountered:

hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/GIH/NA21144/alignment$ samtools view -b -S NA21144.alt_bwamem_GRCh38DH.20150718.GIH.low_coverage.cram >NA21144.alt_bwamem_GRCh38DH.20150718.GIH.low_coverage.bam
[E::cram_populate_ref] mismatching md5sum for downloaded reference.
Failed to populate reference for id 0
Unable to fetch reference #0 9998..119239
Failure to decode slice
[main_samview] truncated file.
hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/GIH/NA21144/alignment$


hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/GIH/NA21144/alignment$ ll
total 6991788
drwxr-xr-x 2 hadoop hadoop 4096 3月 9 22:38 ./
drwxr-xr-x 4 hadoop hadoop 4096 3月 8 21:39 ../
-rw-rw-r-- 1 hadoop hadoop 116162 3月 9 22:30 NA21144.alt_bwamem_GRCh38DH.20150718.GIH.low_coverage.bam
-rw-r--r-- 1 hadoop hadoop 877 3月 8 14:19 NA21144.alt_bwamem_GRCh38DH.20150718.GIH.low_coverage.bam.bas
-rw-r--r-- 1 hadoop hadoop 7158968591 3月 8 14:31 NA21144.alt_bwamem_GRCh38DH.20150718.GIH.low_coverage.cram
-rw-r--r-- 1 hadoop hadoop 482019 3月 9 22:38 NA21144.alt_bwamem_GRCh38DH.20150718.GIH.low_coverage.cram.crai


hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/GIH/NA21144/alignment$ samtools flagstat NA21144.alt_bwamem_GRCh38DH.20150718.GIH.low_coverage2.bam
[W::bam_hdr_read] EOF marker is absent. The input is probably truncated.
248958 + 0 in total (QC-passed reads + QC-failed reads)
481 + 0 secondary
0 + 0 supplementary
4660 + 0 duplicates
247408 + 0 mapped (99.38% : N/A)
248477 + 0 paired in sequencing
124288 + 0 read1
124189 + 0 read2
227967 + 0 properly paired (91.75% : N/A)
245377 + 0 with itself and mate mapped
1550 + 0 singletons (0.62% : N/A)
3892 + 0 with mate mapped to a different chr
1109 + 0 with mate mapped to a different chr (mapQ>=5)
hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/GIH/NA21144/alignment$ samtools flagstat NA21144.alt_bwamem_GRCh38DH.20150718.GIH.low_coverage3.bam
97098407 + 0 in total (QC-passed reads + QC-failed reads)
179635 + 0 secondary
0 + 0 supplementary
2634031 + 0 duplicates
96638779 + 0 mapped (99.53% : N/A)
96918772 + 0 paired in sequencing
48457840 + 0 read1
48460932 + 0 read2
93116714 + 0 properly paired (96.08% : N/A)
95999516 + 0 with itself and mate mapped
459628 + 0 singletons (0.47% : N/A)
1495404 + 0 with mate mapped to a different chr
565190 + 0 with mate mapped to a different chr (mapQ>=5)
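
The logs above show two separate symptoms. The CRAM could not be decoded because the reference that samtools downloaded did not match the expected MD5, and a BAM written by a failed run is cut short: it lacks the BGZF end-of-file block, which is what the "EOF marker is absent" warning reports. The sketch below handles both, assuming a local copy of the exact reference the CRAM was built against is available. The function names and the file name GRCh38_ref.fa are placeholders, not names from the run above.

```shell
#!/bin/sh
# The 28-byte BGZF end-of-file block that closes every complete BAM, as hex
# (7 groups of 4 bytes, concatenated by the shell).
BGZF_EOF='1f8b0804''00000000''00ff0600''42430200''1b000300''00000000''00000000'

# Return 0 if the file ends with the BGZF EOF block, 1 otherwise.
has_bgzf_eof() {
    tail_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
    [ "$tail_hex" = "$BGZF_EOF" ]
}

# CRAM -> BAM against a local reference (-T) instead of a downloaded one,
# then confirm the BAM was written to the end.
# Usage: decode_cram GRCh38_ref.fa in.cram out.bam   (reference name is a placeholder)
decode_cram() {
    ref=$1 cram=$2 bam=$3
    samtools view -b -T "$ref" -o "$bam" "$cram" && has_bgzf_eof "$bam"
}
```

Recent samtools releases also ship samtools quickcheck, which performs a similar end-of-file check, so has_bgzf_eof is mainly useful where only standard shell tools are at hand.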





3. bam <=> cram:



samtools view -C -T ref.fa aln.bam > aln.cram


java -jar cramtools-3.0.jar bam -O yeast.bam -I yeast.cram -R yeast.fasta
java -jar cramtools-3.0.jar cram -O yeast2.cram  -I yeast.bam -R yeast.fasta
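
The cramtools help text further down says the reference FASTA must be uncompressed and indexed with samtools faidx. A small wrapper can check for the index before running either conversion. This is a sketch: the function names and the CRAMTOOLS_JAR variable are invented here, and the cramtools arguments are the ones used in the two commands above.

```shell
#!/bin/sh
# Return 0 if ref.fasta already has a non-empty .fai index next to it;
# otherwise try to build one with samtools faidx.
ensure_fai() {
    [ -s "$1.fai" ] || samtools faidx "$1"
}

# CRAM -> BAM with cramtools.
# Usage: cramtools_bam in.cram out.bam ref.fasta
cramtools_bam() {
    ensure_fai "$3" &&
        java -jar "${CRAMTOOLS_JAR:-cramtools-3.0.jar}" bam -I "$1" -O "$2" -R "$3"
}

# BAM -> CRAM with cramtools.
# Usage: cramtools_cram in.bam out.cram ref.fasta
cramtools_cram() {
    ensure_fai "$3" &&
        java -jar "${CRAMTOOLS_JAR:-cramtools-3.0.jar}" cram -I "$1" -O "$2" -R "$3"
}
```

For example, cramtools_bam yeast.cram yeast.bam yeast.fasta corresponds to the first command above.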



Run log:


hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ java -jar cramtools-3.0.jar bam -O yeast.bam -I yeast.cram -R yeast.fasta 
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ ls
cramtools-3.0.jar yeast.bam yeast.cram yeast.fasta
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ ll
total 19008
drwxrwxr-x 2 hadoop hadoop 4096 3月 10 15:23 ./
drwxrwxr-x 3 hadoop hadoop 4096 3月 10 15:01 ../
-rw-rw-r-- 1 hadoop hadoop 3986091 3月 10 15:01 cramtools-3.0.jar
-rw-rw-r-- 1 hadoop hadoop 2130246 3月 10 15:25 yeast.bam
-rw-rw-r-- 1 hadoop hadoop 967382 3月 10 15:01 yeast.cram
-rw-rw-r-- 1 hadoop hadoop 12360755 3月 10 15:23 yeast.fasta
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ java -jar cramtools-3.0.jar bam -O yeast2.bam -I yeast.cram -R yeast.fasta
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ ls
cramtools-3.0.jar yeast2.bam yeast.bam yeast.cram yeast.fasta
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ ll
total 21092
drwxrwxr-x 2 hadoop hadoop 4096 3月 10 15:26 ./
drwxrwxr-x 3 hadoop hadoop 4096 3月 10 15:01 ../
-rw-rw-r-- 1 hadoop hadoop 3986091 3月 10 15:01 cramtools-3.0.jar
-rw-rw-r-- 1 hadoop hadoop 2130242 3月 10 15:26 yeast2.bam
-rw-rw-r-- 1 hadoop hadoop 2130246 3月 10 15:25 yeast.bam
-rw-rw-r-- 1 hadoop hadoop 967382 3月 10 15:01 yeast.cram
-rw-rw-r-- 1 hadoop hadoop 12360755 3月 10 15:23 yeast.fasta
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ java -jar cramtools-3.0.jar bam -O yeast3.bam -I yeast.cram -R yeast.fasta
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ ll
total 23176
drwxrwxr-x 2 hadoop hadoop 4096 3月 10 15:28 ./
drwxrwxr-x 3 hadoop hadoop 4096 3月 10 15:01 ../
-rw-rw-r-- 1 hadoop hadoop 3986091 3月 10 15:01 cramtools-3.0.jar
-rw-rw-r-- 1 hadoop hadoop 2130242 3月 10 15:26 yeast2.bam
-rw-rw-r-- 1 hadoop hadoop 2130242 3月 10 15:28 yeast3.bam
-rw-rw-r-- 1 hadoop hadoop 2130246 3月 10 15:25 yeast.bam
-rw-rw-r-- 1 hadoop hadoop 967382 3月 10 15:01 yeast.cram
-rw-rw-r-- 1 hadoop hadoop 12360755 3月 10 15:23 yeast.fasta
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ java -jar cramtools-3.0.jar bam -O yeast4.bam -I yeast.cram -R yeast.fasta
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ ll
total 25260
drwxrwxr-x 2 hadoop hadoop 4096 3月 10 15:30 ./
drwxrwxr-x 3 hadoop hadoop 4096 3月 10 15:01 ../
-rw-rw-r-- 1 hadoop hadoop 3986091 3月 10 15:01 cramtools-3.0.jar
-rw-rw-r-- 1 hadoop hadoop 2130242 3月 10 15:26 yeast2.bam
-rw-rw-r-- 1 hadoop hadoop 2130242 3月 10 15:28 yeast3.bam
-rw-rw-r-- 1 hadoop hadoop 2130242 3月 10 15:31 yeast4.bam
-rw-rw-r-- 1 hadoop hadoop 2130246 3月 10 15:25 yeast.bam
-rw-rw-r-- 1 hadoop hadoop 967382 3月 10 15:01 yeast.cram
-rw-rw-r-- 1 hadoop hadoop 12360755 3月 10 15:23 yeast.fasta
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ java -jar cramtools-3.0.jar
Version 3.0-b48

Usage: cramtools [options] [command] [command options]
Options: -h, --help Print help and quit (default: false)
Commands:
bam CRAM to BAM conversion.
cram BAM to CRAM converter.
index BAM/CRAM indexer.
merge Tool to merge CRAM or BAM files.
fastq CRAM to FastQ dump conversion.
fixheader A tool to fix CRAM header without re-writing the whole file.
getref Download reference sequences.
qstat Quality score statistics.

hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ java -jar cramtools-3.0.jar cram
Version 3.0-b48

Usage: <main class> [options]

Options: --capture-all-tags Capture all tags. (default: false)
--capture-tags Capture the tags listed, for example 'OQ:XA:XB' (default: )
--encrypt Encrypt the CRAM file. (default: false)
--ignore-md5-mismatch Fail on MD5 mismatch if true, or correct (overwrite) the checksums and continue if false. (default: false)
--ignore-tags Ignore the tags listed, for example 'OQ:XA:XB' (default: )
--inject-sq-uri Inject or change the @SQ:UR header fields to point to ENA reference service. (default: false)
--input-bam-file, -I Path to a BAM file to be converted to CRAM. Omit if standard input (pipe).
--input-is-sam Input is in SAM format. (default: false)
--lossless-quality-score, -Q Preserve all quality scores. Overwrites '--lossless-quality-score'. (default: false)
--lossy-quality-score-spec, -L A string specifying what quality scores should be preserved. (default: )
--max-records Stop after compressing this many records. (default: 9223372036854775807)
--output-cram-file, -O The path for the output CRAM file. Omit if standard output (pipe).
--preserve-read-names, -n Preserve all read names. (default: false)
--reference-fasta-file, -R The reference fasta file, uncompressed and indexed (.fai file, use 'samtools faidx').
-h, --help Print help and quit (default: false)
-l, --log-level Change log level: DEBUG, INFO, WARNING, ERROR. (default: ERROR)


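After converting with the commands above, the data is clearly compressed further: the new CRAM files (about 510 KB) are smaller than both the BAM files (about 2.1 MB) and the original yeast.cram (about 967 KB):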


hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ ll
total 26760
drwxrwxr-x 2 hadoop hadoop 4096 3月 10 16:49 ./
drwxrwxr-x 3 hadoop hadoop 4096 3月 10 15:01 ../
-rw-rw-r-- 1 hadoop hadoop 3986091 3月 10 15:01 cramtools-3.0.jar
-rw-rw-r-- 1 hadoop hadoop 2130242 3月 10 15:26 yeast2.bam
-rw-rw-r-- 1 hadoop hadoop 510298 3月 10 15:33 yeast2.cram
-rw-rw-r-- 1 hadoop hadoop 2130242 3月 10 15:28 yeast3.bam
-rw-rw-r-- 1 hadoop hadoop 510301 3月 10 15:40 yeast3.cram
-rw-rw-r-- 1 hadoop hadoop 2130242 3月 10 15:31 yeast4.bam
-rw-rw-r-- 1 hadoop hadoop 510301 3月 10 16:50 yeast5.cram
-rw-rw-r-- 1 hadoop hadoop 2130246 3月 10 15:25 yeast.bam
-rw-rw-r-- 1 hadoop hadoop 967382 3月 10 15:01 yeast.cram
-rw-rw-r-- 1 hadoop hadoop 12360755 3月 10 15:23 yeast.fasta




hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ samtools view yeast5.cram |head -20
SRR507778.19213 147 I 62 60 36M = 3183 3087 ATCCTAACACTACCCTAACACAGCCCTAATCTAACC * MD:Z:36 NM:i:0
SRR507778.12312 147 I 205 60 36M = 3626 3387 CCACTCACCCACCGTTACCCTCCAATTACCCATATC * MD:Z:36 NM:i:0
SRR507778.11604 83 I 402 60 36M = 3869 3503 CTCACTTGTATACTGATTTTACGTACGCACACGGAT * MD:Z:36 NM:i:0
SRR507778.10609 83 I 2661 40 36M = 6131 3506 TGAATTCGTACAACATTAAACGTGTGTTGGGAGTCG * MD:Z:36 NM:i:0
SRR507778.6249 147 I 2925 60 36M = 6404 3445 TTTCTAAGTGGGATTTTTCTTAATCCTTGGATTCTT * MD:Z:36 NM:i:0
SRR507778.14609 129 I 3048 60 36M IV 1525643 0 AAAAGTAGCCGTTCATTTCCCTTCCGATTTCATTCC * MD:Z:36 NM:i:0
SRR507778.20233 83 I 3132 60 36M = 6388 3292 TATTTGTGTCCCATTCTCGTAGATAAAATTCTTGGA * MD:Z:36 NM:i:0
SRR507778.19213 99 I 3183 60 36M = 62 -3087 ATTTTCTTCATAAAGAAGCTTTCAAGATATAAGATA * MD:Z:36 NM:i:0
SRR507778.20882 73 I 3259 60 36M = 3259 0 CAAAAAGGAAAGCATGGAGGGAAACAGTAAACAGTG * MD:Z:36 NM:i:0
SRR507778.20882 133 I 3259 0 * = 3259 0 GTGGTGTGTGTGGGTGAGGTGTGGGTGTGGGGAGGG *
SRR507778.12312 99 I 3626 60 36M = 205 -3387 GTATCTGATGTTTTTTTAGTAATTTCTTTGTAAATA * MD:Z:36 NM:i:0
SRR507778.11604 163 I 3869 60 36M = 402 -3503 TTTTTGAAAATATTCTGAGGTAAAAGCCATTAAGGT * MD:Z:36 NM:i:0
SRR507778.24515 83 I 4004 60 36M = 7814 3846 GATGTTTCAAGGCCTGAAGTTTGAATATTTATGTAG * MD:Z:36 NM:i:0
SRR507778.19471 83 I 4627 60 36M = 8153 3562 GGCAGAGTTTCCAAAAAAAATTGTTAATCGACAAAG * MD:Z:36 NM:i:0
SRR507778.15626 83 I 4748 60 36M = 8861 4149 TTTAAATTGTATTGAGTGCTTCAGTCATTGCAAAAT * MD:Z:36 NM:i:0
SRR507778.7265 147 I 4894 60 36M = 8228 3300 TATCTATCACAAAGGAGACAAAATCGTTGATAAAAA * MD:Z:36 NM:i:0
SRR507778.14364 83 I 5516 60 36M = 9133 3653 TATGATATAAAAACTCGGACCCTGTTTTACTTCTTT * MD:Z:36 NM:i:0
SRR507778.10609 163 I 6131 60 36M = 2661 -3506 CATACGTTGATTAGTACTGTTGGTCTCTCATTGAAA * MD:Z:36 NM:i:0
SRR507778.20233 163 I 6388 60 36M = 3132 -3292 ACCAATTTGACGTTAATTTTAAATGCGTTCTGAAGT * MD:Z:36 NM:i:0
SRR507778.6249 99 I 6404 60 36M = 2925 -3445 TTTTAAATGCGTTCTGAAGTTTCTTAAATAACCCGG * MD:Z:36 NM:i:0
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$ samtools view yeast.cram |head -20
SRR507778.19213 147 I 62 60 36M = 3183 3087 ATCCTAACACTACCCTAACACAGCCCTAATCTAACC 15=@9:@C3<CBGGGDGDBGDFCC?>GGG<GGGDGG AS:i:36 XS:i:19 MD:Z:36 NM:i:0
SRR507778.12312 147 I 205 60 36M = 3626 3387 CCACTCACCCACCGTTACCCTCCAATTACCCATATC GD<B?B>DGGBGGGGGCIIIIGIIIIIEIIGIIIII AS:i:36 XS:i:26 MD:Z:36 NM:i:0
SRR507778.11604 83 I 402 60 36M = 3869 3433 CTCACTTGTATACTGATTTTACGTACGCACACGGAT GHGHIIIIIIIIIIIHIIIIIIIIIIIIIIIIIIII AS:i:36 XS:i:0 MD:Z:36 NM:i:0
SRR507778.10609 83 I 2661 40 36M = 6131 3436 TGAATTCGTACAACATTAAACGTGTGTTGGGAGTCG IIIGFIIGIIHIIIIGIIIIIIIIIIHIIIIIIGII AS:i:36 XS:i:36 MD:Z:36 NM:i:0
SRR507778.6249 147 I 2925 60 36M = 6404 3445 TTTCTAAGTGGGATTTTTCTTAATCCTTGGATTCTT GGGGADIGIHIIHHHIEHIHIHI<DIIIIIIIGIIF AS:i:36 XS:i:0 MD:Z:36 NM:i:0
SRR507778.14609 129 I 3048 60 36M IV 1525643 0 AAAAGTAGCCGTTCATTTCCCTTCCGATTTCATTCC >5833+?=8>B@FBF?9B7AGGGB<G@BGGGGEGD> AS:i:36 XS:i:0 MD:Z:36 NM:i:0
SRR507778.20233 83 I 3132 60 36M = 6388 3222 TATTTGTGTCCCATTCTCGTAGATAAAATTCTTGGA IIIIIIIGGIIIIIIIIIIIIIIIIIIIIIIBIIII AS:i:36 XS:i:0 MD:Z:36 NM:i:0
SRR507778.19213 99 I 3183 60 36M = 62 -3087 ATTTTCTTCATAAAGAAGCTTTCAAGATATAAGATA HHHGHGDAHHHHHEHHHHHHHHGHEGBDGGGGG<GE AS:i:36 XS:i:0 MD:Z:36 NM:i:0
SRR507778.20882 73 I 3259 60 36M = 3259 0 CAAAAAGGAAAGCATGGAGGGAAACAGTAAACAGTG @GGGGGG>GGBD4DDGGEDGDDG@GAA1CBEEEE3D AS:i:36 XS:i:0 MD:Z:36 NM:i:0
SRR507778.20882 133 I 3259 0 * = 3259 0 GTGGTGTGTGTGGGTGAGGTGTGGGTGTGGGGAGGG EGBG8GCB8BBBB####################### AS:i:0 XS:i:0
SRR507778.12312 99 I 3626 60 36M = 205 -3387 GTATCTGATGTTTTTTTAGTAATTTCTTTGTAAATA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHHI AS:i:36 XS:i:0 MD:Z:36 NM:i:0
SRR507778.11604 163 I 3869 60 36M = 402 -3433 TTTTTGAAAATATTCTGAGGTAAAAGCCATTAAGGT IIIIIIHIIIIIIIIIIIIIIIIIIIIHIIIIHIIE AS:i:36 XS:i:0 MD:Z:36 NM:i:0
SRR507778.24515 83 I 4004 60 36M = 7814 3776 GATGTTTCAAGGCCTGAAGTTTGAATATTTATGTAG IHIIIIIIIIIIIIHIIIIHIIIIIIIIIIIIIIII AS:i:36 XS:i:0 MD:Z:36 NM:i:0
SRR507778.19471 83 I 4627 60 36M = 8153 3492 GGCAGAGTTTCCAAAAAAAATTGTTAATCGACAAAG HGHBHGGHGHHHGHHHHHHHHGGGGGDDD=BDGGGG AS:i:36 XS:i:20 MD:Z:36 NM:i:0
SRR507778.15626 83 I 4748 60 36M = 8861 4079 TTTAAATTGTATTGAGTGCTTCAGTCATTGCAAAAT IIIIIIIIIIIIIIIIIIIIIIHIIIIIIIIIIIII AS:i:36 XS:i:0 MD:Z:36 NM:i:0
SRR507778.7265 147 I 4894 60 36M = 8228 3300 TATCTATCACAAAGGAGACAAAATCGTTGATAAAAA GGBDGGIIIFIGHHIIDDGGGGGDGDIIDIIEIIII AS:i:36 XS:i:0 MD:Z:36 NM:i:0
SRR507778.14364 83 I 5516 60 36M = 9133 3583 TATGATATAAAAACTCGGACCCTGTTTTACTTCTTT IIIIIIIHIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:36 XS:i:0 MD:Z:36 NM:i:0
SRR507778.10609 163 I 6131 60 36M = 2661 -3436 CATACGTTGATTAGTACTGTTGGTCTCTCATTGAAA HIHIHIGIIIHIIIHIHIIIGIIIGIEHDIIIHIHG AS:i:36 XS:i:0 MD:Z:36 NM:i:0
SRR507778.20233 163 I 6388 60 36M = 3132 -3222 ACCAATTTGACGTTAATTTTAAATGCGTTCTGAAGT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIGIIIHII AS:i:36 XS:i:0 MD:Z:36 NM:i:0
SRR507778.6249 99 I 6404 60 36M = 2925 -3445 TTTTAAATGCGTTCTGAAGTTTCTTAAATAACCCGG GGGGGGGGGGHHHHHHGDHHHHGHHFGHHHGFGGGG AS:i:36 XS:i:0 MD:Z:36 NM:i:0
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125/test1$

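A compression level may need to be set, but I am not sure. Another likely cause: in the yeast5.cram output above, the quality column is * and the AS/XS tags are missing, while yeast.cram still has both. That matches the defaults in the cramtools help text (--lossless-quality-score and --capture-all-tags both default to false), so the smaller files may simply hold less data.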
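
One way to check whether a smaller CRAM has dropped data rather than compressed it better is to count the records whose QUAL column is *. The helper below is a sketch with a made-up name, count_missing_qual. It assumes tab-separated SAM records on standard input, which is how samtools view prints them.

```shell
#!/bin/sh
# Read SAM records on stdin and report how many have no base qualities
# (column 11, QUAL, is "*"). Header lines starting with "@" are skipped.
count_missing_qual() {
    awk -F '\t' '
        $1 !~ /^@/ { total++; if ($11 == "*") missing++ }
        END { printf "%d of %d records have no QUAL\n", missing, total }
    '
}

# Usage (not run here):
#   samtools view yeast5.cram | count_missing_qual
#   samtools view yeast.cram  | count_missing_qual
```

If the new CRAM turns out to have no qualities, the help text above lists -Q (--lossless-quality-score), -n (--preserve-read-names) and --capture-all-tags. Adding them to the cram command should keep that data, though this was not tried here and the output would be expected to grow.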