Reading and Processing Folded Pulsar Data – 2
For the initial data handling, FITS files can be opened with astropy.io.fits to inspect a summary of the file and to carry out the later processing.
from astropy.io import fits

hdu = fits.open("xxxxxx.fits")
hdu.info()  # prints a summary of the HDUs in the file
The folded data file contains five extensions, as listed below:
Name | Type
---- | ----
PRIMARY | PrimaryHDU
HISTORY | BinTableHDU
PSRPARAM | BinTableHDU
POLYCO | BinTableHDU
SUBINT | BinTableHDU
PRIMARY is the file header; it contains parameters such as the number of bits per sample, the telescope name, the receiver ID and the observation time, and it is the only extension of type PrimaryHDU. All remaining extensions are of type BinTableHDU, and each BinTableHDU consists of a header, columns and data. HISTORY records the processing history of the data. PSRPARAM holds the pulsar parameters. POLYCO stores the polyco parameter table used for each data segment; for a description of the pulse-phase prediction parameters and algorithm, see the Tempo documentation. SUBINT holds the data itself.
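For example, a few of these PRIMARY parameters can be read directly from hdu[0].header. This is only a minimal sketch: the keyword names below (TELESCOP, FRONTEND, OBS_MODE, STT_IMJD) follow the usual PSRFITS convention and are an assumption, so check them against your own file:
from astropy.io import fits

hdu = fits.open("xxxxxx.fits")
primary = hdu[0].header

# Assumed standard PSRFITS keywords; .get() returns None if one is absent.
print(primary.get("TELESCOP"))  # telescope name
print(primary.get("FRONTEND"))  # receiver / frontend ID
print(primary.get("OBS_MODE"))  # observing mode (PSR for fold mode)
print(primary.get("STT_IMJD"))  # integer MJD of the observation start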
The summary of any extension other than PRIMARY can be viewed with the following commands, for example for the last extension (SUBINT):
from astropy.io import fits
import numpy as np

hdu = fits.open("xxxxxx.fits")
print(repr(hdu[4].header))  # repr prints one header card per line
print(hdu[4].columns)
The output is as follows:
XTENSION= 'BINTABLE' / ***** Subintegration data *****
BITPIX = 8 / N/A
NAXIS = 2 / 2-dimensional binary table
NAXIS1 = 65628 / width of table in bytes
NAXIS2 = 1415 / Number of rows in table (NSUBINT)
PCOUNT = 0 / size of special data area
GCOUNT = 1 / one data group (required keyword)
TFIELDS = 11 / Number of fields per row
TTYPE1 = 'INDEXVAL' / Optionally used if INT_TYPE != TIME
TFORM1 = '1D ' / Double
TTYPE2 = 'TSUBINT ' / Length of subintegration
TFORM2 = '1D ' / Double
TTYPE3 = 'OFFS_SUB' / Offset from Start of subint centre
TFORM3 = '1D ' / Double
TTYPE5 = 'AUX_DM ' / additional DM (ionosphere, corona, etc.)
TFORM5 = '1D ' / Double
TTYPE6 = 'AUX_RM ' / additional RM (ionosphere, corona, etc.)
TFORM6 = '1D ' / Double
TTYPE7 = 'DAT_FREQ' / [MHz] Centre frequency for each channel
TFORM7 = 'D ' / NCHAN doubles
TTYPE8 = 'DAT_WTS ' / Weights for each channel
TFORM8 = 'E ' / NCHAN floats
TTYPE9 = 'DAT_OFFS' / Data offset for each channel
TFORM9 = '4E ' / NCHAN*NPOL floats
TTYPE10 = 'DAT_SCL ' / Data scale factor (outval=dataval*scl + offs)
TFORM10 = '4E ' / NCHAN*NPOL floats
TTYPE11 = 'DATA ' / Subint data table
TFORM11 = '32768I ' / I (Fold) or B (1-8 bit) Search
EPOCHS = 'VALID ' / Epoch convention (VALID, MIDTIME, STT_MJD)
INT_TYPE= 'TIME ' / Time axis (TIME, BINPHSPERI, BINLNGASC, etc)
INT_UNIT= 'SEC ' / Unit of time axis (SEC, PHS (0-1), DEG)
SCALE = 'Jansky ' / Intensity units (FluxDen/RefFlux/Jansky)
POL_TYPE= 'AABBCRCI' / Polarisation identifier (e.g., AABBCRCI, AA+BB)
NPOL = 4 / Nr of polarisations
TBIN = 4.9152E-05 / [s] Time per bin or sample
NBIN = 8192 / Nr of bins (PSR/CAL mode; else 1)
NBIN_PRD= ' ' / Nr of bins/pulse period (for gated data)
PHS_OFFS= '* ' / Phase offset of bin 0 for gated data
NBITS = 8 / Nr of bits/datum (SEARCH mode data, else 1)
ZERO_OFF= '* ' / Zero offset for SEARCH-mode data
SIGNINT = 0 / 1 for signed ints in SEARCH-mode data, else 0
NSUBOFFS= '* ' / Subint offset (Contiguous SEARCH-mode files)
NCHAN = 1 / Number of channels/sub-bands in this file
CHAN_BW = 500. / [MHz] Channel/sub-band width
REFFREQ = 1400. / [MHz] Reference frequency
DM = 279. / [cm-3 pc] DM for post-detection dedisperion
RM = -12.76 / [rad m-2] RM for post-detection deFaraday
NCHNOFFS= '* ' / Channel/sub-band offset for split files
NSBLK = 1024 / Samples/row (SEARCH mode, else 1)
NSTOT = '* ' / Total number of samples (SEARCH mode, else 1)
EXTNAME = 'SUBINT ' / name of this binary table extension
TUNIT2 = 's ' / Units of field
TUNIT3 = 's ' / Units of field
TUNIT5 = 'CM-3 '
TUNIT6 = 'RAD '
TUNIT7 = 'MHz ' / Units of field
TDIM11 = '(8192,1,4)' / (NBIN,NCHAN,NPOL) or (NCHAN,NPOL,NSBLK*NBITS/8)
TUNIT11 = 'Jy ' / Units of subint data
EXTVER = 1 / auto assigned by template parser
TTYPE4 = 'PERIOD ' / label for field
TFORM4 = '1D ' / format of field
TUNIT4 = 's ' / Spin period in seconds

ColDefs(
    name = 'INDEXVAL'; format = '1D'
    name = 'TSUBINT'; format = '1D'; unit = 's'
    name = 'OFFS_SUB'; format = '1D'; unit = 's'
    name = 'PERIOD'; format = '1D'; unit = 's'
    name = 'AUX_DM'; format = '1D'; unit = 'CM-3'
    name = 'AUX_RM'; format = '1D'; unit = 'RAD'
    name = 'DAT_FREQ'; format = 'D'; unit = 'MHz'
    name = 'DAT_WTS'; format = 'E'
    name = 'DAT_OFFS'; format = '4E'
    name = 'DAT_SCL'; format = '4E'
    name = 'DATA'; format = '32768I'; unit = 'Jy'; dim = '(8192,1,4)'
)
The SUBINT data table contains the columns INDEXVAL, TSUBINT, OFFS_SUB, PERIOD, AUX_DM, AUX_RM, DAT_FREQ, DAT_WTS, DAT_OFFS, DAT_SCL and DATA.
A particular column can be read with the following commands, for example OFFS_SUB:
from astropy.io import fits
import numpy as np
hdu = fits.open("xxxxxx.fits")
print(hdu[4].data["OFFS_SUB"])
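The DATA column itself comes back as a multi-dimensional array. As a rough sketch (assuming astropy reverses the TDIM11 = '(8192,1,4)' axes into C order, giving one (NPOL, NCHAN, NBIN) block per row):
from astropy.io import fits

hdu = fits.open("xxxxxx.fits")
data = hdu[4].data["DATA"]

# Expected shape for this file: (NSUBINT, NPOL, NCHAN, NBIN) = (1415, 4, 1, 8192)
print(data.shape)
print(data.dtype)  # int16 for fold-mode data (TFORM11 = '32768I')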
Concatenating data
I wrote a small program to concatenate folded data files.
import os
import sys
import math
import time
import pickle
import argparse
import numpy as np
from astropy.io import fits
import copy, random, struct
from astropy.io.fits import ColDefs
import matplotlib.pyplot as plt
def get_col_name(hdu):
    """
    Get the column names of the SUBINT extension.
    """
    ncol = len(hdu[4].columns)
    colname = []
    for i in range(ncol):
        colname.append(hdu[4].columns[i].name)
    return colname

def get_coldata(fitsname):
    """
    Read the SUBINT columns of a folded pulsar data file into a dict.
    """
    hdu = fits.open(fitsname)  # read data
    colname = get_col_name(hdu)
    ncol = len(colname)
    coldata = {}
    for i in range(ncol):
        coldata[colname[i]] = hdu[4].data[colname[i]]
    return coldata
def header_set(header, name, Value):
    """Change the value of a keyword in a FITS header."""
    header[name] = Value
    return header

def columns_set(column, data1, name, attri, Value):
    """Change an attribute of a FITS column definition (not used below)."""
    columns = fits.ColDefs(column)
    columns.change_attrib(name, attri, Value)
    return columns
### input parameters
parser = argparse.ArgumentParser(prog='psradd.py',
                                 usage='%(prog)s [options] -f file1 file2 ... -o outfile',
                                 description='Concatenate folded pulsar FITS files')
parser.add_argument('-f', '--filename', required=True, nargs='+', help='the input file names')
parser.add_argument('-o', '--outfile', default='Addfile', help='the output file name')
args = parser.parse_args()

filename = args.filename
outfile = args.outfile
if outfile == "Addfile":
    # reuse the extension of the first input file for the default output name
    out_front = filename[0].split(".")[-1]
    outfile = outfile + "." + out_front
if os.path.exists(outfile):
    os.remove(outfile)

hdu1 = fits.open(filename[0])
colname = get_col_name(hdu1)
coldata_a = {}
for i in range(len(colname)):
    coldata_a[colname[i]] = []
nsub = 0
print("Input files:", filename)
for i in range(len(filename)):
    # read each file and append its SUBINT columns
    coldata = get_coldata(filename[i])
    nsubi = len(coldata["DATA"])
    for j in range(len(colname)):
        coldata_a[colname[j]].extend(coldata[colname[j]])
    nsub += nsubi
for i in range(len(colname)):
    coldata_a[colname[i]] = np.array(coldata_a[colname[i]])
header0 = hdu1[0].header
colheader = hdu1[4].header
### set up the primary HDU from the first file's header
primary_hdu = fits.PrimaryHDU(header=header0)
## HISTORY, PSRPARAM and POLYCO tables
tab1 = hdu1[1]
tab2 = hdu1[2]
tab3 = hdu1[3]
tab1_col = tab1.columns
tab1_header = tab1.header
tab1_col = fits.ColDefs(tab1_col)
tab1_col["NSUB"].array = np.array([nsub] * 5, dtype='int32')  # one value per HISTORY row (5 here)
newtab1 = fits.BinTableHDU.from_columns(tab1_col, header=tab1_header)
## set the number of sub-integrations
colheader = header_set(colheader, "NAXIS2", nsub)
## build the new table columns
clumns_new = hdu1[4].columns
clumns_new = fits.ColDefs(clumns_new)
nbin = hdu1[4].header["NBIN"]  # the number of samples (bins) per pulse
tbin = hdu1[4].header["TBIN"]  # [s] time per bin or sample
total_time = round(nsub * nbin * tbin)
dt = hdu1[1].data['TBIN'][-1]  # sampling time from the HISTORY table
tsubint = nbin * dt            # length of one sub-integration
offs_sub_init = tsubint / 2    # centre of the first sub-integration
print("tsubint", tbin, tsubint)
## re-assign the column data
for i in range(len(colname)):
    if colname[i] == 'OFFS_SUB':
        # recompute OFFS_SUB for the concatenated file
        offs_sub = []
        for j in np.arange(0, len(coldata_a[colname[i]])):
            offs_sub.append(offs_sub_init + j * tsubint)
        clumns_new[colname[i]].array = np.array(offs_sub)
    else:
        clumns_new[colname[i]].array = coldata_a[colname[i]]
### assemble the output FITS file
newHdu = fits.BinTableHDU.from_columns(clumns_new, header=colheader)
outputdata = fits.HDUList([primary_hdu, newtab1, tab2, tab3, newHdu])
print("#### write into %s ####" % outfile)
outputdata.writeto(outfile)
hdu1.close(output_verify="exception", verbose=True, closed=True)
Program usage
Suppose the program is named psradd.py.
python psradd.py -f file1.Fp file2.Fp file3.Fp -o file_all.Fp
-f file1 file2 ...   the names of the input files
-o filename          the name of the output file
Notes
When concatenating the data, the POLYCO extension is not modified, which may cause errors in some cases; this part will be revised later. For the SUBINT extension, the data are concatenated and the OFFS_SUB column is recomputed. OFFS_SUB is the time interval from the start of the data recording to the centre of each sub-integration. The method is as follows: take the sampling interval dt, the number of samples per pulse sequence nbin, and the sub-integration index i. The first value is init_offs_sub = nbin*dt/2, and the OFFS_SUB of sub-integration i is i*dt*nbin + init_offs_sub, as in the short sketch below.
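A minimal sketch of this calculation (the function name recompute_offs_sub is only illustrative, not part of the program above):
import numpy as np

def recompute_offs_sub(nsub, nbin, dt):
    """OFFS_SUB of sub-integration i is i*nbin*dt + nbin*dt/2."""
    tsubint = nbin * dt          # length of one sub-integration [s]
    init_offs_sub = tsubint / 2  # centre of the first sub-integration
    return np.arange(nsub) * tsubint + init_offs_sub

# e.g. 8192 bins of 4.9152e-5 s each: the first OFFS_SUB is ~0.2013 s
print(recompute_offs_sub(3, 8192, 4.9152e-05))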
Fold-mode data are stored as 16-bit signed integers, with the pulse profiles for the different channels and polarisations stored at consecutive locations. Before conversion to integers, for each sub-integration the mean channel value (averaged over bins and polarisations) is subtracted from the channel data, and the remainder is scaled so that the values in the DATA array cover the full available range (-32768 to 32767). The original observed power can be reconstructed as follows:
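Below is a minimal sketch of that reconstruction, based on the DAT_SCL header comment (outval = dataval*scl + offs). The reshape assumes the DAT_SCL/DAT_OFFS values are ordered with polarisation as the slower index, which is harmless for this file since NCHAN = 1:
from astropy.io import fits
import numpy as np

hdu = fits.open("xxxxxx.fits")
subint = hdu[4].data

data = subint["DATA"].astype(np.float64)  # (NSUBINT, NPOL, NCHAN, NBIN)
nsub, npol, nchan, nbin = data.shape

# DAT_SCL and DAT_OFFS hold NCHAN*NPOL values per sub-integration
scl = subint["DAT_SCL"].reshape(nsub, npol, nchan, 1)
offs = subint["DAT_OFFS"].reshape(nsub, npol, nchan, 1)

power = data * scl + offs  # outval = dataval*scl + offs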
When the pam -r phase command is used, the phase rotation is applied to each sub-integration individually and does not involve the previous or next sub-integration. This is fine when a complete pulse profile lies within a single sub-integration, as shown in the figure below:
If a pulse falls across two sub-integrations, as shown for a single sub-integration in the figure below, the rotation shifts the pulse. In that case, when analysing single-pulse energies, for example for nulling or mode changing, pam -r and similar commands should not be used, and the psradd command in psrchive should also be used with caution.
December 15, 2021. Perhaps a later version of psrchive will fix the issue described above.