Reading and processing folded pulsar data, Part 2

For preliminary processing, FITS files can be inspected with astropy.io.fits to get a summary of the file contents and to carry out the subsequent data processing.

from astropy.io import fits
hdu = fits.open("xxxxxx.fits")
print(hdu.info())

A folded data file contains five parts (HDUs), as shown below:

Name      Type
--------  -----------
PRIMARY   PrimaryHDU
HISTORY   BinTableHDU
PSRPARAM  BinTableHDU
POLYCO    BinTableHDU
SUBINT    BinTableHDU

PRIMARY is the file header and contains parameters such as the number of sampling bits, the telescope name, the receiver ID and the observation time. It is the only part of type PrimaryHDU; all the remaining parts are BinTableHDU. A BinTableHDU contains a header, columns and data.

HISTORY records the data-processing history.

PSRPARAM contains the pulsar parameter information.

POLYCO is a table of polyco parameters for each data segment. For a description of the parameters and algorithm used for pulse-phase prediction, see the Tempo documentation.

SUBINT is the data section.
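
As a side note, astropy also lets these parts be accessed by extension name instead of by numerical index; a minimal sketch (the file name is a placeholder, as above):

from astropy.io import fits

hdu = fits.open("xxxxxx.fits")
print(hdu["SUBINT"].header["NBIN"])   # same HDU as hdu[4] in this file
print(hdu["PSRPARAM"].data)           # pulsar parameter table
hdu.close()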

A summary of any part other than PRIMARY can be viewed with the following commands, for example for the last part (SUBINT, the data):

from astropy.io import fits
import numpy as np
hdu = fits.open("xxxxxx.fits")
print(hdu[4].header)
print(hdu[4].columns)

The output is as follows:

XTENSION= 'BINTABLE' / ***** Subintegration data *****
BITPIX = 8 / N/A
NAXIS = 2 / 2-dimensional binary table
NAXIS1 = 65628 / width of table in bytes
NAXIS2 = 1415 / Number of rows in table (NSUBINT)
PCOUNT = 0 / size of special data area
GCOUNT = 1 / one data group (required keyword)
TFIELDS = 11 / Number of fields per row
TTYPE1 = 'INDEXVAL' / Optionally used if INT_TYPE != TIME
TFORM1 = '1D ' / Double
TTYPE2 = 'TSUBINT ' / Length of subintegration
TFORM2 = '1D ' / Double
TTYPE3 = 'OFFS_SUB' / Offset from Start of subint centre
TFORM3 = '1D ' / Double
TTYPE5 = 'AUX_DM ' / additional DM (ionosphere, corona, etc.)
TFORM5 = '1D ' / Double
TTYPE6 = 'AUX_RM ' / additional RM (ionosphere, corona, etc.)
TFORM6 = '1D ' / Double
TTYPE7 = 'DAT_FREQ' / [MHz] Centre frequency for each channel
TFORM7 = 'D ' / NCHAN doubles
TTYPE8 = 'DAT_WTS ' / Weights for each channel
TFORM8 = 'E ' / NCHAN floats
TTYPE9 = 'DAT_OFFS' / Data offset for each channel
TFORM9 = '4E ' / NCHAN*NPOL floats
TTYPE10 = 'DAT_SCL ' / Data scale factor (outval=dataval*scl + offs)
TFORM10 = '4E ' / NCHAN*NPOL floats
TTYPE11 = 'DATA ' / Subint data table
TFORM11 = '32768I ' / I (Fold) or B (1-8 bit) Search
EPOCHS = 'VALID ' / Epoch convention (VALID, MIDTIME, STT_MJD)
INT_TYPE= 'TIME ' / Time axis (TIME, BINPHSPERI, BINLNGASC, etc)
INT_UNIT= 'SEC ' / Unit of time axis (SEC, PHS (0-1), DEG)
SCALE = 'Jansky ' / Intensity units (FluxDen/RefFlux/Jansky)
POL_TYPE= 'AABBCRCI' / Polarisation identifier (e.g., AABBCRCI, AA+BB)
NPOL = 4 / Nr of polarisations
TBIN = 4.9152E-05 / [s] Time per bin or sample
NBIN = 8192 / Nr of bins (PSR/CAL mode; else 1)
NBIN_PRD= ' ' / Nr of bins/pulse period (for gated data)
PHS_OFFS= '* ' / Phase offset of bin 0 for gated data
NBITS = 8 / Nr of bits/datum (SEARCH mode data, else 1)
ZERO_OFF= '* ' / Zero offset for SEARCH-mode data
SIGNINT = 0 / 1 for signed ints in SEARCH-mode data, else 0
NSUBOFFS= '* ' / Subint offset (Contiguous SEARCH-mode files)
NCHAN = 1 / Number of channels/sub-bands in this file
CHAN_BW = 500. / [MHz] Channel/sub-band width
REFFREQ = 1400. / [MHz] Reference frequency
DM = 279. / [cm-3 pc] DM for post-detection dedisperion
RM = -12.76 / [rad m-2] RM for post-detection deFaraday
NCHNOFFS= '* ' / Channel/sub-band offset for split files
NSBLK = 1024 / Samples/row (SEARCH mode, else 1)
NSTOT = '* ' / Total number of samples (SEARCH mode, else 1)
EXTNAME = 'SUBINT ' / name of this binary table extension
TUNIT2 = 's ' / Units of field
TUNIT3 = 's ' / Units of field
TUNIT5 = 'CM-3 '
TUNIT6 = 'RAD '
TUNIT7 = 'MHz ' / Units of field
TDIM11 = '(8192,1,4)' / (NBIN,NCHAN,NPOL) or (NCHAN,NPOL,NSBLK*NBITS/8)
TUNIT11 = 'Jy ' / Units of subint data
EXTVER = 1 / auto assigned by template parser
TTYPE4 = 'PERIOD ' / label for field
TFORM4 = '1D ' / format of field
TUNIT4 = 's ' / Spin period in seconds
ColDefs(
    name = 'INDEXVAL'; format = '1D'
    name = 'TSUBINT'; format = '1D'; unit = 's'
    name = 'OFFS_SUB'; format = '1D'; unit = 's'
    name = 'PERIOD'; format = '1D'; unit = 's'
    name = 'AUX_DM'; format = '1D'; unit = 'CM-3'
    name = 'AUX_RM'; format = '1D'; unit = 'RAD'
    name = 'DAT_FREQ'; format = 'D'; unit = 'MHz'
    name = 'DAT_WTS'; format = 'E'
    name = 'DAT_OFFS'; format = '4E'
    name = 'DAT_SCL'; format = '4E'
    name = 'DATA'; format = '32768I'; unit = 'Jy'; dim = '(8192,1,4)'
)

The SUBINT table contains the columns INDEXVAL, TSUBINT, OFFS_SUB, PERIOD, AUX_DM, AUX_RM, DAT_FREQ, DAT_WTS, DAT_OFFS, DAT_SCL and DATA.

A particular column can be inspected with the following commands, for example OFFS_SUB:

from astropy.io import fits
import numpy as np
hdu = fits.open("xxxxxx.fits")
print(hdu[4].data["OFFS_SUB"])
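
For reference, the shape of the DATA column can be checked against the header keywords; a small sketch, assuming the (NBIN,NCHAN,NPOL) layout given by TDIM11 above (astropy returns the axes in reverse order):

from astropy.io import fits

hdu = fits.open("xxxxxx.fits")
sub = hdu[4]
data = sub.data["DATA"]
# expected shape: (NSUBINT, NPOL, NCHAN, NBIN), e.g. (1415, 4, 1, 8192) for the header above
print(data.shape, sub.header["NPOL"], sub.header["NCHAN"], sub.header["NBIN"])
hdu.close()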

Concatenating data

I wrote a small program to concatenate folded data files.

import os
import argparse
import numpy as np
from astropy.io import fits


def get_col_name(hdu):
    """
    get fits column name
    """
    ncol = len(hdu[4].columns)
    colname = []
    for i in range(ncol):
        colname.append(hdu[4].columns[i].name)
    return colname


def get_coldata(fitsname):
    """
    read fold pulsar data
    """
    hdu = fits.open(fitsname) ## read data
    
    colname = get_col_name(hdu)
    ncol = len(colname)
    coldata = {}
    for i in range(ncol):
        coldata[colname[i]]=hdu[4].data[colname[i]]
    return coldata
    
def header_set(header, name, Value):
    """change the value of a keyword in a fits header"""
    header[name] = Value
    return header

def columns_set(column, name, attri, Value):
    """change an attribute of a fits column (helper, not used below)"""
    columns = fits.ColDefs(column)
    columns.change_attrib(col_name=name, attrib=attri, new_value=Value)
    return columns

### input parameters

parser = argparse.ArgumentParser(prog='psradd.py',
                                 usage='%(prog)s [options]',
                                 description='Concatenate folded pulsar fits files')
parser.add_argument('-f', '--filename', required=True, nargs='+', help='the input file names')
parser.add_argument('-o', '--outfile', default='Addfile', help='the output file name')
args = parser.parse_args()
filename = args.filename
outfile = args.outfile
if outfile == "Addfile":
    ext = filename[0].split(".")[-1]   # reuse the extension of the first input file
    outfile = outfile + "." + ext
if os.path.exists(outfile):
    os.remove(outfile)


hdu1 = fits.open(filename[0])


colname = get_col_name(hdu1)
coldata_a = {}
for i in range(len(colname)):
    coldata_a[colname[i]]=[]
    
nsub = 0

print("Input files:", filename)

for i in range(len(filename)):
    coldata = get_coldata(filename[i])
    nsubi = len(coldata["DATA"])
    for j in range(len(colname)):
        coldata_a[colname[j]].extend(coldata[colname[j]])
    nsub += nsubi
    
for i in range(len(colname)):
    coldata_a[colname[i]] = np.array(coldata_a[colname[i]])

header0 = hdu1[0].header
colheader = hdu1[4].header
### primary HDU built from the first file's header
primary_hdu = fits.PrimaryHDU(header=header0)
## the history, pulsar-parameter and polyco tables
tab1 = hdu1[1]
tab2 = hdu1[2]
tab3 = hdu1[3]
tab1_col = tab1.columns
tab1_header = tab1.header
tab1_col = fits.ColDefs(tab1_col)
tab1_col["NSUB"].array = np.array([nsub]*len(tab1.data), dtype='int32')  # one value per HISTORY row
newtab1 = fits.BinTableHDU.from_columns(tab1_col, header=tab1_header)

## update the number of sub-integrations
colheader = header_set(colheader, "NAXIS2", nsub)

## create the new table columns
columns_new = hdu1[4].columns
columns_new = fits.ColDefs(columns_new)
nbin = hdu1[4].header["NBIN"]  ## number of phase bins per pulse profile
tbin = hdu1[4].header["TBIN"]  ## [s] time per bin or sample
total_time = round(nsub*nbin*tbin)  ## total integration time of the output file
dt = hdu1[1].data['TBIN'][-1]
tsubint = nbin*dt
offs_sub_init = tsubint/2

print("tsubint", tbin, tsubint)
## re-assign the column data of the new table
for i in range(len(colname)):
    if colname[i] == 'OFFS_SUB':
        ## recompute OFFS_SUB: offset of each subint centre from the start of the file
        offs_sub = []
        for j in np.arange(0, len(coldata_a[colname[i]])):
            offs_sub.append(offs_sub_init + j*tsubint)
        columns_new[colname[i]].array = np.array(offs_sub)
    else:
        columns_new[colname[i]].array = coldata_a[colname[i]]


### assemble the output fits file
newHdu = fits.BinTableHDU.from_columns(columns_new, header=colheader)
outputdata = fits.HDUList([primary_hdu, newtab1, tab2, tab3, newHdu])



print("####wriite into %s #### "%outfile)
outputdata.writeto(outfile)
hdu1.close(output_verify="exception",verbose=True,closed=True)

Program usage

Assume the program is named psradd.py.

python psradd.py -f file1.Fp file2.Fp file3.Fp -o file_all.Fp

-f file1 file2 ...    names of the input files
-o filename           name of the output file
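
After running the program, a quick sanity check (illustrative only, using the example output name above) is to confirm that the number of sub-integrations in the output equals the total over the inputs:

from astropy.io import fits

with fits.open("file_all.Fp") as hdu:
    print(hdu["SUBINT"].header["NAXIS2"])      # should equal the summed NSUBINT of the inputs
    print(hdu["SUBINT"].data["OFFS_SUB"][:5])  # recomputed sub-integration offsets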

Notes

When concatenating the data, the POLYCO part is not modified, which may cause errors in some cases; this part will be fixed later. For the SUBINT part, the data are concatenated and the OFFS_SUB column is recomputed. OFFS_SUB is the time interval between the centre of each sub-integration and the start of the data recording. The procedure is: take the sampling interval dt, the number of samples per pulse sequence nbin and the pulse index i; the first value is init_offs_sub = nbin*dt/2, and the offs_sub value of sequence i is i*dt*nbin + init_offs_sub.
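
A minimal sketch of this recomputation (the numerical values are simply those from the example header above):

import numpy as np

nbin = 8192          # bins per pulse sequence, from NBIN
dt = 4.9152e-05      # sampling interval in seconds, from TBIN
nsub = 1415          # number of sub-integrations after concatenation

tsubint = nbin * dt              # length of one sub-integration
init_offs_sub = tsubint / 2      # centre of the first sub-integration
offs_sub = init_offs_sub + np.arange(nsub) * tsubint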

Fold-mode data are stored as 16-bit signed integers, with the pulse profiles for consecutive channels and polarisations stored in consecutive locations. Before conversion to integers, for each sub-integration the mean channel value (averaged over bins and polarisations) is subtracted from the channel data, and the remainder is scaled so that the values in the DATA array span the full available range (-32768 to 32767). The original observed power is reconstructed as:
power = DATA * DAT_SCL + DAT_OFFS   (per channel and polarisation)
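
A minimal sketch of applying this reconstruction with astropy and numpy, assuming the DATA shape (NSUBINT, NPOL, NCHAN, NBIN) discussed above and that the polarisation index varies slowest in DAT_SCL/DAT_OFFS:

import numpy as np
from astropy.io import fits

hdu = fits.open("xxxxxx.fits")
sub = hdu["SUBINT"]
npol = sub.header["NPOL"]
nchan = sub.header["NCHAN"]

data = sub.data["DATA"].astype(np.float32)               # (nsub, npol, nchan, nbin)
scl = sub.data["DAT_SCL"].reshape(-1, npol, nchan, 1)    # one scale per channel and polarisation
offs = sub.data["DAT_OFFS"].reshape(-1, npol, nchan, 1)  # one offset per channel and polarisation

power = data * scl + offs    # outval = dataval*scl + offs
hdu.close()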

When the pam -r phase command is used, the phase rotation is applied to each individual pulse (sub-integration) on its own, without involving the previous or next pulse. If a complete pulse profile lies within a single sub-integration, it looks like the figure below:

[Figure: a pulse profile fully contained within a single sub-integration]


If a pulse spans two sub-integrations, a single sub-integration looks like the figure below and the pulse appears shifted. In this case, when analysing single-pulse energies, for example for nulling or mode changing, pam -r and similar commands should not be used, and the psradd command in psrchive should also be used with caution.

[Figure: a pulse split across two adjacent sub-integrations, producing an apparent pulse shift]

15 December 2021. Perhaps psrchive will fix the above issue in a future release.