十分钟搞定pandas（二）

转载

4562xse3460 2023-10-25 15:48:11 博主文章分类：Python

文章标签 python pandas Python 添加行 SQL 文章分类 JavaScript 前端开发

合并

连接

pandas提供各种工具以简便合并序列,数据桢,和组合对象, 在连接/合并类型操作中使用多种类型索引和相关数学函数.

请参阅合并部分

把pandas对象连接到一起

Python

In                 [        73        ]        :                 df                 =                 pd        .        DataFrame        (        np        .        random        .        randn        (        10        ,                 4        )        )

In                 [        74        ]        :                 df

Out        [        74        ]        :         

                  0                         1                         2                         3

0                 -        0.548702                  1.467327                 -        1.015962                 -        0.483075

1                  1.637550                 -        1.217659                 -        0.291519                 -        1.745505

2                 -        0.263952                  0.991460                 -        0.919069                  0.266046

3                 -        0.709661                  1.669052                  1.037882                 -        1.705775

4                 -        0.919854                 -        0.042379                  1.247642                 -        0.009920

5                  0.290213                  0.495767                  0.362949                  1.548106

6                 -        1.131345                 -        0.089329                  0.337863                 -        0.945867

7                 -        0.932132                  1.956030                  0.017587                 -        0.016692

8                 -        0.575247                  0.254161                 -        1.143704                  0.215897

9                  1.193555                 -        0.077118                 -        0.408530                 -        0.862495

                

# break it into pieces

In                 [        75        ]        :                 pieces                 =                 [        df        [        :        3        ]        ,                 df        [        3        :        7        ]        ,                 df        [        7        :        ]        ]

In                 [        76        ]        :                 pd        .        concat        (        pieces        )

Out        [        76        ]        :         

                  0                         1                         2                         3

0                 -        0.548702                  1.467327                 -        1.015962                 -        0.483075

1                  1.637550                 -        1.217659                 -        0.291519                 -        1.745505

2                 -        0.263952                  0.991460                 -        0.919069                  0.266046

3                 -        0.709661                  1.669052                  1.037882                 -        1.705775

4                 -        0.919854                 -        0.042379                  1.247642                 -        0.009920

5                  0.290213                  0.495767                  0.362949                  1.548106

6                 -        1.131345                 -        0.089329                  0.337863                 -        0.945867

7                 -        0.932132                  1.956030                  0.017587                 -        0.016692

8                 -        0.575247                  0.254161                 -        1.143704                  0.215897

9                  1.193555                 -        0.077118                 -        0.408530                 -        0.862495

连接

SQL样式合并. 请参阅数据库style联接

Python

In                [        77        ]        :                left                =                pd        .        DataFrame        (        {        'key'        :                [        'foo'        ,                'foo'        ]        ,                'lval'        :                [        1        ,                2        ]        }        )


In                [        78        ]        :                right                =                pd        .        DataFrame        (        {        'key'        :                [        'foo'        ,                'foo'        ]        ,                'rval'        :                [        4        ,                5        ]        }        )


In                [        79        ]        :                left


Out        [        79        ]        :        


          key          lval


0                  foo                    1


1                  foo                    2


                 


In                [        80        ]        :                right


Out        [        80        ]        :        


          key          rval


0                  foo                    4


1                  foo                    5


                 


In                [        81        ]        :                pd        .        merge        (        left        ,                right        ,                on        =        'key'        )


Out        [        81        ]        :        


          key          lval          rval


0                  foo                    1                    4


1                  foo                    1                    5


2                  foo                    2                    4


3                  foo                    2                    5

添加

添加行到数据增. 参阅添加

Python

In                 [        82        ]        :                 df                 =                 pd        .        DataFrame        (        np        .        random        .        randn        (        8        ,                 4        )        ,                 columns        =        [        'A'        ,        'B'        ,        'C'        ,        'D'        ]        )

In                 [        83        ]        :                 df

Out        [        83        ]        :         

                  A                         B                         C                         D

0                  1.346061                  1.511763                  1.627081                 -        0.990582

1                 -        0.441652                  1.211526                  0.268520                  0.024580

2                 -        1.577585                  0.396823                 -        0.105381                 -        0.532532

3                  1.453749                  1.208843                 -        0.080952                 -        0.264610

4                 -        0.727965                 -        0.589346                  0.339969                 -        0.693205

5                 -        0.339355                  0.593616                  0.884345                  1.591431

6                  0.141809                  0.220390                  0.435589                  0.192451

7                 -        0.096701                  0.803351                  1.715071                 -        0.708758

                

In                 [        84        ]        :                 s                 =                 df        .        iloc        [        3        ]

In                 [        85        ]        :                 df        .        append        (        s        ,                 ignore_index        =        True        )

Out        [        85        ]        :         

                  A                         B                         C                         D

0                  1.346061                  1.511763                  1.627081                 -        0.990582

1                 -        0.441652                  1.211526                  0.268520                  0.024580

2                 -        1.577585                  0.396823                 -        0.105381                 -        0.532532

3                  1.453749                  1.208843                 -        0.080952                 -        0.264610

4                 -        0.727965                 -        0.589346                  0.339969                 -        0.693205

5                 -        0.339355                  0.593616                  0.884345                  1.591431

6                  0.141809                  0.220390                  0.435589                  0.192451

7                 -        0.096701                  0.803351                  1.715071                 -        0.708758

8                  1.453749                  1.208843                 -        0.080952                 -        0.264610

分组

对于“group by”指的是以下一个或多个处理

将数据按某些标准分割为不同的组
在每个独立组上应用函数
组合结果为一个数据结构

请参阅分组部分

Python

In                [        86        ]        :                df                =                pd        .        DataFrame        (        {        'A'                :                [        'foo'        ,                'bar'        ,                'foo'        ,                'bar'        ,


          .        .        .        .        :                                          'foo'        ,                'bar'        ,                'foo'        ,                'foo'        ]        ,


          .        .        .        .        :                                    'B'                :                [        'one'        ,                'one'        ,                'two'        ,                'three'        ,


          .        .        .        .        :                                          'two'        ,                'two'        ,                'one'        ,                'three'        ]        ,


          .        .        .        .        :                                    'C'                :                np        .        random        .        randn        (        8        )        ,


          .        .        .        .        :                                    'D'                :                np        .        random        .        randn        (        8        )        }        )


          .        .        .        .        :        


In                [        87        ]        :                df


Out        [        87        ]        :        


            A                      B                        C                        D


0                  foo            one                -        1.202872                -        0.055224


1                  bar            one                -        1.814470                  2.395985


2                  foo            two                  1.018601                  1.552825


3                  bar          three                -        0.595447                  0.166599


4                  foo            two                  1.395433                  0.047609


5                  bar            two                -        0.392670                -        0.136473


6                  foo            one                  0.007207                -        0.561757


7                  foo          three                  1.928123                -        1.623033

分组然后应用函数统计总和存放到结果组

Python

In                 [        88        ]        :                 df        .        groupby        (        'A'        )        .        sum        (        )

Out        [        88        ]        :         

                    C                        D

A                             

bar                 -        2.802588                  2.42611

foo                  3.146492                 -        0.63958

按多列分组为层次索引,然后应用函数

Python

In                [        89        ]        :                df        .        groupby        (        [        'A'        ,        'B'        ]        )        .        sum        (        )


Out        [        89        ]        :        


                          C                        D


A                  B                                


bar        one                  -        1.814470                  2.395985


            three                -        0.595447                  0.166599


            two                  -        0.392670                -        0.136473


foo        one                  -        1.195665                -        0.616981


            three                  1.928123                -        1.623033


            two                    2.414034                  1.600434

重塑

请参阅章节分层索引和重塑.

堆叠

Python

In                 [        90        ]        :                 tuples                 =                 list        (        zip        (        *        [        [        'bar'        ,                 'bar'        ,                 'baz'        ,                 'baz'        ,

           .        .        .        .        :                                      'foo'        ,                 'foo'        ,                 'qux'        ,                 'qux'        ]        ,

           .        .        .        .        :                                     [        'one'        ,                 'two'        ,                 'one'        ,                 'two'        ,

           .        .        .        .        :                                      'one'        ,                 'two'        ,                 'one'        ,                 'two'        ]        ]        )        )

           .        .        .        .        :         

In                 [        91        ]        :                 index                 =                 pd        .        MultiIndex        .        from_tuples        (        tuples        ,                 names        =        [        'first'        ,                 'second'        ]        )

In                 [        92        ]        :                 df                 =                 pd        .        DataFrame        (        np        .        random        .        randn        (        8        ,                 2        )        ,                 index        =        index        ,                 columns        =        [        'A'        ,                 'B'        ]        )

In                 [        93        ]        :                 df2                 =                 df        [        :        4        ]

In                 [        94        ]        :                 df2

Out        [        94        ]        :         

                             A                         B

first         second                    

bar           one                     0.029399                 -        0.542108

              two                     0.282696                 -        0.087302

baz           one                    -        1.575170                  1.771208

              two                     0.816482                  1.100230

堆叠

Python

In                [        95        ]        :                stacked                =                df2        .        stack        (        )


In                [        96        ]        :                stacked


Out        [        96        ]        :        


first          second  


bar            one                    A                    0.029399


                      B                  -        0.542108


              two                    A                    0.282696


                      B                  -        0.087302


baz            one                    A                  -        1.575170


                      B                    1.771208


              two                    A                    0.816482


                      B                    1.100230


dtype        :                float64

多个索引作为索引), 其堆叠的反向操作是未堆栈, 上面的数据默认反堆叠到上一级别:

Python

In                 [        97        ]        :                 stacked        .        unstack        (        )

Out        [        97        ]        :         

                             A                         B

first         second                    

bar           one                     0.029399                 -        0.542108

              two                     0.282696                 -        0.087302

baz           one                    -        1.575170                  1.771208

              two                     0.816482                  1.100230

                

In                 [        98        ]        :                 stacked        .        unstack        (        1        )

Out        [        98        ]        :         

second                one               two

first                      

bar                   A                  0.029399                  0.282696

              B                 -        0.542108                 -        0.087302

baz                   A                 -        1.575170                  0.816482

              B                  1.771208                  1.100230

                

In                 [        99        ]        :                 stacked        .        unstack        (        0        )

Out        [        99        ]        :         

first                  bar               baz

second                      

one                    A                  0.029399                 -        1.575170

               B                 -        0.542108                  1.771208

two                    A                  0.282696                  0.816482

               B                 -        0.087302                  1.100230

数据透视表

查看数据透视表.

Python

In                [        100        ]        :                df                =                pd        .        DataFrame        (        {        'A'                :                [        'one'        ,                'one'        ,                'two'        ,                'three'        ]                *                3        ,


          .        .        .        .        .        :                                    'B'                :                [        'A'        ,                'B'        ,                'C'        ]                *                4        ,


          .        .        .        .        .        :                                    'C'                :                [        'foo'        ,                'foo'        ,                'foo'        ,                'bar'        ,                'bar'        ,                'bar'        ]                *                2        ,


          .        .        .        .        .        :                                    'D'                :                np        .        random        .        randn        (        12        )        ,


          .        .        .        .        .        :                                    'E'                :                np        .        random        .        randn        (        12        )        }        )


          .        .        .        .        .        :        


In                [        101        ]        :                df


Out        [        101        ]        :        


                A                  B                    C                        D                        E


0                    one                  A                  foo                  1.418757                -        0.179666


1                    one                  B                  foo                -        1.879024                  1.291836


2                    two                  C                  foo                  0.536826                -        0.009614


3                  three                  A                  bar                  1.006160                  0.392149


4                    one                  B                  bar                -        0.029716                  0.264599


5                    one                  C                  bar                -        1.146178                -        0.057409


6                    two                  A                  foo                  0.100900                -        1.425638


7                  three                  B                  foo                -        1.035018                  1.024098


8                    one                  C                  foo                  0.314665                -        0.106062


9                    one                  A                  bar                -        0.773723                  1.824375


10                    two                  B                  bar                -        1.170653                  0.595974


11                  three                  C                  bar                  0.648740                  1.167115

我们可以从此数据非常容易的产生数据透视表:

Python

In                 [        102        ]        :                 pd        .        pivot_table        (        df        ,                 values        =        'D'        ,                 index        =        [        'A'        ,                 'B'        ]        ,                 columns        =        [        'C'        ]        )

Out        [        102        ]        :         

C                             bar               foo

A                     B                            

one                   A                 -        0.773723                  1.418757

              B                 -        0.029716                 -        1.879024

              C                 -        1.146178                  0.314665

three                 A                  1.006160                       NaN

              B                       NaN                 -        1.035018

              C                  0.648740                       NaN

two                   A                       NaN                  0.100900

              B                 -        1.170653                       NaN

              C                       NaN                  0.536826

时间序列

pandas有易用,强大且高效的函数用于高频数据重采样转换操作(例如,转换秒数据到5分钟数据), 这是很普遍的情况，但并不局限于金融应用, 请参阅时间序列章节

Python

In                [        103        ]        :                rng                =                pd        .        date_range        (        '1/1/2012'        ,                periods        =        100        ,                freq        =        'S'        )


In                [        104        ]        :                ts                =                pd        .        Series        (        np        .        random        .        randint        (        0        ,                500        ,                len        (        rng        )        )        ,                index        =        rng        )


In                [        105        ]        :                ts        .        resample        (        '5Min'        ,                how        =        'sum'        )


Out        [        105        ]        :        


2012        -        01        -        01                    25083


Freq        :                5T        ,                dtype        :                int32

时区表示

Python

In                 [        106        ]        :                 rng                 =                 pd        .        date_range        (        '3/6/2012 00:00'        ,                 periods        =        5        ,                 freq        =        'D'        )

In                 [        107        ]        :                 ts                 =                 pd        .        Series        (        np        .        random        .        randn        (        len        (        rng        )        )        ,                 rng        )

In                 [        108        ]        :                 ts

Out        [        108        ]        :         

2012        -        03        -        06                    0.464000

2012        -        03        -        07                    0.227371

2012        -        03        -        08                   -        0.496922

2012        -        03        -        09                    0.306389

2012        -        03        -        10                   -        2.290613

Freq        :                 D        ,                 dtype        :                 float64

                

In                 [        109        ]        :                 ts_utc                 =                 ts        .        tz_localize        (        'UTC'        )

In                 [        110        ]        :                 ts_utc

Out        [        110        ]        :         

2012        -        03        -        06                 00        :        00        :        00        +        00        :        00                    0.464000

2012        -        03        -        07                 00        :        00        :        00        +        00        :        00                    0.227371

2012        -        03        -        08                 00        :        00        :        00        +        00        :        00                   -        0.496922

2012        -        03        -        09                 00        :        00        :        00        +        00        :        00                    0.306389

2012        -        03        -        10                 00        :        00        :        00        +        00        :        00                   -        2.290613

Freq        :                 D        ,                 dtype        :                 float64

转换到其它时区

Python

In                [        111        ]        :                ts_utc        .        tz_convert        (        'US/Eastern'        )


Out        [        111        ]        :        


2012        -        03        -        05                19        :        00        :        00        -        05        :        00                    0.464000


2012        -        03        -        06                19        :        00        :        00        -        05        :        00                    0.227371


2012        -        03        -        07                19        :        00        :        00        -        05        :        00                  -        0.496922


2012        -        03        -        08                19        :        00        :        00        -        05        :        00                    0.306389


2012        -        03        -        09                19        :        00        :        00        -        05        :        00                  -        2.290613


Freq        :                D        ,                dtype        :                float64

转换不同的时间跨度

Python

In                 [        112        ]        :                 rng                 =                 pd        .        date_range        (        '1/1/2012'        ,                 periods        =        5        ,                 freq        =        'M'        )

In                 [        113        ]        :                 ts                 =                 pd        .        Series        (        np        .        random        .        randn        (        len        (        rng        )        )        ,                 index        =        rng        )

In                 [        114        ]        :                 ts

Out        [        114        ]        :         

2012        -        01        -        31                   -        1.134623

2012        -        02        -        29                   -        1.561819

2012        -        03        -        31                   -        0.260838

2012        -        04        -        30                    0.281957

2012        -        05        -        31                    1.523962

Freq        :                 M        ,                 dtype        :                 float64

                

In                 [        115        ]        :                 ps                 =                 ts        .        to_period        (        )

In                 [        116        ]        :                 ps

Out        [        116        ]        :         

2012        -        01                   -        1.134623

2012        -        02                   -        1.561819

2012        -        03                   -        0.260838

2012        -        04                    0.281957

2012        -        05                    1.523962

Freq        :                 M        ,                 dtype        :                 float64

                

In                 [        117        ]        :                 ps        .        to_timestamp        (        )

Out        [        117        ]        :         

2012        -        01        -        01                   -        1.134623

2012        -        02        -        01                   -        1.561819

2012        -        03        -        01                   -        0.260838

2012        -        04        -        01                    0.281957

2012        -        05        -        01                    1.523962

Freq        :                 MS        ,                 dtype        :                 float64

转换时段并且使用一些运算函数, 下例中, 我们转换年报11月到季度结束每日上午9点数据

Python

In                [        118        ]        :                prng                =                pd        .        period_range        (        '1990Q1'        ,                '2000Q4'        ,                freq        =        'Q-NOV'        )


In                [        119        ]        :                ts                =                pd        .        Series        (        np        .        random        .        randn        (        len        (        prng        )        )        ,                prng        )


In                [        120        ]        :                ts        .        index                =                (        prng        .        asfreq        (        'M'        ,                'e'        )                +                1        )        .        asfreq        (        'H'        ,                's'        )                +                9


In                [        121        ]        :                ts        .        head        (        )


Out        [        121        ]        :        


1990        -        03        -        01                09        :        00                  -        0.902937


1990        -        06        -        01                09        :        00                    0.068159


1990        -        09        -        01                09        :        00                  -        0.057873


1990        -        12        -        01                09        :        00                  -        0.368204


1991        -        03        -        01                09        :        00                  -        1.144073


Freq        :                H        ,                dtype        :                float64

分类

数据桢中包含分类. 完整的文档, 请查看分类介绍 and the API文档.

Python

In                 [        122        ]        :                 df                 =                 pd        .        DataFrame        (        {        "id"        :        [        1        ,        2        ,        3        ,        4        ,        5        ,        6        ]        ,                 "raw_grade"        :        [        'a'        ,                 'b'        ,                 'b'        ,                 'a'        ,                 'a'        ,                 'e'        ]        }        )

转换原始类别为分类数据类型.

Python

In                [        123        ]        :                df        [        "grade"        ]                =                df        [        "raw_grade"        ]        .        astype        (        "category"        )


In                [        124        ]        :                df        [        "grade"        ]


Out        [        124        ]        :        


0                    a


1                    b


2                    b


3                    a


4                    a


5                    e


Name        :                grade        ,                dtype        :                category


Categories                (        3        ,                object        )        :                [        a        ,                b        ,                e        ]

Series.cat.categories对应位置!)

Python

In                 [        125        ]        :                 df        [        "grade"        ]        .        cat        .        categories                 =                 [        "very good"        ,                 "good"        ,                 "very bad"        ]

重排顺分类,同时添加缺少的分类(序列 .cat方法下返回新默认序列)

Python

In                [        126        ]        :                df        [        "grade"        ]                =                df        [        "grade"        ]        .        cat        .        set_categories        (        [        "very bad"        ,                "bad"        ,                "medium"        ,                "good"        ,                "very good"        ]        )


In                [        127        ]        :                df        [        "grade"        ]


Out        [        127        ]        :        


0                    very        good


1                        good


2                        good


3                    very        good


4                    very        good


5                    very        bad


Name        :                grade        ,                dtype        :                category


Categories                (        5        ,                object        )        :                [        very        bad        ,                bad        ,                medium        ,                good        ,                very        good        ]

排列分类中的顺序,不是按词汇排列.

Python

In                 [        128        ]        :                 df        .        sort        (        "grade"        )

Out        [        128        ]        :         

           id                 raw_grade              grade

5                   6                         e                   very         bad

1                   2                         b                       good

2                   3                         b                       good

0                   1                         a                  very         good

3                   4                         a                  very         good

4                   5                         a                  very         good

类别列分组,并且也显示空类别.

Python

In                [        129        ]        :                df        .        groupby        (        "grade"        )        .        size        (        )


Out        [        129        ]        :        


grade


very        bad                      1


bad                NaN


medium              NaN


good                          2


very        good                    3


dtype        :                float64

绘图

绘图文档.

Python

In                 [        130        ]        :                 ts                 =                 pd        .        Series        (        np        .        random        .        randn        (        1000        )        ,                 index        =        pd        .        date_range        (        '1/1/2000'        ,                 periods        =        1000        )        )

In                 [        131        ]        :                 ts                 =                 ts        .        cumsum        (        )

In                 [        132        ]        :                 ts        .        plot        (        )

Out        [        132        ]        :                 <        matplotlib        .        axes        .        _subplots        .        AxesSubplot         at                 0xb02091ac        >

十分钟搞定pandas（二）_Python

在数据桢中,可以很方便的绘制带标签列:

Python

In                [        133        ]        :                df                =                pd        .        DataFrame        (        np        .        random        .        randn        (        1000        ,                4        )        ,                index        =        ts        .        index        ,


          .        .        .        .        .        :                                  columns        =        [        'A'        ,                'B'        ,                'C'        ,                'D'        ]        )


          .        .        .        .        .        :        


In                [        134        ]        :                df                =                df        .        cumsum        (        )


In                [        135        ]        :                plt        .        figure        (        )        ;                df        .        plot        (        )        ;                plt        .        legend        (        loc        =        'best'        )


Out        [        135        ]        :                <        matplotlib        .        legend        .        Legend        at                0xb01c9cac        >

十分钟搞定pandas（二）_python_02

获取数据输入/输出

CSV

写入csv文件

Python

In                 [        136        ]        :                 df        .        to_csv        (        'foo.csv'        )

读取csv文件

Python

In                [        137        ]        :                pd        .        read_csv        (        'foo.csv'        )


Out        [        137        ]        :        


            Unnamed        :                0                          A                          B                        C                          D


0                    2000        -        01        -        01                  0.266457                  -        0.399641                -        0.219582                  1.186860


1                    2000        -        01        -        02                  -        1.170732                  -        0.345873                  1.653061                  -        0.282953


2                    2000        -        01        -        03                  -        1.734933                  0.530468                  2.060811                  -        0.515536


3                    2000        -        01        -        04                  -        1.555121                  1.452620                  0.239859                  -        1.156896


4                    2000        -        01        -        05                  0.578117                  0.511371                  0.103552                  -        2.428202


5                    2000        -        01        -        06                  0.478344                  0.449933                -        0.741620                  -        1.962409


6                    2000        -        01        -        07                  1.235339                  -        0.091757                -        1.543861                  -        1.084753


.        .                          .        .        .                        .        .        .                        .        .        .                      .        .        .                        .        .        .


993                  2002        -        09        -        20                -        10.628548                  -        9.153563                -        7.883146                  28.313940


994                  2002        -        09        -        21                -        10.390377                  -        8.727491                -        6.399645                  30.914107


995                  2002        -        09        -        22                  -        8.985362                  -        8.485624                -        4.669462                  31.367740


996                  2002        -        09        -        23                  -        9.558560                  -        8.781216                -        4.499815                  30.518439


997                  2002        -        09        -        24                  -        9.902058                  -        9.340490                -        4.386639                  30.105593


998                  2002        -        09        -        25                -        10.216020                  -        9.480682                -        3.933802                  29.758560


999                  2002        -        09        -        26                -        11.856774                -        10.671012                -        3.216025                  29.369368


                 


[        1000                rows                x                5                columns        ]

HDF5

读写HDF存储

写入HDF5存储

Python

In                 [        138        ]        :                 df        .        to_hdf        (        'foo.h5'        ,        'df'        )

读取HDF5存储

Python

In                [        139        ]        :                pd        .        read_hdf        (        'foo.h5'        ,        'df'        )


Out        [        139        ]        :        


                            A                          B                        C                          D


2000        -        01        -        01                  0.266457                  -        0.399641                -        0.219582                  1.186860


2000        -        01        -        02                  -        1.170732                  -        0.345873                  1.653061                  -        0.282953


2000        -        01        -        03                  -        1.734933                  0.530468                  2.060811                  -        0.515536


2000        -        01        -        04                  -        1.555121                  1.452620                  0.239859                  -        1.156896


2000        -        01        -        05                  0.578117                  0.511371                  0.103552                  -        2.428202


2000        -        01        -        06                  0.478344                  0.449933                -        0.741620                  -        1.962409


2000        -        01        -        07                  1.235339                  -        0.091757                -        1.543861                  -        1.084753


.        .        .                              .        .        .                        .        .        .                      .        .        .                        .        .        .


2002        -        09        -        20                -        10.628548                  -        9.153563                -        7.883146                  28.313940


2002        -        09        -        21                -        10.390377                  -        8.727491                -        6.399645                  30.914107


2002        -        09        -        22                  -        8.985362                  -        8.485624                -        4.669462                  31.367740


2002        -        09        -        23                  -        9.558560                  -        8.781216                -        4.499815                  30.518439


2002        -        09        -        24                  -        9.902058                  -        9.340490                -        4.386639                  30.105593


2002        -        09        -        25                -        10.216020                  -        9.480682                -        3.933802                  29.758560


2002        -        09        -        26                -        11.856774                -        10.671012                -        3.216025                  29.369368


                 


[        1000                rows                x                4                columns        ]

Excel

读写MS Excel

写入excel文件

Python

In                 [        140        ]        :                 df        .        to_excel        (        'foo.xlsx'        ,                 sheet_name        =        'Sheet1'        )

读取excel文件

Python

In                [        141        ]        :                pd        .        read_excel        (        'foo.xlsx'        ,                'Sheet1'        ,                index_col        =        None        ,                na_values        =        [        'NA'        ]        )


Out        [        141        ]        :        


                            A                          B                        C                          D


2000        -        01        -        01                  0.266457                  -        0.399641                -        0.219582                  1.186860


2000        -        01        -        02                  -        1.170732                  -        0.345873                  1.653061                  -        0.282953


2000        -        01        -        03                  -        1.734933                  0.530468                  2.060811                  -        0.515536


2000        -        01        -        04                  -        1.555121                  1.452620                  0.239859                  -        1.156896


2000        -        01        -        05                  0.578117                  0.511371                  0.103552                  -        2.428202


2000        -        01        -        06                  0.478344                  0.449933                -        0.741620                  -        1.962409


2000        -        01        -        07                  1.235339                  -        0.091757                -        1.543861                  -        1.084753


.        .        .                              .        .        .                        .        .        .                      .        .        .                        .        .        .


2002        -        09        -        20                -        10.628548                  -        9.153563                -        7.883146                  28.313940


2002        -        09        -        21                -        10.390377                  -        8.727491                -        6.399645                  30.914107


2002        -        09        -        22                  -        8.985362                  -        8.485624                -        4.669462                  31.367740


2002        -        09        -        23                  -        9.558560                  -        8.781216                -        4.499815                  30.518439


2002        -        09        -        24                  -        9.902058                  -        9.340490                -        4.386639                  30.105593


2002        -        09        -        25                -        10.216020                  -        9.480682                -        3.933802                  29.758560


2002        -        09        -        26                -        11.856774                -        10.671012                -        3.216025                  29.369368


                 


[        1000                rows                x                4                columns        ]

陷阱

如果尝试这样操作可能会看到像这样的异常:

Python

>>>                if                pd        .        Series        (        [        False        ,                True        ,                False        ]        )        :


            print        (        "I was true"        )


Traceback


            .        .        .


ValueError        :                The        truth        value        of        an        array                is                ambiguous        .                Use                a        .        empty        ,                a        .        any        (        )                or                a        .        all        (        )        .

查看对照获取解释和怎么做的帮助

也可以查看陷阱.