The Series concept

So far we have worked only with the DataFrame, which is the core structure in pandas.
If we take a DataFrame apart, any single row or single column is a Series.

  • Series:collection of values
  • DataFrame: collection of Series objects
import pandas as pd
fandango=pd.read_csv("fandango_score_comparison.csv")

# Extract one column; a single column is a Series
series_film=fandango["FILM"]
# Check the column's type: <class 'pandas.core.series.Series'>, i.e. a Series
print(type(series_film))
# Retrieve data by index and slice
print(series_film[0:5])

series_rt=fandango["RottenTomatoes"]
print(series_rt[0:5])
fandango.head()
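
The cell above only extracts columns. As a minimal sketch of the other half of the claim (a single row is also a Series), the snippet below uses a small made-up DataFrame in place of the CSV; the film names and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of the CSV used above.
scores = pd.DataFrame({
    "FILM": ["Film A (2015)", "Film B (2015)", "Film C (2014)"],
    "RottenTomatoes": [74, 85, 80],
})

column = scores["RottenTomatoes"]  # one column
row = scores.iloc[0]               # one row, selected by position

print(type(column))
print(type(row))
# The row's index is made of the DataFrame's column names.
print(row.index.tolist())
```

Both objects are Series; the difference is only which axis supplies the index labels.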

Series internals and creating Series objects

Internals: built on ndarray objects

Import the Series object from pandas

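The .values attribute of a Series (an attribute, not a method, so it takes no parentheses) returns an ndarray object, as the code further down shows.

In other words, a DataFrame is made up of Series, and each Series in turn stores its data in an ndarray.

  • Note that elements of the resulting ndarray are accessed differently from elements of a DataFrame:
  • A DataFrame uses the .loc[] indexer
  • An ndarray does not; it can be indexed and sliced like an ordinary list

Many pandas objects are built on top of NumPy, so many operations work across both libraries.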
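
The difference in element access can be sketched with a tiny made-up DataFrame; the column name "score" and all variable names here are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"score": [74, 85, 80]})  # hypothetical data
series = frame["score"]
array = series.values  # attribute access, no parentheses

first_from_frame = frame.loc[0, "score"]  # row label 0, column "score"
first_from_array = array[0]               # plain positional indexing
first_three = array[0:3]                  # slicing works as on a list

# NumPy functions apply directly to the extracted array.
total = np.sum(array)
print(type(array), first_from_frame, first_from_array, total)
```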

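Creating a Series object with Series()

A Series is created with the Series() constructor. Two ndarray arrays can be passed: one supplies the values, and the other, passed as index, supplies the labels for those values.

Because a Series is backed by an ndarray, a Series column can also be passed as the index argument. If the data argument is itself a Series, however, pandas aligns it to the new index by label instead of relabeling it, so pass its .values when the goal is to attach new labels.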
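
A minimal sketch of the constructor, using invented labels and scores, including the label-alignment behavior that appears when the data argument is already a Series:

```python
import numpy as np
import pandas as pd

names = np.array(["Film A", "Film B", "Film C"])  # hypothetical labels
scores = np.array([74, 85, 80])                   # hypothetical values

# Values plus labels, both given as ndarrays.
by_name = pd.Series(scores, index=names)

# The index argument may also be a Series.
by_name_2 = pd.Series(scores, index=pd.Series(names))

# If the data is itself a Series, pandas looks up the requested labels
# in the existing index (0, 1, 2). None of them match, so every value is NaN.
aligned = pd.Series(pd.Series(scores), index=names)

print(by_name)
print(aligned)
```

This is why the examples in this post pass .values as the data argument.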

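Getting a value by its index:

That is, how to retrieve elements from a Series:

  • Series_Name[Index] returns a single value, as in the example below: <class 'numpy.int64'>
  • Series_Name[[Index1,Index2]] returns both keys and values as a new Series, as in the example below: <class 'pandas.core.series.Series'>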
from pandas import Series
# The .values attribute of a Series returns an ndarray
film_names=series_film.values
print(type(film_names))
# Output: <class 'numpy.ndarray'>
# Note that element access here differs from a DataFrame, which uses the .loc[] indexer
# Here the array can simply be sliced
print(film_names[0:10])

rt_scores=series_rt.values
print(rt_scores[0:10])

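# Pass two ndarray objects: values plus an index, similar to a key-value structure
series_custom=Series(rt_scores,index=film_names)
# Unlike a default positional index (0, 1, 2, ...), a Series index can hold strings
print(type(series_custom[["Minions (2015)","Leviathan (2014)"]]))
print(type(series_custom["Cinderella (2015)"]))
# This works like looking up a value in a dict by key; the first lookup passes a list of two keys, but a single key works too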
<class 'numpy.ndarray'>
['Avengers: Age of Ultron (2015)' 'Cinderella (2015)' 'Ant-Man (2015)'
 'Do You Believe? (2015)' 'Hot Tub Time Machine 2 (2015)'
 'The Water Diviner (2015)' 'Irrational Man (2015)' 'Top Five (2014)'
 'Shaun the Sheep Movie (2015)' 'Love & Mercy (2015)']
[74 85 80 18 14 63 42 86 99 89]
<class 'pandas.core.series.Series'>
<class 'numpy.int64'>
series_custom=Series(rt_scores,index=film_names)
# Different kinds of index:
# 1. String labels as the index
print(series_custom[["Minions (2015)","Leviathan (2014)"]])
# 2. Integer positions as the index
series_custom[5:10]
Minions (2015)      54
Leviathan (2014)    99
dtype: int64





The Water Diviner (2015)        63
Irrational Man (2015)           42
Top Five (2014)                 86
Shaun the Sheep Movie (2015)    99
Love & Mercy (2015)             89
dtype: int64
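
The two kinds of lookup shown above can also be written with the explicit .loc (label-based) and .iloc (position-based) indexers, which makes the intent unambiguous. The sketch below uses a small invented Series; the names are hypothetical.

```python
import pandas as pd

ratings = pd.Series(
    [74, 85, 80, 18],
    index=["Film A", "Film B", "Film C", "Film D"],  # hypothetical labels
)

one_value = ratings.loc["Film B"]               # single label: a scalar
two_rows = ratings.loc[["Film D", "Film A"]]    # list of labels: a Series, in the requested order
by_position = ratings.iloc[1:3]                 # positions 1 and 2 (end excluded)
label_slice = ratings.loc["Film B":"Film C"]    # label slices include both ends

print(one_value)
print(two_rows)
print(by_position)
```

Note the asymmetry: a positional slice excludes its end, while a label slice includes it.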

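Sorting a Series

Sorting a Series is needed less often than sorting a DataFrame. One approach is to sort the index labels with Python's built-in sorted() function.

The sorted() function returns a sorted list of labels; the overall operation is comparable to sorting a DataFrame with its sort_values() method.

Passing that sorted list to reindex() returns the Series rearranged to follow the new index order. The DataFrame counterpart is reindex(); reset_index() is a different operation that replaces the index with default integers.

To sort directly by index or by value, call one of:

  • sort_index()
  • sort_values()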
origin_index=series_custom.index.tolist()
# print(origin_index)  the original string labels
sorted_index=sorted(origin_index)
# print(sorted_index)  the labels sorted in ascending order
sorted_by_index=series_custom.reindex(sorted_index)
print(sorted_by_index)
'71 (2015)                          97
5 Flights Up (2015)                 52
A Little Chaos (2015)               40
A Most Violent Year (2014)          90
About Elly (2015)                   97
                                    ..
What We Do in the Shadows (2015)    96
When Marnie Was There (2015)        89
While We're Young (2015)            83
Wild Tales (2014)                   96
Woman in Gold (2015)                52
Length: 146, dtype: int64
# Sort by index
sc2=series_custom.sort_index()
# Sort by values
sc3=series_custom.sort_values()
print(sc3[0:10])
Paul Blart: Mall Cop 2 (2015)     5
Hitman: Agent 47 (2015)           7
Hot Pursuit (2015)                8
Fantastic Four (2015)             9
Taken 3 (2015)                    9
The Boy Next Door (2015)         10
The Loft (2015)                  11
Unfinished Business (2015)       11
Mortdecai (2015)                 12
Seventh Son (2015)               12
dtype: int64
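
As a small sketch of how these sorting routes relate, the snippet below checks on invented data that reindex() with sorted labels gives the same result as sort_index(), and shows descending order through the ascending parameter of sort_values(). The labels and variable names are hypothetical.

```python
import pandas as pd

ratings = pd.Series(
    [74, 18, 85],
    index=["Cinderella", "Ant-Man", "Brooklyn"],  # hypothetical labels
)

by_label = ratings.sort_index()
by_reindex = ratings.reindex(sorted(ratings.index))

# Both routes produce the same ordering.
print(by_label.equals(by_reindex))

# Highest score first.
highest_first = ratings.sort_values(ascending=False)
print(highest_first)
```

All three calls return new Series; the original ratings object keeps its order.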
The values in a Series can be treated as an ndarray; once numpy is imported, its functions can be applied to them.
# The values in a Series object are treated as a ndarray, the core data type in NumPy
import numpy as np
# Add the Series to itself, element by element
print(np.add(series_custom,series_custom))
#apply sine function to each value
np.sin(series_custom)
#Return the highest value (will return a single value but not a series)
np.max(series_custom)
Avengers: Age of Ultron (2015)               148
Cinderella (2015)                            170
Ant-Man (2015)                               160
Do You Believe? (2015)                        36
Hot Tub Time Machine 2 (2015)                 28
                                            ... 
Mr. Holmes (2015)                            174
'71 (2015)                                   194
Two Days, One Night (2014)                   194
Gett: The Trial of Viviane Amsalem (2015)    200
Kumiko, The Treasure Hunter (2015)           174
Length: 146, dtype: int64





100
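
A compact sketch of the same idea on invented data: element-wise NumPy functions return a Series that keeps the original index, while a reduction such as np.max returns a single scalar. The labels and variable names are assumptions.

```python
import numpy as np
import pandas as pd

ratings = pd.Series([74, 85, 100], index=["Film A", "Film B", "Film C"])  # hypothetical

doubled = np.add(ratings, ratings)  # element-wise: still a Series, same index
roots = np.sqrt(ratings)            # element-wise: float Series
top = np.max(ratings)               # reduction: one scalar

print(doubled)
print(roots)
print(top)
```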
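Using a list of True and False values as the index
# Use a list of True/False values (a boolean mask) as the index
series_greater_than_50=series_custom[series_custom>50]
# Note: this prints the full, unfiltered Series, which is what the first output below shows
print(series_custom)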

criteria_one=series_custom>50
criteria_two=series_custom<75
both_criteria=series_custom[criteria_one&criteria_two]
print(both_criteria)
Avengers: Age of Ultron (2015)                74
Cinderella (2015)                             85
Ant-Man (2015)                                80
Do You Believe? (2015)                        18
Hot Tub Time Machine 2 (2015)                 14
                                            ... 
Mr. Holmes (2015)                             87
'71 (2015)                                    97
Two Days, One Night (2014)                    97
Gett: The Trial of Viviane Amsalem (2015)    100
Kumiko, The Treasure Hunter (2015)            87
Length: 146, dtype: int64
Avengers: Age of Ultron (2015)                                            74
The Water Diviner (2015)                                                  63
Unbroken (2014)                                                           51
Southpaw (2015)                                                           59
Insidious: Chapter 3 (2015)                                               59
The Man From U.N.C.L.E. (2015)                                            68
Run All Night (2015)                                                      60
5 Flights Up (2015)                                                       52
Welcome to Me (2015)                                                      71
Saint Laurent (2015)                                                      51
Maps to the Stars (2015)                                                  60
Pitch Perfect 2 (2015)                                                    67
The Age of Adaline (2015)                                                 54
The DUFF (2015)                                                           71
Ricki and the Flash (2015)                                                64
Unfriended (2015)                                                         60
American Sniper (2015)                                                    72
The Hobbit: The Battle of the Five Armies (2014)                          61
Paper Towns (2015)                                                        55
Big Eyes (2014)                                                           72
Maggie (2015)                                                             54
Focus (2015)                                                              57
The Second Best Exotic Marigold Hotel (2015)                              62
The 100-Year-Old Man Who Climbed Out the Window and Disappeared (2015)    67
Escobar: Paradise Lost (2015)                                             52
Into the Woods (2014)                                                     71
Inherent Vice (2014)                                                      73
Magic Mike XXL (2015)                                                     62
Woman in Gold (2015)                                                      52
The Last Five Years (2015)                                                60
Jurassic World (2015)                                                     71
Minions (2015)                                                            54
Spare Parts (2015)                                                        52
dtype: int64
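
A minimal sketch of combining boolean masks on invented data. When the comparisons are written inline, each one needs its own parentheses, because & and | bind more tightly than > and <. The labels and variable names are hypothetical.

```python
import pandas as pd

ratings = pd.Series(
    [74, 85, 18, 63],
    index=["Film A", "Film B", "Film C", "Film D"],  # hypothetical labels
)

# AND: strictly between 50 and 75.
mask = (ratings > 50) & (ratings < 75)
mid_range = ratings[mask]

# NOT: everything the mask rejects.
outside = ratings[~mask]

# OR: very low or very high scores.
low_or_high = ratings[(ratings < 20) | (ratings > 80)]

print(mid_range)
print(outside)
print(low_or_high)
```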
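Series that share the same index values can be added and subtracted
# First create two Series with the same index
rt_critics=Series(fandango["RottenTomatoes"].values,index=fandango['FILM'])
# As written, rt_users reads the same "RottenTomatoes" column as rt_critics, so the mean below equals the critic score;
# substitute the user-score column here to average critic and user ratings
rt_users=Series(fandango["RottenTomatoes"].values,index=fandango['FILM'].values)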
rt_mean=(rt_critics+rt_users)/2
print(rt_mean)
FILM
Avengers: Age of Ultron (2015)                74.0
Cinderella (2015)                             85.0
Ant-Man (2015)                                80.0
Do You Believe? (2015)                        18.0
Hot Tub Time Machine 2 (2015)                 14.0
                                             ...  
Mr. Holmes (2015)                             87.0
'71 (2015)                                    97.0
Two Days, One Night (2014)                    97.0
Gett: The Trial of Viviane Amsalem (2015)    100.0
Kumiko, The Treasure Hunter (2015)            87.0
Length: 146, dtype: float64
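
Arithmetic between two Series matches values by index label, not by position. The sketch below uses two invented Series whose labels are in a different order and only partly overlap, to show what happens to labels present on one side only; all names and numbers are hypothetical.

```python
import pandas as pd

critics = pd.Series([74, 85, 80], index=["Film A", "Film B", "Film C"])
users = pd.Series([90, 86, 60], index=["Film C", "Film A", "Film D"])

# Values are paired by label; labels missing from either side yield NaN.
mean_score = (critics + users) / 2
print(mean_score)

# Keep only the films rated by both groups.
rated_by_both = mean_score.dropna()
print(rated_by_both)
```

Because a missing label introduces NaN, the result is a float Series even when both inputs hold integers.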