Study notes, for reference only; corrections are welcome.

References: Julia数据科学应用 (Julia for Data Science), Zacharias Voulgaris; the official documentation; Julia数据处理常用包_DataFrames包测试

Uses Julia 1.1.1.



Case study: descriptive statistics for student scores



Load the packages and read the data:

using DataFrames
using CSV
using Statistics  # standard library; provides mean and cor, which are used later
mydata = CSV.read("./data/score.csv");
println(mydata)

Output:

10×4 DataFrame
│ Row │ Column1 │ age   │ money │ score │
│     │ String  │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┤
│ 1   │ A       │ 19    │ 1000  │ 99    │
│ 2   │ B       │ 20    │ 2000  │ 100   │
│ 3   │ C       │ 19    │ 9999  │ 50    │
│ 4   │ D       │ 21    │ 3456  │ 69    │
│ 5   │ E       │ 22    │ 8999  │ 95    │
│ 6   │ F       │ 25    │ 887   │ 76    │
│ 7   │ G       │ 28    │ 2600  │ 85    │
│ 8   │ H       │ 20    │ 8000  │ 90    │
│ 9   │ I       │ 21    │ 2460  │ 77    │
│ 10  │ J       │ 19    │ 1000  │ 84    │



Show the first 6 rows of the data frame:

head(mydata)

Output:

6×4 DataFrame
│ Row │ Column1 │ age   │ money │ score │
│     │ String  │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┤
│ 1   │ A       │ 19    │ 1000  │ 99    │
│ 2   │ B       │ 20    │ 2000  │ 100   │
│ 3   │ C       │ 19    │ 9999  │ 50    │
│ 4   │ D       │ 21    │ 3456  │ 69    │
│ 5   │ E       │ 22    │ 8999  │ 95    │
│ 6   │ F       │ 25    │ 887   │ 76    │

Show the last 6 rows of the data frame:

tail(mydata)

Output:

6×4 DataFrame
│ Row │ Column1 │ age   │ money │ score │
│     │ String  │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┤
│ 1   │ E       │ 22    │ 8999  │ 95    │
│ 2   │ F       │ 25    │ 887   │ 76    │
│ 3   │ G       │ 28    │ 2600  │ 85    │
│ 4   │ H       │ 20    │ 8000  │ 90    │
│ 5   │ I       │ 21    │ 2460  │ 77    │
│ 6   │ J       │ 19    │ 1000  │ 84    │
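
As a rough illustration of what head and tail select, the same slicing can be sketched on a plain vector. The variable names labels, n, head_rows and tail_rows are made up for this sketch, and the values are copied from the Column1 column above. Note that the tail output renumbers its rows from 1, just as a slice of a vector is re-indexed from 1.

```julia
# Row labels copied from the Column1 column of the printed data frame.
labels = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]

n = 6  # number of rows to keep, matching the outputs above

# First n elements (or all of them if the vector is shorter than n).
head_rows = labels[1:min(n, end)]

# Last n elements (or all of them if the vector is shorter than n).
tail_rows = labels[max(end - n + 1, 1):end]

println(head_rows)
println(tail_rows)
```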



Return descriptive statistics for the data:

describe(mydata)

Output:

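│ Row │ variable │ mean   │ min │ median │ max  │ nunique │ nmissing │ eltype   │
│     │ Symbol   │ Union… │ Any │ Union… │ Any  │ Union…  │ Nothing  │ DataType │
├─────┼──────────┼────────┼─────┼────────┼──────┼─────────┼──────────┼──────────┤
│ 1   │ Column1  │        │ A   │        │ J    │ 10      │          │ String   │
│ 2   │ age      │ 21.4   │ 19  │ 20.5   │ 28   │         │          │ Int64    │
│ 3   │ money    │ 4040.1 │ 887 │ 2530.0 │ 9999 │         │          │ Int64    │
│ 4   │ score    │ 82.5   │ 50  │ 84.5   │ 100  │         │          │ Int64    │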
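
The numeric rows of this summary can be cross-checked by hand. The following is a minimal sketch using only the Statistics standard library; the helper summarize is hypothetical (it is not part of DataFrames), and the vectors are copied from the printed data frame rather than read from the CSV file.

```julia
using Statistics  # standard library: mean, median

# Values copied from the age, money and score columns shown earlier.
age   = [19, 20, 19, 21, 22, 25, 28, 20, 21, 19]
money = [1000, 2000, 9999, 3456, 8999, 887, 2600, 8000, 2460, 1000]
score = [99, 100, 50, 69, 95, 76, 85, 90, 77, 84]

# summarize is a hypothetical helper, not a DataFrames function.
# It returns the four numeric statistics as a named tuple.
summarize(v) = (mean = mean(v), min = minimum(v), median = median(v), max = maximum(v))

for (name, col) in (("age", age), ("money", money), ("score", score))
    println(name, " => ", summarize(col))
end
```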


Return the records where age is greater than 22:

mydata[mydata[:age] .> 22, :]

Output:

│ Row │ Column1 │ age   │ money │ score │
│     │ String  │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┤
│ 1   │ F       │ 25    │ 887   │ 76    │
│ 2   │ G       │ 28    │ 2600  │ 85    │
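
The row filter relies on a broadcast comparison that produces a Boolean mask, which is then used as a row index. A minimal sketch of that mechanism on plain vectors, with values copied from the data frame above (the variable names mask and selected are made up for illustration):

```julia
# Values copied from the age and Column1 columns.
age   = [19, 20, 19, 21, 22, 25, 28, 20, 21, 19]
label = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]

# .> compares element by element and yields a BitArray of true/false values.
mask = age .> 22

# Logical indexing keeps only the positions where the mask is true.
selected = label[mask]

println(mask)
println(selected)
```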


Compute the mean of age and score:

colwise(mean, mydata[[:age, :score]])

Output:

2-element Array{Float64,1}:
21.4
82.5
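
colwise applies a function to each selected column in turn. A rough equivalent on plain vectors, assuming the column values printed earlier and using only the Statistics standard library (col_means is a name chosen for this sketch):

```julia
using Statistics  # standard library: mean

# Values copied from the age and score columns.
age   = [19, 20, 19, 21, 22, 25, 28, 20, 21, 19]
score = [99, 100, 50, 69, 95, 76, 85, 90, 77, 84]

# Apply mean to each column and collect the results into a vector.
col_means = [mean(col) for col in (age, score)]

println(col_means)
```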



Add a grade column to the mydata data frame:

mydata[:grade] = ["A", "B", "C", "D", "A", "A", "B", "B", "C", "D"]



Delete the last two rows of mydata:

deleterows!(mydata, 9:10);



Group the mydata data frame by grade and count the rows in each group:

by(mydata, :grade, nrow)

Output:

│ Row │ grade  │ nrow  │
│     │ String │ Int64 │
├─────┼────────┼───────┤
│ 1   │ A      │ 3     │
│ 2   │ B      │ 3     │
│ 3   │ C      │ 1     │
│ 4   │ D      │ 1     │
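
The group counts can be reproduced without DataFrames by tallying the grade values in a dictionary. This is only a sketch of the idea, not how by is implemented; it assumes the eight grades that remain after rows 9 and 10 were deleted, and counts is a name chosen for illustration.

```julia
# Grades of the 8 rows that remain after deleting rows 9 and 10.
grade = ["A", "B", "C", "D", "A", "A", "B", "B"]

# Tally how many times each grade occurs.
counts = Dict{String,Int}()
for g in grade
    counts[g] = get(counts, g, 0) + 1
end

# Print the groups in alphabetical order.
for g in sort(collect(keys(counts)))
    println(g, " => ", counts[g])
end
```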



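Compute the Pearson correlation coefficient between age and score (the value shown corresponds to all 10 original rows, that is, before the last two rows were deleted):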

cor(mydata[:age], mydata[:score])
# Return value
0.019667052513438126
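
To make the coefficient less of a black box, it can be recomputed from its definition: the sum of the products of the deviations from the mean, divided by the square root of the product of the summed squared deviations. The sketch below assumes the 10 original rows of age and score; pearson is a hypothetical helper written for this note, and cor from the Statistics standard library is used only as a cross-check.

```julia
using Statistics  # standard library: mean, cor

# Values copied from the age and score columns (all 10 original rows).
age   = [19, 20, 19, 21, 22, 25, 28, 20, 21, 19]
score = [99, 100, 50, 69, 95, 76, 85, 90, 77, 84]

# pearson is a hypothetical helper that follows the textbook definition.
function pearson(x, y)
    dx = x .- mean(x)   # deviations of x from its mean
    dy = y .- mean(y)   # deviations of y from its mean
    return sum(dx .* dy) / sqrt(sum(dx .^ 2) * sum(dy .^ 2))
end

r = pearson(age, score)
println(r)
println(r ≈ cor(age, score))
```

A value this close to zero suggests there is essentially no linear relationship between age and score in this small sample.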