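Study notes, for reference only; corrections are welcome.
References: Julia for Data Science (Zacharias Voulgaris); the official documentation; the blog post "Julia数据处理常用包_DataFrames包测试".
Written against Julia 1.1.1.
Case study: descriptive statistics for student scores
Load the packages and read the data: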
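using DataFrames
using CSV
using Statistics  # provides mean and cor, which are used further down
# Newer CSV.jl releases expect a sink argument: CSV.read("./data/score.csv", DataFrame)
mydata = CSV.read("./data/score.csv");
println(mydata)
Output: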
10×4 DataFrame
│ Row │ Column1 │ age   │ money │ score │
│     │ String  │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┤
│ 1   │ A       │ 19    │ 1000  │ 99    │
│ 2   │ B       │ 20    │ 2000  │ 100   │
│ 3   │ C       │ 19    │ 9999  │ 50    │
│ 4   │ D       │ 21    │ 3456  │ 69    │
│ 5   │ E       │ 22    │ 8999  │ 95    │
│ 6   │ F       │ 25    │ 887   │ 76    │
│ 7   │ G       │ 28    │ 2600  │ 85    │
│ 8   │ H       │ 20    │ 8000  │ 90    │
│ 9   │ I       │ 21    │ 2460  │ 77    │
│ 10  │ J       │ 19    │ 1000  │ 84    │
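For readers who do not have ./data/score.csv at hand, here is a minimal sketch that rebuilds the same table in memory. The values are copied from the printed output above; the keyword-argument DataFrame constructor is assumed to behave as in current DataFrames.jl releases.

```julia
using DataFrames

# Rebuild the ten-row table from the printed output,
# so the later steps can be tried without the CSV file.
mydata = DataFrame(
    Column1 = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
    age     = [19, 20, 19, 21, 22, 25, 28, 20, 21, 19],
    money   = [1000, 2000, 9999, 3456, 8999, 887, 2600, 8000, 2460, 1000],
    score   = [99, 100, 50, 69, 95, 76, 85, 90, 77, 84],
)

println(size(mydata))   # (number of rows, number of columns)
```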
Show the first 6 rows of the data frame:
head(mydata)  # later DataFrames releases use first(mydata, 6) instead
Output:
6×4 DataFrame
│ Row │ Column1 │ age   │ money │ score │
│     │ String  │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┤
│ 1   │ A       │ 19    │ 1000  │ 99    │
│ 2   │ B       │ 20    │ 2000  │ 100   │
│ 3   │ C       │ 19    │ 9999  │ 50    │
│ 4   │ D       │ 21    │ 3456  │ 69    │
│ 5   │ E       │ 22    │ 8999  │ 95    │
│ 6   │ F       │ 25    │ 887   │ 76    │
Show the last 6 rows of the data frame:
tail(mydata)  # later DataFrames releases use last(mydata, 6) instead
Output:
6×4 DataFrame
│ Row │ Column1 │ age   │ money │ score │
│     │ String  │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┤
│ 1   │ E       │ 22    │ 8999  │ 95    │
│ 2   │ F       │ 25    │ 887   │ 76    │
│ 3   │ G       │ 28    │ 2600  │ 85    │
│ 4   │ H       │ 20    │ 8000  │ 90    │
│ 5   │ I       │ 21    │ 2460  │ 77    │
│ 6   │ J       │ 19    │ 1000  │ 84    │
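head and tail belong to the DataFrames release used in these notes. As far as I know, later releases replaced them with first(df, n) and last(df, n). A small sketch, assuming a recent DataFrames.jl; the two-column table df is a hypothetical stand-in for mydata:

```julia
using DataFrames

# Hypothetical stand-in table: a row id plus the score column from above.
df = DataFrame(id = 1:10, score = [99, 100, 50, 69, 95, 76, 85, 90, 77, 84])

top6 = first(df, 6)      # rows 1 through 6
bottom6 = last(df, 6)    # rows 5 through 10

println(top6)
println(bottom6)
```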
Return descriptive statistics for the data:
describe(mydata)
Output:
│ Row │ variable │ mean   │ min │ median │ max  │ nunique │ nmissing │ eltype   │
│     │ Symbol   │ Union… │ Any │ Union… │ Any  │ Union…  │ Nothing  │ DataType │
├─────┼──────────┼────────┼─────┼────────┼──────┼─────────┼──────────┼──────────┤
│ 1   │ Column1  │        │ A   │        │ J    │ 10      │          │ String   │
│ 2   │ age      │ 21.4   │ 19  │ 20.5   │ 28   │         │          │ Int64    │
│ 3   │ money    │ 4040.1 │ 887 │ 2530.0 │ 9999 │         │          │ Int64    │
│ 4   │ score    │ 82.5   │ 50  │ 84.5   │ 100  │         │          │ Int64    │
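The numeric entries of the describe table can be cross-checked by hand. The sketch below uses only the Statistics standard library on plain vectors copied from the data above; it is a sanity check, not a description of how describe works internally.

```julia
using Statistics

age   = [19, 20, 19, 21, 22, 25, 28, 20, 21, 19]
money = [1000, 2000, 9999, 3456, 8999, 887, 2600, 8000, 2460, 1000]
score = [99, 100, 50, 69, 95, 76, 85, 90, 77, 84]

# Print mean, min, median and max for each numeric column.
for (name, col) in (("age", age), ("money", money), ("score", score))
    println(name, ": mean=", mean(col), " min=", minimum(col),
            " median=", median(col), " max=", maximum(col))
end
```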
Return the records whose age is greater than 22:
mydata[mydata[:age] .> 22, :]
Output:
│ Row │ Column1 │ age   │ money │ score │
│     │ String  │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┤
│ 1   │ F       │ 25    │ 887   │ 76    │
│ 2   │ G       │ 28    │ 2600  │ 85    │
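The row filter is a boolean mask: the comparison is broadcast over the age column, and the resulting vector of true/false values selects the rows. A sketch of the same selection in the property-access style (df.age) that newer DataFrames.jl releases favour, plus the filter function as an alternative; df is a hypothetical cut-down copy of the table:

```julia
using DataFrames

df = DataFrame(
    Column1 = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
    age     = [19, 20, 19, 21, 22, 25, 28, 20, 21, 19],
)

mask = df.age .> 22                          # one Bool per row
by_mask = df[mask, :]                        # keep rows where the mask is true
by_filter = filter(row -> row.age > 22, df)  # same rows, via a row predicate

println(by_mask)
```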
Compute the column means of age and score:
colwise(mean, mydata[[:age, :score]])
Output:
2-element Array{Float64,1}:
21.4
82.5
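colwise is tied to the DataFrames release used here; to my knowledge it is no longer available in current releases. A rough equivalent, assuming a recent DataFrames.jl, iterates over the selected columns with eachcol or uses combine:

```julia
using DataFrames
using Statistics

df = DataFrame(
    age   = [19, 20, 19, 21, 22, 25, 28, 20, 21, 19],
    score = [99, 100, 50, 69, 95, 76, 85, 90, 77, 84],
)

# One mean per selected column, returned as a plain vector.
col_means = [mean(col) for col in eachcol(df[:, [:age, :score]])]

# The same numbers as a one-row data frame with columns age_mean and score_mean.
mean_table = combine(df, [:age, :score] .=> mean)

println(col_means)
println(mean_table)
```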
Add a grade column to the mydata data frame:
mydata[:grade] = ["A", "B", "C", "D", "A", "A", "B", "B", "C", "D"]
Delete the last two rows of mydata:
deleterows!(mydata, 9:10);
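The two mutations above (adding a column, then dropping rows in place) look slightly different in newer releases. A sketch assuming DataFrames.jl 1.3 or later, where deleteat! is, as far as I know, the in-place row deletion; df is a hypothetical cut-down copy of the table:

```julia
using DataFrames

df = DataFrame(
    Column1 = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
    score   = [99, 100, 50, 69, 95, 76, 85, 90, 77, 84],
)

# Add the grade column; the vector length must equal the number of rows.
df.grade = ["A", "B", "C", "D", "A", "A", "B", "B", "C", "D"]

# Remove rows 9 and 10 in place (assumes DataFrames.jl >= 1.3).
deleteat!(df, 9:10)

println(df)
```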
Group mydata by grade and count the rows in each group:
by(mydata, :grade, nrow)
Output:
│ Row │ grade  │ nrow  │
│     │ String │ Int64 │
├─────┼────────┼───────┤
│ 1   │ A      │ 3     │
│ 2   │ B      │ 3     │
│ 3   │ C      │ 1     │
│ 4   │ D      │ 1     │
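The by function follows the split-apply-combine pattern: split the rows by grade, apply nrow to each group, and combine the results into one table. In newer DataFrames.jl releases the same steps are, to my knowledge, written with groupby and combine. A sketch on the eight grades that remain after the row deletion:

```julia
using DataFrames

# Grades of the eight rows left after rows 9 and 10 were removed.
df = DataFrame(grade = ["A", "B", "C", "D", "A", "A", "B", "B"])

groups = groupby(df, :grade)     # split: one sub-table per grade
counts = combine(groups, nrow)   # apply nrow to each group and combine

println(counts)
```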
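Compute the Pearson correlation coefficient between age and score:
cor(mydata[:age], mydata[:score])
# Return value. This matches all 10 rows, i.e. the table before deleterows! was applied;
# on the 8 rows that remain after the deletion the same call gives about 0.0208.
0.019667052513438126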
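The Pearson coefficient is the sum of products of the centred values divided by the square root of the product of their sums of squares. The sketch below spells that out with a hypothetical helper named pearson and compares it against cor from the Statistics standard library, on the full 10 rows and on the first 8 rows.

```julia
using Statistics

age   = [19, 20, 19, 21, 22, 25, 28, 20, 21, 19]
score = [99, 100, 50, 69, 95, 76, 85, 90, 77, 84]

# Hypothetical helper: Pearson correlation written out from its definition.
function pearson(x, y)
    dx = x .- mean(x)   # centred x values
    dy = y .- mean(y)   # centred y values
    return sum(dx .* dy) / sqrt(sum(abs2, dx) * sum(abs2, dy))
end

r_manual  = pearson(age, score)         # all 10 rows, by hand
r_builtin = cor(age, score)             # all 10 rows, library call
r_first8  = cor(age[1:8], score[1:8])   # rows 1 to 8 only

println(r_manual)
println(r_builtin)
println(r_first8)
```

The near-zero value says there is almost no linear relationship between age and score in this small sample.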