How to choose a random number in R



As a language for statistical analysis, R has a comprehensive library of functions for generating random numbers from various statistical distributions. In this post, I want to focus on the simplest of questions: How do I generate a random number?



The answer depends on what kind of random number you want to generate. Let's illustrate by example.



Generate a random number between 5.0 and 7.5



If you want to generate a decimal number where any value (including fractional values) between the stated minimum and maximum is equally likely, use the runif function. This function generates values from the Uniform distribution. Here's how to generate one random number between 5.0 and 7.5:



> x1 <- runif(1, 5.0, 7.5)
> x1
[1] 6.715697



Of course, when you run this, you'll get a different number, but it will definitely be between 5.0 and 7.5. You won't get the values 5.0 or 7.5 exactly, either.



If you want to generate multiple random values, don't use a loop. You can generate several values at once by specifying the number of values you want as the first argument to runif. Here's how to generate 10 values between 5.0 and 7.5:



> x2 <- runif(10, 5.0, 7.5)
> x2
[1] 6.339188 5.311788 7.099009 5.746380 6.720383 7.433535 7.159988
[8] 5.047628 7.011670 7.030854
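
The vectorised call above can be checked with a small sketch. The seed value 123 and the variable name draws are arbitrary choices for illustration, not part of the original post; the check simply confirms that every draw lies strictly between the two bounds.

```r
# Fix the seed so the draws are repeatable (123 is an arbitrary choice).
set.seed(123)

# Ten uniform draws in one vectorised call; no loop needed.
draws <- runif(n = 10, min = 5.0, max = 7.5)

# How many values came back, and are they all strictly inside the interval?
length(draws)
all(draws > 5.0 & draws < 7.5)
```

Naming the arguments (n, min, max) is optional, but it makes the call easier to read than three bare numbers.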



Generate a random integer between 1 and 10



This looks like the same exercise as the last one, but now we only want whole numbers, not fractional values. For that, we use the sample function:



> x3 <- sample(1:10, 1)
> x3
[1] 4



The first argument is a vector of valid numbers to generate (here, the numbers 1 to 10), and the second argument indicates one number should be returned. If we want to generate more than one random number, we have to add an additional argument to indicate that repeats are allowed:



> x4 <- sample(1:10, 5, replace=T)
> x4
[1] 6 9 7 6 5



Note the number 6 appears twice in the 5 numbers generated. (Here's a fun exercise: what is the probability of running this command and having no repeats in the 5 numbers generated?)
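
One way to reason about that exercise: the first number can be anything, the second must avoid one value, the third must avoid two, and so on, which gives (10 x 9 x 8 x 7 x 6) / 10^5 = 0.3024. The sketch below computes that value and compares it with a rough simulation. The seed, the number of trials, and the variable names are arbitrary choices for illustration.

```r
# Exact probability that 5 draws from 1:10 with replacement are all distinct.
p_exact <- prod(10:6) / 10^5
p_exact

# Rough simulation check (seed and trial count are arbitrary).
set.seed(1)
trials <- 20000
no_repeat <- replicate(trials, {
  draw <- sample(1:10, 5, replace = TRUE)
  anyDuplicated(draw) == 0   # TRUE when all five values differ
})
p_sim <- mean(no_repeat)
p_sim
```

The simulated proportion should land close to the exact value, though it will vary a little with the seed and the number of trials.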



Select 6 random numbers between 1 and 40, without replacement



If you wanted to simulate the lotto game common to many countries, where you randomly select 6 balls from 40 (each labelled with a number from 1 to 40), you'd again use the sample function, but this time without replacement:



> x5 <- sample(1:40, 6, replace=F)
> x5
[1] 10 21 29 12 7 31



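You'll get a different 6 numbers when you run this, but they'll all be between 1 and 40 (inclusive), and no number will repeat. Also, you don't actually need to include the replace=F option -- sampling without replacement is the default -- but it doesn't hurt to include it for clarity.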
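
A small sketch of the lotto draw, with checks on the properties a draw without replacement should have: six numbers, all between 1 and 40, none repeated. The seed and the variable name ticket are arbitrary choices, and sorting the result is only for readability.

```r
set.seed(2024)  # arbitrary seed, only so the draw can be repeated

# replace = FALSE is the default, so it can be omitted.
ticket <- sample(1:40, 6)

sort(ticket)                      # lotto results are often shown in order
anyDuplicated(ticket) == 0        # TRUE when no number repeats
all(ticket >= 1 & ticket <= 40)   # TRUE when every number is in range
```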



Select 10 items from a list of 50



You can use this same idea to generate a random subset of any vector, even one that doesn't contain numbers. For example, to select 10 distinct states of the US at random:



> sample(state.name, 10)
[1] "Virginia" "Oklahoma" "Maryland" "Michigan"
[5] "Alaska" "South Dakota" "Minnesota" "Idaho"
[9] "Indiana" "Connecticut"
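
The same kind of subset can also be drawn by sampling positions rather than values, which is handy when the positions are needed to pick matching entries elsewhere. This is only a sketch: state.abb is R's built-in vector of state abbreviations, stored in the same order as state.name, and the seed and variable names are arbitrary.

```r
set.seed(10)  # arbitrary seed

# Draw 10 distinct positions, then index with them.
idx <- sample(length(state.name), 10)
chosen <- state.name[idx]
chosen

# The same positions pick the matching abbreviations.
state.abb[idx]
```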



You can't sample more values than you have without allowing replacements:



> sample(state.name, 52)
Error in sample(state.name, 52) :
cannot take a sample larger than the population when 'replace = FALSE'



... but sampling exactly the number you do have is a great way to randomize the order of a vector. Here are the 50 states of the US, in random order:



> sample(state.name, 50)
[1] "California" "Iowa" "Hawaii"
[4] "Montana" "South Dakota" "North Dakota"
[7] "Louisiana" "Maine" "Maryland"
[10] "New Hampshire" "Rhode Island" "Texas"
[13] "Florida" "North Carolina" "Minnesota"
[16] "Arkansas" "Pennsylvania" "Colorado"
[19] "Idaho" "Connecticut" "Utah"
[22] "South Carolina" "Illinois" "Ohio"
[25] "New Jersey" "Indiana" "Wisconsin"
[28] "Mississippi" "Michigan" "Wyoming"
[31] "West Virginia" "Alaska" "Georgia"
[34] "Vermont" "Virginia" "Oklahoma"
[37] "Washington" "New Mexico" "New York"
[40] "Delaware" "Nevada" "Alabama"
[43] "Kentucky" "Missouri" "Oregon"
[46] "Tennessee" "Arizona" "Massachusetts"
[49] "Kansas" "Nebraska"



You could also have just used sample(state.name) for the same result -- sampling as many values as provided is the default.
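
To see that a shuffle only reorders the values, a quick check is to sort the result and compare it with the sorted original. The sketch below also shows sampling with replace = TRUE, which lifts the size limit behind the error shown earlier. The seed and variable names are arbitrary choices for illustration.

```r
set.seed(7)  # arbitrary seed

shuffled <- sample(state.name)

# Same 50 names, just in a different order.
length(shuffled)
identical(sort(shuffled), sort(state.name))

# With replacement, a sample larger than the population is allowed.
big <- sample(state.name, 52, replace = TRUE)
length(big)
anyDuplicated(big) > 0   # a repeat is unavoidable with 52 draws from 50 names
```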



Further reading



For more information about how R generates random numbers, check out the following help pages:



> ?runif
> ?sample
> ?.Random.seed



The last of these provides technical detail on the random number generator R uses, and how you can set the random seed to recreate the same sequence of random numbers.
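
As a minimal sketch of that last point, set.seed fixes the generator's starting state, so the same calls return the same numbers. The seed value 42 and the variable names are arbitrary.

```r
set.seed(42)
first <- runif(3, 5.0, 7.5)

set.seed(42)   # reset to the same starting state
second <- runif(3, 5.0, 7.5)

identical(first, second)   # the two runs match

# Without resetting the seed, the next draws continue the stream and differ.
third <- runif(3, 5.0, 7.5)
identical(first, third)
```

Setting a seed at the top of a script is a common way to make a simulation repeatable for other readers.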


