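1. Background
How do you take the contents of a single column and split it into two columns?
I recently needed to do exactly this. I searched online for a long time without finding a sensible answer, then opened the function's help page and solved it in half a minute. Of all the documentation out there, the official help page is still the best.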
2. Sample data
Code:
library(tidyr)  # provides separate() and re-exports the %>% pipe
df <- data.frame(ID = 1:2, name = c("Smith, John, Ketty", "Walker, Mike"))
Contents of df:
> df
  ID               name
1  1 Smith, John, Ketty
2  2       Walker, Mike
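3. Splitting with the separate function
Result:
> df %>% separate(2, into = c("a", "b"))
  ID      a    b
1  1  Smith John
2  2 Walker Mike
Warning message:
Expected 2 pieces. Additional pieces discarded in 1 rows [1].
This is not what I wanted: I want to split only at the first comma and keep everything after it as a single value.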
4. A look at the function's help page
Arguments
data: A data frame.
col: Column name or position. This is passed to tidyselect::vars_pull(). This argument is passed by expression and supports quasiquotation (you can unquote column names or column positions).
into: Names of new variables to create as character vector. Use NA to omit the variable in the output.
sep: Separator between columns. If character, sep is interpreted as a regular expression. The default value is a regular expression that matches any sequence of non-alphanumeric values. If numeric, sep is interpreted as character positions to split at. Positive values start at 1 at the far-left of the string; negative value start at -1 at the far-right of the string. The length of sep should be one less than into.
remove: If TRUE, remove input column from output data frame.
convert: If TRUE, will run type.convert() with as.is = TRUE on new columns. This is useful if the component columns are integer, numeric or logical. NB: this will cause string "NA"s to be converted to NAs.
extra: If sep is a character vector, this controls what happens when there are too many pieces. There are three valid options:
- "warn" (the default): emit a warning and drop extra values.
- "drop": drop any extra values without a warning.
- "merge": only splits at most length(into) times
fill: If sep is a character vector, this controls what happens when there are not enough pieces. There are three valid options:
- "warn" (the default): emit a warning and fill from the right
- "right": fill with missing values on the right
- "left": fill with missing values on the left
...: Additional arguments passed on to methods.
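To make a few of these arguments concrete, here is a small sketch of fill, convert, and a numeric sep. The data frames df2 and codes and their column names are made up for illustration and are not part of the original example.

```r
library(tidyr)

# Hypothetical data, not from the post: the second row has no "-" to split on.
df2 <- data.frame(x = c("a-1", "b"))

# fill = "right" pads the missing piece with NA on the right, without a warning;
# convert = TRUE lets type.convert() turn the "number" column into integers.
filled <- separate(df2, x, into = c("letter", "number"),
                   sep = "-", fill = "right", convert = TRUE)
print(filled)

# A numeric sep is a character position: here, split after the 2nd character.
codes <- data.frame(code = c("AB123", "CD456"))
by_pos <- separate(codes, code, into = c("prefix", "digits"), sep = 2)
print(by_pos)
```

Note that by_pos keeps digits as character strings, because convert defaults to FALSE.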
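The key part:

The extra argument controls what happens when splitting on sep produces more pieces than there are names in into:
- extra = "warn" (the default): emit a warning and drop the extra pieces
- extra = "drop": drop the extra pieces without a warning
- extra = "merge": keep the extra pieces by leaving the remainder unsplit in the last column
So extra = "merge" is what my use case needs.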
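As a quick side-by-side check of the options above, this sketch runs the same sample data through "drop" and "merge". The variable names dropped and merged are only for illustration.

```r
library(tidyr)

df <- data.frame(ID = 1:2, name = c("Smith, John, Ketty", "Walker, Mike"))

# "drop": same columns as the default "warn", but the extra piece
# ("Ketty") is discarded silently.
dropped <- separate(df, name, into = c("a", "b"), extra = "drop")

# "merge": the remainder of the string stays together in the last column.
merged <- separate(df, name, into = c("a", "b"), extra = "merge")

print(dropped)
print(merged)
```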
5. The correct code
> df %>% separate(2, into = c("a", "b"), extra = "merge")
  ID      a           b
1  1  Smith John, Ketty
2  2 Walker        Mike
Done!
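One caveat worth sketching: the call above relies on the default sep, which per the help page matches any run of non-alphanumeric characters, so a space inside a name would also count as a split point. If the goal is strictly "split at the first comma", passing sep explicitly seems safer. The extra row "De Vries, Anna, Tom" and the column names first and rest are hypothetical, added only to show the difference.

```r
library(tidyr)

# Sample data from the post plus one hypothetical row whose first
# name contains a space.
df3 <- data.frame(
  ID = 1:3,
  name = c("Smith, John, Ketty", "Walker, Mike", "De Vries, Anna, Tom")
)

# Default sep: the space in "De Vries" is treated as a separator too.
default_sep <- separate(df3, name, into = c("first", "rest"), extra = "merge")

# Explicit sep: a comma followed by optional whitespace, so only the
# first comma splits the string and the rest is merged.
comma_sep <- separate(df3, name, into = c("first", "rest"),
                      sep = ",\\s*", extra = "merge")

print(default_sep)
print(comma_sep)
```

For the two original rows both calls agree; they differ only on the row with a space before the first comma.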
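6. Takeaways
I tried Baidu.
I tried Google.
I tried all kinds of keywords.
In the end,
the help page was still the best answer.
This is especially true for niche software: read its documentation first, and if it is open source, read the source code too, which helps even more.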
















