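This article covers data frames in R. I hope it helps! If there is a topic you would like to learn about, or if you have suggestions, feel free to leave the author a message.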
Chapter04 | Data Frames
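Features of data frames:
1. A data frame is a tabular data structure. It is designed to model a dataset, matching the concept of a dataset in other statistical software such as SAS or SPSS.
2. A dataset is usually a rectangular array of data in which rows are observations and columns are variables. Different fields use different names for these rows and columns.
3. A data frame is actually a list. The elements of the list are vectors, and these vectors form the columns of the data frame. Every column must have the same length, which is why a data frame is rectangular, and every column has a name.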
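These points can be checked directly in the R console. In the sketch below, the data frame `df` and its columns `x` and `y` are made-up examples, not taken from the original post; it confirms that a data frame is stored as a list of named columns and that columns of unequal length are rejected.

```r
# A small, made-up data frame with two named columns of length 3
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)

is.list(df)   # TRUE: a data frame is a list underneath
length(df)    # 2: one list element per column
names(df)     # "x" "y": every column has a name
nrow(df)      # 3: all columns share the same length

# Columns of lengths 3 and 2 cannot form a rectangle, so data.frame() stops
# with an error; tryCatch() captures the error message as a string instead
res <- tryCatch(data.frame(x = 1:3, y = 1:2),
                error = function(e) conditionMessage(e))
res
```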
Common built-in data frames:
1. iris
2. mtcars
3. rock
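All three datasets ship with R's datasets package and are available without loading anything. A minimal sketch for inspecting them with base functions only:

```r
# Each of these is a data frame from the datasets package
is.data.frame(iris)   # TRUE
dim(iris)             # 150 rows (observations), 5 columns (variables)
dim(mtcars)           # 32 rows, 11 columns
dim(rock)             # 48 rows, 4 columns

head(mtcars, 3)       # the first three observations
str(rock)             # column names and types at a glance
```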
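Matrices vs. data frames:
1. In shape, a data frame looks much like a matrix.
2. A data frame is a fairly regular list: a list whose elements are equal-length columns.
3. All elements of a matrix must have the same data type.
4. In a data frame, all values within one column must share a type, but different columns can have different types, so a single row may mix types.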
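The difference in type rules shows up as soon as numbers and text are combined. In the sketch below (the vectors `nums` and `txt` are arbitrary examples), `cbind()` builds a matrix and coerces everything to character, whereas `data.frame()` lets each column keep its own type:

```r
nums <- c(1L, 2L)
txt  <- c("a", "b")

# A matrix holds a single type, so the integers are coerced to character
m <- cbind(nums, txt)
typeof(m)           # "character"

# A data frame stores each column separately, so each keeps its own type
df <- data.frame(nums, txt, stringsAsFactors = FALSE)
sapply(df, class)   # nums: "integer", txt: "character"

# One row of the data frame therefore mixes an integer and a string
df[1, ]
```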
# Combine several built-in state datasets into one data frame
> state <- data.frame(state.name, state.abb, state.region, state.x77)
# Accessing a data frame by index
> state[1]
                   state.name
Alabama               Alabama
Alaska                 Alaska
Arizona               Arizona
Arkansas             Arkansas
California         California
Colorado             Colorado
Connecticut       Connecticut
Delaware             Delaware
Florida               Florida
Georgia               Georgia
Hawaii                 Hawaii
Idaho                   Idaho
Illinois             Illinois
Indiana               Indiana
Iowa                     Iowa
Kansas                 Kansas
Kentucky             Kentucky
Louisiana           Louisiana
Maine                   Maine
Maryland             Maryland
Massachusetts   Massachusetts
Michigan             Michigan
Minnesota           Minnesota
Mississippi       Mississippi
Missouri             Missouri
Montana               Montana
Nebraska             Nebraska
Nevada                 Nevada
New Hampshire   New Hampshire
New Jersey         New Jersey
New Mexico         New Mexico
New York             New York
North Carolina North Carolina
North Dakota     North Dakota
Ohio                     Ohio
Oklahoma             Oklahoma
Oregon                 Oregon
Pennsylvania     Pennsylvania
Rhode Island     Rhode Island
South Carolina South Carolina
South Dakota     South Dakota
Tennessee           Tennessee
Texas                   Texas
Utah                     Utah
Vermont               Vermont
Virginia             Virginia
Washington         Washington
West Virginia   West Virginia
Wisconsin           Wisconsin
Wyoming               Wyoming
# You can also index with a vector of positions; a negative index selects everything except those positions
> state[c(2,4)]
               state.abb Population
Alabama               AL       3615
Alaska                AK        365
Arizona               AZ       2212
Arkansas              AR       2110
California            CA      21198
Colorado              CO       2541
Connecticut           CT       3100
Delaware              DE        579
Florida               FL       8277
Georgia               GA       4931
Hawaii                HI        868
Idaho                 ID        813
Illinois              IL      11197
Indiana               IN       5313
Iowa                  IA       2861
Kansas                KS       2280
Kentucky              KY       3387
Louisiana             LA       3806
Maine                 ME       1058
Maryland              MD       4122
Massachusetts         MA       5814
Michigan              MI       9111
Minnesota             MN       3921
Mississippi           MS       2341
Missouri              MO       4767
Montana               MT        746
Nebraska              NE       1544
Nevada                NV        590
New Hampshire         NH        812
New Jersey            NJ       7333
New Mexico            NM       1144
New York              NY      18076
North Carolina        NC       5441
North Dakota          ND        637
Ohio                  OH      10735
Oklahoma              OK       2715
Oregon                OR       2284
Pennsylvania          PA      11860
Rhode Island          RI        931
South Carolina        SC       2816
South Dakota          SD        681
Tennessee             TN       4173
Texas                 TX      12237
Utah                  UT       1203
Vermont               VT        472
Virginia              VA       4981
Washington            WA       3559
West Virginia         WV       1799
Wisconsin             WI       4589
Wyoming               WY        376
# Row and column names let you fetch exactly the content you want
> state[,"state.abb"]
 [1] AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN MS MO MT NE NV NH NJ
[31] NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA WV WI WY
50 Levels: AK AL AR AZ CA CO CT DE FL GA HI IA ID IL IN KS KY LA MA MD ME MI MN MO MS ... WY
# The Levels line shows that state.abb was stored as a factor, which was data.frame()'s default before R 4.0.0;
# in R 4.0.0 and later, stringsAsFactors defaults to FALSE and this returns a plain character vector
> state["Alabama",]
        state.name state.abb state.region Population Income Illiteracy Life.Exp Murder
Alabama    Alabama        AL        South       3615   3624        2.1    69.05   15.1
        HS.Grad Frost  Area
Alabama    41.3    20 50708
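The comment about negative indices is not shown in the transcript itself. A short sketch that rebuilds the same `state` data frame from the built-in datasets; the `drop = FALSE` argument of `[` is an addition not used in the original post:

```r
state <- data.frame(state.name, state.abb, state.region, state.x77)

ncol(state)                      # 11 columns in total
no_abb_pop <- state[-c(2, 4)]    # every column except the 2nd and 4th
ncol(no_abb_pop)                 # 9

# Row names and column names together: two states, two columns
state[c("Alabama", "Alaska"), c("state.abb", "Population")]

# Selecting one column with [ , ] simplifies the result to a vector;
# drop = FALSE keeps it as a one-column data frame
is.data.frame(state[, "state.abb"])                 # FALSE
is.data.frame(state[, "state.abb", drop = FALSE])   # TRUE
```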
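# Access with $: the most common method. It quickly extracts any single column and is used constantly in later analysis and plotting
> women$height
 [1] 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
# Quickly draw a scatter plot of weight against height
> plot(women$height, women$weight)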
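`$` is one of several equivalent ways to pull out a single column. The sketch below, using the built-in `women` data, checks that `$`, `[[ ]]` and `[ , ]` with a column name return the same vector; the variable `col` is just an illustrative name:

```r
h1 <- women$height        # $ with the bare column name
h2 <- women[["height"]]   # [[ ]] with the name as a string
h3 <- women[, "height"]   # matrix-style [row, column] indexing

identical(h1, h2)         # TRUE
identical(h1, h3)         # TRUE

# [[ ]] also accepts a variable holding the column name, which $ does not
col <- "weight"
women[[col]]
```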
# Fit a linear regression of weight on height
> lm(weight ~ height, data = women)

Call:
lm(formula = weight ~ height, data = women)

Coefficients:
(Intercept)       height  
     -87.52         3.45  
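Printing the model shows rounded coefficients. Storing the fit and calling `coef()` gives the full values, and `predict()` applies the model to new data. In this sketch `fit` is just a chosen name and the height of 66 is a hypothetical input:

```r
fit <- lm(weight ~ height, data = women)
coef(fit)   # named vector with "(Intercept)" and "height"

# Predicted weight for a hypothetical height of 66
predict(fit, newdata = data.frame(height = 66))
```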
# Typing $ every time can be inconvenient; in that case, attach() puts the columns of the data frame on the search path
> attach(mtcars)
# After attaching, refer to a column by its name alone
> mpg
 [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4 14.7 32.4
[19] 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7 15.0 21.4
> hp
 [1] 110 110  93 110 175 105 245  62  95 123 123 180 180 180 205 215 230  66  52  65  97 150
[23] 150 245 175  66  91 113 264 175 335 109
# detach() removes the data frame from the search path again
> detach(mtcars)
# with() is similar to attach(), but the columns are visible only inside the given expression
> with(mtcars, {mpg})
 [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4 14.7 32.4
[19] 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7 15.0 21.4
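Because `with()` evaluates an arbitrary expression, it also handles computations that involve several columns, and nothing stays attached afterwards. A minimal sketch on `mtcars`; the 4-cylinder filter is only an example:

```r
# Mean mpg of the 4-cylinder cars, written with with() ...
m1 <- with(mtcars, mean(mpg[cyl == 4]))

# ... and the same calculation written out with $
m2 <- mean(mtcars$mpg[mtcars$cyl == 4])

identical(m1, m2)   # TRUE
m1                  # about 26.66
```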
Reposted from: CSDN, author: 不温卜火
Original link: https://blog.csdn.net/qq_16146103/article/details/105418454