1. Data structures

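The figure below is from StackOverflow.

[Figure: a B-tree and a B+ tree side by side]


In the figure, the left side is a B-tree (the mark in the middle is a bar, not a minus sign) and the right side is a B+ tree.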

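  • B-tree [MongoDB]

Every node holds both keys and values, so no key or value is stored twice. Lookups suffer more cache misses, though: whenever a key is visited, its data pointer is loaded into memory along with it, even if that key is not the one being sought, so fewer keys fit in memory than with a B+ tree.

In the best case a lookup is O(1), when the wanted key sits in the root node. Frequently accessed data is therefore faster to reach the closer it is to the root. A B+ tree has no such advantage, because its internal nodes hold only keys and no data pointers, so every lookup has to descend to a leaf.

Because the data is spread over every node of the tree, a full scan has to visit every level. Such a traversal has to backtrack, which uses more stack space.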

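  • B+ tree [MySQL]

Internal (non-leaf) nodes hold only keys and no values, so one memory page can store more keys. Key comparisons during a lookup therefore cause fewer cache misses, because more keys are in the cache.

The leaf nodes contain every key together with its value, so the keys in the internal nodes are duplicates, but this duplication is what makes lookups fast.

The leaf nodes are linked to one another by pointers, so a full scan only needs to walk the leaves linearly. This makes the range queries that MySQL often has to serve more efficient.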

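Essential differences

  • In a B-tree, every node holds keys and pointers to the corresponding data; in a B+ tree, only the leaf nodes hold keys with data pointers, and the internal nodes hold only the keys used for searching
  • B-tree leaf nodes are not linked to each other; B+ tree leaf nodes are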
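
The two traversal patterns described above can be sketched in plain JavaScript. This is only an illustration: the node layout, the sample keys and the function names `bTreeScan` and `bPlusRange` are all made up here, and neither MongoDB nor MySQL stores its pages this way. The range scan also starts from the first leaf for simplicity, whereas a real B+ tree would first descend from the root to find the starting leaf.

```javascript
// Hypothetical miniature structures, for illustration only.

// B-tree node: keys and values live in every node; children sit between keys.
const bTree = {
  keys: [20],
  values: ["v20"],
  children: [
    { keys: [5, 10], values: ["v5", "v10"], children: [] },
    { keys: [30, 40], values: ["v30", "v40"], children: [] },
  ],
};

// A full in-order scan of a B-tree has to recurse into children and
// come back up (backtrack) between the keys of each node.
function bTreeScan(node, out = []) {
  const hasChildren = node.children.length > 0;
  for (let i = 0; i < node.keys.length; i++) {
    if (hasChildren) bTreeScan(node.children[i], out);
    out.push([node.keys[i], node.values[i]]);
  }
  if (hasChildren) bTreeScan(node.children[node.keys.length], out);
  return out;
}

// B+ tree leaves: they hold every key/value pair and are linked left to right.
const leaf3 = { keys: [30, 40], values: ["v30", "v40"], next: null };
const leaf2 = { keys: [10, 20], values: ["v10", "v20"], next: leaf3 };
const leaf1 = { keys: [5], values: ["v5"], next: leaf2 };

// A range scan just follows the `next` pointers, with no recursion.
function bPlusRange(startLeaf, lo, hi) {
  const out = [];
  for (let leaf = startLeaf; leaf !== null; leaf = leaf.next) {
    for (let i = 0; i < leaf.keys.length; i++) {
      if (leaf.keys[i] > hi) return out; // past the range, stop early
      if (leaf.keys[i] >= lo) out.push([leaf.keys[i], leaf.values[i]]);
    }
  }
  return out;
}

console.log(bTreeScan(bTree).map(([k]) => k));          // [ 5, 10, 20, 30, 40 ]
console.log(bPlusRange(leaf1, 10, 30).map(([k]) => k)); // [ 10, 20, 30 ]
```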

2. BSON and JSON

JSON is a common format:

{
  "_id": 1,
  "name": { "first" : "John", "last" : "Backus" },
  "contribs": [ "Fortran", "ALGOL", "Backus-Naur Form", "FP" ]
}

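BSON is a binary serialization of JSON-like documents, so it is not human-readable; it also adds types that JSON lacks, such as dates and binary data. For example, the document {"BSON": ["awesome", 5.05, 1986]} is encoded as the following bytes: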

\x31\x00\x00\x00
\x04BSON\x00
\x26\x00\x00\x00
\x02\x30\x00\x08\x00\x00\x00awesome\x00
\x01\x31\x00\x33\x33\x33\x33\x33\x33\x14\x40
\x10\x32\x00\xc2\x07\x00\x00
\x00
\x00
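
To make the byte dump less opaque, here is a minimal decoder sketch in Node.js. It understands only the four element types that occur in this dump (double 0x01, string 0x02, array 0x04, int32 0x10); the function names `readCString` and `decodeDocument` are made up for this example, and a real application would use a BSON library instead.

```javascript
// The byte dump above, rebuilt as a Buffer ("latin1" maps each char to one byte).
const bytes = Buffer.from(
  "\x31\x00\x00\x00" +
    "\x04BSON\x00" +
    "\x26\x00\x00\x00" +
    "\x02\x30\x00\x08\x00\x00\x00awesome\x00" +
    "\x01\x31\x00\x33\x33\x33\x33\x33\x33\x14\x40" +
    "\x10\x32\x00\xc2\x07\x00\x00" +
    "\x00" +
    "\x00",
  "latin1"
);

// Element names are NUL-terminated strings.
function readCString(buf, pos) {
  const end = buf.indexOf(0, pos);
  return { text: buf.toString("utf8", pos, end), next: end + 1 };
}

// A document is: int32 total size, a list of elements, then a trailing 0x00.
// Arrays are encoded as documents whose keys are "0", "1", "2", ...
function decodeDocument(buf, start, asArray = false) {
  const size = buf.readInt32LE(start);
  const end = start + size - 1; // index of the trailing 0x00
  const result = asArray ? [] : {};
  let pos = start + 4;
  while (pos < end) {
    const type = buf[pos];
    const name = readCString(buf, pos + 1);
    pos = name.next;
    let value;
    if (type === 0x01) {
      value = buf.readDoubleLE(pos); // 64-bit float
      pos += 8;
    } else if (type === 0x02) {
      const len = buf.readInt32LE(pos); // byte length including the trailing NUL
      value = buf.toString("utf8", pos + 4, pos + 4 + len - 1);
      pos += 4 + len;
    } else if (type === 0x04) {
      value = decodeDocument(buf, pos, true); // nested array document
      pos += buf.readInt32LE(pos);
    } else if (type === 0x10) {
      value = buf.readInt32LE(pos); // 32-bit integer
      pos += 4;
    } else {
      throw new Error("unsupported BSON type 0x" + type.toString(16));
    }
    if (asArray) result.push(value);
    else result[name.text] = value;
  }
  return result;
}

console.log(bytes.length);                            // 49, i.e. the 0x31 size prefix
console.log(JSON.stringify(decodeDocument(bytes, 0))); // {"BSON":["awesome",5.05,1986]}
```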

3. Database operations
List the databases

show dbs

Switch to a database. If the test database does not exist yet, it is created once data is first stored in it

use test

Drop a database. Run while the test database is selected, the following command drops the database currently in use

db.dropDatabase()

4. Collection operations (a collection is the counterpart of a table)
List all collections

show collections

Create a collection

db.createCollection("mycollection")

Drop a collection

db.mycollection.drop()

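Insert into a collection. If the collection named class does not exist, it is created automatically (newer shells deprecate insert in favor of insertOne and insertMany)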

db.class.insert({'name':'ee547'})

View the contents of the collection

> db.class.find()
{ "_id" : ObjectId("632a16e412de6627982e581b"), "name" : "ee547" }

5. Hands-on 1

Create a collection

db.createCollection('my_data')

Insert data

db.my_data.insert([
 {
 title: "MongoDB Overview",
 description: "MongoDB is no SQL database",
 by: "tutorials point",
 tags: ["mongodb", "database", "NoSQL"],
 likes: 100
 },
 {
 title: "NoSQL Database",
 description: "NoSQL database doesn't have tables",
 by: "tutorials point",
 tags: ["mongodb", "database", "NoSQL"],
 likes: 20
 }
])

View the data just inserted

> db.my_data.find()
{ "_id" : ObjectId("6333b19a3dc8391635c169b0"), "title" : "MongoDB Overview", "description" : "MongoDB is no SQL database", "by" : "tutorials point", "tags" : [ "mongodb", "database", "NoSQL" ], "likes" : 100 }
{ "_id" : ObjectId("6333b19a3dc8391635c169b1"), "title" : "NoSQL Database", "description" : "NoSQL database doesn't have tables", "by" : "tutorials point", "tags" : [ "mongodb", "database", "NoSQL" ], "likes" : 20 }

Print the output in a more readable, JSON-like layout

> db.my_data.find().pretty()
{
        "_id" : ObjectId("6333b19a3dc8391635c169b0"),
        "title" : "MongoDB Overview",
        "description" : "MongoDB is no SQL database",
        "by" : "tutorials point",
        "tags" : [
                "mongodb",
                "database",
                "NoSQL"
        ],
        "likes" : 100
}
{
        "_id" : ObjectId("6333b19a3dc8391635c169b1"),
        "title" : "NoSQL Database",
        "description" : "NoSQL database doesn't have tables",
        "by" : "tutorials point",
        "tags" : [
                "mongodb",
                "database",
                "NoSQL"
        ],
        "likes" : 20
}

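Filter with conditions. This example uses $and; $or, $nor and others are also available. The query finds the documents (the counterpart of rows in a relational database) whose title and by fields equal the given strings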

> db.my_data.find({ $and: [ {title: 'MongoDB Overview'}, {by: 'tutorials point'}]})
{ "_id" : ObjectId("6333b19a3dc8391635c169b0"), "title" : "MongoDB Overview", "description" : "MongoDB is no SQL database", "by" : "tutorials point", "tags" : [ "mongodb", "database", "NoSQL" ], "likes" : 100 }
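
The meaning of $and, $or and $nor can be sketched as a small in-memory matcher. This is a toy model, not how MongoDB evaluates queries: the `matches` and `matchesFields` helpers are made up for this example, they support only exact equality, and they handle a single logical operator per query object.

```javascript
// Two documents shaped like the ones in my_data (fields trimmed for brevity).
const docs = [
  { title: "MongoDB Overview", by: "tutorials point", likes: 100 },
  { title: "NoSQL Database", by: "tutorials point", likes: 20 },
];

// Equality-only check: every field named in `cond` must equal the document's field.
function matchesFields(doc, cond) {
  return Object.entries(cond).every(([field, expected]) => doc[field] === expected);
}

// $and: all sub-queries match; $or: at least one matches; $nor: none match.
function matches(doc, query) {
  if (query.$and) return query.$and.every((q) => matches(doc, q));
  if (query.$or) return query.$or.some((q) => matches(doc, q));
  if (query.$nor) return !query.$nor.some((q) => matches(doc, q));
  return matchesFields(doc, query);
}

const andQuery = { $and: [{ title: "MongoDB Overview" }, { by: "tutorials point" }] };
const orQuery = { $or: [{ title: "MongoDB Overview" }, { likes: 20 }] };
const norQuery = { $nor: [{ title: "MongoDB Overview" }, { likes: 100 }] };

console.log(docs.filter((d) => matches(d, andQuery)).map((d) => d.title)); // [ 'MongoDB Overview' ]
console.log(docs.filter((d) => matches(d, orQuery)).length);               // 2
console.log(docs.filter((d) => matches(d, norQuery)).map((d) => d.title)); // [ 'NoSQL Database' ]
```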

Update the title. In the session below, the document whose title is MongoDB Overview ends up with the title New name

> db.my_data.find()
{ "_id" : ObjectId("6333b19a3dc8391635c169b0"), "title" : "MongoDB Overview", "description" : "MongoDB is no SQL database", "by" : "tutorials point", "tags" : [ "mongodb", "database", "NoSQL" ], "likes" : 100 }
{ "_id" : ObjectId("6333b19a3dc8391635c169b1"), "title" : "NoSQL Database", "description" : "NoSQL database doesn't have tables", "by" : "tutorials point", "tags" : [ "mongodb", "database", "NoSQL" ], "likes" : 20 }
> db.my_data.update({title:'MongoDB Overview'},{$set:{title:'New name'}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
> db.my_data.find()
{ "_id" : ObjectId("6333b19a3dc8391635c169b0"), "title" : "New name", "description" : "MongoDB is no SQL database", "by" : "tutorials point", "tags" : [ "mongodb", "database", "NoSQL" ], "likes" : 100 }
{ "_id" : ObjectId("6333b19a3dc8391635c169b1"), "title" : "NoSQL Database", "description" : "NoSQL database doesn't have tables", "by" : "tutorials point", "tags" : [ "mongodb", "database", "NoSQL" ], "likes" : 20 }

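Update several documents at once: find the documents whose by field is tutorials point and change that field to tutorials point222. The multi:true option in the third argument allows more than one document to be modified; without it, only one document is updated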

> db.my_data.find()
{ "_id" : ObjectId("6333b19a3dc8391635c169b0"), "title" : "New name", "description" : "MongoDB is no SQL database", "by" : "tutorials point", "tags" : [ "mongodb", "database", "NoSQL" ], "likes" : 100 }
{ "_id" : ObjectId("6333b19a3dc8391635c169b1"), "title" : "NoSQL Database", "description" : "NoSQL database doesn't have tables", "by" : "tutorials point", "tags" : [ "mongodb", "database", "NoSQL" ], "likes" : 20 }
> db.my_data.update(
...  {by:'tutorials point'},
...  {$set:{by:'tutorials point222'}},
...  {multi:true}
... )
WriteResult({ "nMatched" : 2, "nUpserted" : 0, "nModified" : 2 })
> db.my_data.find()
{ "_id" : ObjectId("6333b19a3dc8391635c169b0"), "title" : "New name", "description" : "MongoDB is no SQL database", "by" : "tutorials point222", "tags" : [ "mongodb", "database", "NoSQL" ], "likes" : 100 }
{ "_id" : ObjectId("6333b19a3dc8391635c169b1"), "title" : "NoSQL Database", "description" : "NoSQL database doesn't have tables", "by" : "tutorials point222", "tags" : [ "mongodb", "database", "NoSQL" ], "likes" : 20 }
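
The difference that multi:true makes can be mimicked on a plain array. The `update` function below is a hypothetical stand-in written for this example; it handles only equality filters and a flat set of fields, and it says nothing about how the server applies updates. Newer shells expose the same split as updateOne and updateMany.

```javascript
// In-memory stand-in for the my_data collection.
const myData = [
  { title: "New name", by: "tutorials point" },
  { title: "NoSQL Database", by: "tutorials point" },
];

// Apply `setFields` to matching documents; stop after the first match
// unless options.multi is true.
function update(collection, filter, setFields, options = {}) {
  let nMatched = 0;
  for (const doc of collection) {
    const hit = Object.entries(filter).every(([k, v]) => doc[k] === v);
    if (!hit) continue;
    Object.assign(doc, setFields);
    nMatched++;
    if (!options.multi) break;
  }
  return { nMatched };
}

// Without multi, only the first matching document changes.
const single = update(myData, { by: "tutorials point" }, { by: "tutorials point111" });
console.log(single.nMatched); // 1

// With multi:true, every remaining match changes.
const many = update(myData, { by: "tutorials point" }, { by: "tutorials point222" }, { multi: true });
console.log(many.nMatched); // 1, because only one document still has the old value
console.log(myData.map((d) => d.by)); // [ 'tutorials point111', 'tutorials point222' ]
```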

6. Hands-on 2

Create a collection

db.createCollection("empDetails")

Insert a single document with insertOne and view it

> db.empDetails.insertOne({
...  First_Name: "Radhika",
...  Last_Name: "Sharma",
...  Date_Of_Birth:"1995-09-26",
...  e_mail: "radhika_sharma.123@gmail.com",
...  phone:"9848022338"
... })
{
        "acknowledged" : true,
        "insertedId" : ObjectId("6333b4013dc8391635c169b2")
}
> db.empDetails.find()
{ "_id" : ObjectId("6333b4013dc8391635c169b2"), "First_Name" : "Radhika", "Last_Name" : "Sharma", "Date_Of_Birth" : "1995-09-26", "e_mail" : "radhika_sharma.123@gmail.com", "phone" : "9848022338" }

Insert several documents at once with insertMany; passing an array to insert, as done earlier, also works

> db.empDetails.insertMany([
...  {
...  First_Name: "Doe",
...  Last_Name: "Joe",
...  Date_Of_Birth:"1995-10-24",
...  e_mail: "doe_joe.123@gmail.com",
...  phone:"1234567890"
...  },
...  {
...  First_Name: "Rachel",
...  Last_Name: "Christopher",
...  Date_Of_Birth:"1990-02-16",
...  e_mail: "Rachel_Christopher.123@gmail.com",
...  phone:"9000054321"
...  },
...  {
...  First_Name: "Fathima",
...  Last_Name: "Sheik",
...  Date_Of_Birth:"1990-02-16",
...  e_mail: "Fathima_Sheik.123@gmail.com",
...  phone:"9000054321"
...  }
... ])
{
        "acknowledged" : true,
        "insertedIds" : [
                ObjectId("6333b44d3dc8391635c169b3"),
                ObjectId("6333b44d3dc8391635c169b4"),
                ObjectId("6333b44d3dc8391635c169b5")
        ]
}
> db.empDetails.find()
{ "_id" : ObjectId("6333b4013dc8391635c169b2"), "First_Name" : "Radhika", "Last_Name" : "Sharma", "Date_Of_Birth" : "1995-09-26", "e_mail" : "radhika_sharma.123@gmail.com", "phone" : "9848022338" }
{ "_id" : ObjectId("6333b44d3dc8391635c169b3"), "First_Name" : "Doe", "Last_Name" : "Joe", "Date_Of_Birth" : "1995-10-24", "e_mail" : "doe_joe.123@gmail.com", "phone" : "1234567890" }
{ "_id" : ObjectId("6333b44d3dc8391635c169b4"), "First_Name" : "Rachel", "Last_Name" : "Christopher", "Date_Of_Birth" : "1990-02-16", "e_mail" : "Rachel_Christopher.123@gmail.com", "phone" : "9000054321" }
{ "_id" : ObjectId("6333b44d3dc8391635c169b5"), "First_Name" : "Fathima", "Last_Name" : "Sheik", "Date_Of_Birth" : "1990-02-16", "e_mail" : "Fathima_Sheik.123@gmail.com", "phone" : "9000054321" }

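View a single document. With no sort specified, findOne returns the first document in natural order, which reflects how the documents are laid out on disk; here that is the first one inserted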

> db.empDetails.findOne()
{
        "_id" : ObjectId("6333b4013dc8391635c169b2"),
        "First_Name" : "Radhika",
        "Last_Name" : "Sharma",
        "Date_Of_Birth" : "1995-09-26",
        "e_mail" : "radhika_sharma.123@gmail.com",
        "phone" : "9848022338"
}

Assign the result to a variable and print its _id

> employee = db.empDetails.findOne()
{
        "_id" : ObjectId("6333b4013dc8391635c169b2"),
        "First_Name" : "Radhika",
        "Last_Name" : "Sharma",
        "Date_Of_Birth" : "1995-09-26",
        "e_mail" : "radhika_sharma.123@gmail.com",
        "phone" : "9848022338"
}
> employee._id
ObjectId("6333b4013dc8391635c169b2")

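7. _id

The _id field is central to MongoDB. When a document is inserted without one, an _id is generated automatically; its default type is ObjectId, which is 12 bytes long. In ObjectId("6333b401 3dc8391635 c169b2") (spaces added here to separate the three parts), each character is one hexadecimal digit, which takes 4 bits; a byte is 8 bits, so 12 bytes are written as 24 hexadecimal digits

The first 4 bytes are the creation time of the ObjectId, in seconds since the Unix epoch; the Unix epoch is 00:00:00 UTC on January 1, 1970

The middle 5 bytes are a random value generated once per process, unique to the machine and process

The last 3 bytes are an incrementing counter, initialized to a random value rather than starting from 0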
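
The three parts can be pulled out of the hexadecimal string with a few lines of JavaScript. The `parseObjectId` helper is written for this example and is not part of the mongo shell; it only slices the string according to the 4 + 5 + 3 byte layout described above.

```javascript
// Split a 24-digit ObjectId hex string into its three parts.
function parseObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) throw new Error("expected 24 hex digits");
  const seconds = parseInt(hex.slice(0, 8), 16); // first 4 bytes: seconds since the epoch
  return {
    seconds,
    createdAt: new Date(seconds * 1000),
    random: hex.slice(8, 18),                    // middle 5 bytes: per-process random value
    counter: parseInt(hex.slice(18), 16),        // last 3 bytes: incrementing counter
  };
}

// The _id values returned by insertOne and by the first insertMany document above.
const first = parseObjectId("6333b4013dc8391635c169b2");
const second = parseObjectId("6333b44d3dc8391635c169b3");

console.log(first.createdAt.toISOString());     // 2022-09-28T02:40:01.000Z
console.log(first.random === second.random);    // true: same shell process
console.log(second.counter - first.counter);    // 1: the counter moved by one
```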

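8. Hands-on: sorting

Insert several documents directly (the collection is created automatically), then sort alphabetically by the borough field; 1 means ascending order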

> db.restaurants.insertMany( [
...  { "_id" : 1, "name" : "Central Park Cafe", "borough" : "Manhattan"},
...  { "_id" : 2, "name" : "Rock A Feller Bar and Grill", "borough" : "Queens"},
...  { "_id" : 3, "name" : "Empire State Pub", "borough" : "Brooklyn"},
...  { "_id" : 4, "name" : "Stan's Pizzaria", "borough" : "Manhattan"},
...  { "_id" : 5, "name" : "Jane's Deli", "borough" : "Brooklyn"},
... ] );
{ "acknowledged" : true, "insertedIds" : [ 1, 2, 3, 4, 5 ] }
> db.restaurants.find().sort( { "borough": 1 } )
{ "_id" : 3, "name" : "Empire State Pub", "borough" : "Brooklyn" }
{ "_id" : 5, "name" : "Jane's Deli", "borough" : "Brooklyn" }
{ "_id" : 1, "name" : "Central Park Cafe", "borough" : "Manhattan" }
{ "_id" : 4, "name" : "Stan's Pizzaria", "borough" : "Manhattan" }
{ "_id" : 2, "name" : "Rock A Feller Bar and Grill", "borough" : "Queens" }

Sort in descending order

> db.orders.insertMany( [
...  { "_id" : 1, "item" : { "category" : "cake", "type" : "chiffon" }, "amount" :
... 10 },
...  { "_id" : 2, "item" : { "category" : "cookies", "type" : "chocolate chip" },
... "amount" : 50 },
...  { "_id" : 3, "item" : { "category" : "cookies", "type" : "chocolate chip" },
... "amount" : 15 },
...  { "_id" : 4, "item" : { "category" : "cake", "type" : "lemon" }, "amount" :
... 30 },
...  { "_id" : 5, "item" : { "category" : "cake", "type" : "carrot" }, "amount" :
... 20 },
...  { "_id" : 6, "item" : { "category" : "brownies", "type" : "blondie" },
... "amount" : 10 },
... ] );
{ "acknowledged" : true, "insertedIds" : [ 1, 2, 3, 4, 5, 6 ] }
> db.orders.find().sort( { amount: -1 } )
{ "_id" : 2, "item" : { "category" : "cookies", "type" : "chocolate chip" }, "amount" : 50 }
{ "_id" : 4, "item" : { "category" : "cake", "type" : "lemon" }, "amount" : 30 }
{ "_id" : 5, "item" : { "category" : "cake", "type" : "carrot" }, "amount" : 20 }
{ "_id" : 3, "item" : { "category" : "cookies", "type" : "chocolate chip" }, "amount" : 15 }
{ "_id" : 1, "item" : { "category" : "cake", "type" : "chiffon" }, "amount" : 10 }
{ "_id" : 6, "item" : { "category" : "brownies", "type" : "blondie" }, "amount" : 10 }
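
The 1 and -1 arguments behave like the sign of a comparator. The sketch below reproduces both orderings on plain arrays with a made-up `sortBy` helper; it is not MongoDB's sort implementation. JavaScript's Array.prototype.sort is stable, so documents with equal values keep their input order here, whereas MongoDB makes no such promise for ties unless the sort also includes a unique field such as _id.

```javascript
// Same data as the two collections above, reduced to the fields being sorted.
const restaurants = [
  { _id: 1, borough: "Manhattan" },
  { _id: 2, borough: "Queens" },
  { _id: 3, borough: "Brooklyn" },
  { _id: 4, borough: "Manhattan" },
  { _id: 5, borough: "Brooklyn" },
];
const orders = [
  { _id: 1, amount: 10 },
  { _id: 2, amount: 50 },
  { _id: 3, amount: 15 },
  { _id: 4, amount: 30 },
  { _id: 5, amount: 20 },
  { _id: 6, amount: 10 },
];

// direction is 1 for ascending and -1 for descending, as in sort({ field: 1 }).
function sortBy(docs, field, direction) {
  return [...docs].sort((a, b) => {
    if (a[field] < b[field]) return -direction;
    if (a[field] > b[field]) return direction;
    return 0; // equal values: the stable sort keeps input order
  });
}

console.log(sortBy(restaurants, "borough", 1).map((d) => d._id)); // [ 3, 5, 1, 4, 2 ]
console.log(sortBy(orders, "amount", -1).map((d) => d._id));      // [ 2, 4, 5, 3, 1, 6 ]
```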

9. Data types

For example: date, numbers, array, objects, string, boolean

Insert data, then query by ISODate to find the documents whose date is earlier than ($lt) a given ISO time

> db.cakeSales.insertMany( [
...  { _id: 0, type: "chocolate", orderDate: new ISODate("2020-05-18T14:10:30Z") },
...  { _id: 1, type: "strawberry", orderDate: new ISODate("2021-03-20T11:30:05Z") },
...  { _id: 2, type: "vanilla", orderDate: new ISODate("2021-01-15T06:31:15Z") }
... ] )
{ "acknowledged" : true, "insertedIds" : [ 0, 1, 2 ] }
> db.cakeSales.find( { orderDate: { $lt: ISODate("2021-02-25T10:03:46.000Z") } } )
{ "_id" : 0, "type" : "chocolate", "orderDate" : ISODate("2020-05-18T14:10:30Z") }
{ "_id" : 2, "type" : "vanilla", "orderDate" : ISODate("2021-01-15T06:31:15Z") }
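
The same comparison can be tried outside the shell with plain JavaScript Date objects. This is a minimal sketch that filters an in-memory array rather than querying a server; the `cutoff` and `earlier` names are chosen for this example.

```javascript
// The three cakeSales documents, with Date standing in for the shell's ISODate.
const cakeSales = [
  { _id: 0, type: "chocolate", orderDate: new Date("2020-05-18T14:10:30Z") },
  { _id: 1, type: "strawberry", orderDate: new Date("2021-03-20T11:30:05Z") },
  { _id: 2, type: "vanilla", orderDate: new Date("2021-01-15T06:31:15Z") },
];

// Equivalent of { orderDate: { $lt: cutoff } }: keep documents dated before the cutoff.
const cutoff = new Date("2021-02-25T10:03:46.000Z");
const earlier = cakeSales.filter((doc) => doc.orderDate < cutoff);

console.log(earlier.map((doc) => doc.type)); // [ 'chocolate', 'vanilla' ]
```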

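Insert timestamps. Because the three documents are inserted together, they share the same seconds value and the second component of Timestamp (the increment) goes up by one each time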

> db.flights.insertMany(
...  [
...  { arrival: "true", ts: Timestamp() },
...  { arrival: "true", ts: Timestamp() },
...  { arrival: "true", ts: Timestamp() }
...  ]
... )
{
        "acknowledged" : true,
        "insertedIds" : [
                ObjectId("6333b9963dc8391635c169b6"),
                ObjectId("6333b9963dc8391635c169b7"),
                ObjectId("6333b9963dc8391635c169b8")
        ]
}
> db.flights.find({})
{ "_id" : ObjectId("6333b9963dc8391635c169b6"), "arrival" : "true", "ts" : Timestamp(1664334230, 1) }
{ "_id" : ObjectId("6333b9963dc8391635c169b7"), "arrival" : "true", "ts" : Timestamp(1664334230, 2) }
{ "_id" : ObjectId("6333b9963dc8391635c169b8"), "arrival" : "true", "ts" : Timestamp(1664334230, 3) }
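
A Timestamp is a 64-bit value whose high 32 bits hold the seconds and whose low 32 bits hold the increment, which is why the three values above sort in insertion order. The sketch below packs and unpacks that layout with BigInt; `packTimestamp` and `unpackTimestamp` are names invented for this example and are not shell functions.

```javascript
// Pack (seconds, increment) into one 64-bit value: seconds in the high 32 bits.
function packTimestamp(seconds, increment) {
  return (BigInt(seconds) << 32n) | BigInt(increment);
}

// Reverse of packTimestamp.
function unpackTimestamp(packed) {
  return {
    seconds: Number(packed >> 32n),
    increment: Number(packed & 0xffffffffn),
  };
}

// The three values stored in the flights collection above.
const ts = [1, 2, 3].map((i) => packTimestamp(1664334230, i));

console.log(ts[0] < ts[1] && ts[1] < ts[2]); // true: same second, ordered by increment
console.log(unpackTimestamp(ts[2]));          // { seconds: 1664334230, increment: 3 }

// A later second always sorts after an earlier one, whatever the increment.
console.log(packTimestamp(1664334231, 1) > ts[2]); // true
```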

10. Creating an admin user (to be investigated)

Switch to the admin database

use admin

Create a user and grant it administrative roles

db.createUser(
 {
 user: "myUserAdmin",
 pwd: "123", // or passwordPrompt()
 roles: [
 { role: "userAdminAnyDatabase", db: "admin" },
 { role: "readWriteAnyDatabase", db: "admin" }
 ]
 }
)

View the users

> show users
{
        "_id" : "admin.myUserAdmin",
        "userId" : UUID("6052ae4e-999f-490f-93b7-db8e3a3717d8"),
        "user" : "myUserAdmin",
        "db" : "admin",
        "roles" : [
                {
                        "role" : "readWriteAnyDatabase",
                        "db" : "admin"
                },
                {
                        "role" : "userAdminAnyDatabase",
                        "db" : "admin"
                }
        ],
        "mechanisms" : [
                "SCRAM-SHA-1",
                "SCRAM-SHA-256"
        ]
}

See which roles can be granted

> show roles
{
        "role" : "enableSharding",
        "db" : "admin",
        "isBuiltin" : true,
        "roles" : [ ],
        "inheritedRoles" : [ ]
}
{
        "role" : "hostManager",
        "db" : "admin",
        "isBuiltin" : true,
        "roles" : [ ],
        "inheritedRoles" : [ ]
}
... (many more roles follow; omitted here)

Exit mongo and log in as the user just created

mongo --authenticationDatabase "admin" -u "myUserAdmin" -p "123"

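Alternatively, authenticate from inside the shell against the admin database (db.auth verifies credentials; it does not grant any permissions)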

use admin
db.auth("myUserAdmin", "123")

11. Creating a database user (to be investigated)

Switch to the test database

use test

Create a user

db.createUser(
 {
 user: "myTester",
 pwd: "test123", // or passwordPrompt()
 roles: [ { role: "readWrite", db: "test" },
 { role: "read", db: "reporting" } ]
 }
)

Exit mongo and log in as this user

mongo --authenticationDatabase "test" -u "myTester" -p "test123"

In the test database, insert a document into a collection named test, which creates the collection if it does not exist

db.test.insertOne( { x: 1, y: 1 } )