MongoDB Logistics


Today's DB Objectives:


Installing Mongo

OK, we'll start by installing MongoDB (the mongod server and the mongo shell) on our ECE servers. I'll be following the steps in MongoDB's installation guide for generic Linux systems.

  1. ssh into mlb1
  2. (bash)
  3. curl -O https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-3.0.4.tgz (this downloads mongo)
  4. tar -zxvf mongodb-linux-x86_64-3.0.4.tgz (unzips it)
  5. mv mongodb-linux-x86_64-3.0.4 mongodb (gives it a nicer folder name)
  6. export PATH=$HOME/mongodb/bin:$PATH (puts all of the cool stuff on your path so you can run mongo, mongod from the command line)
  7. mkdir -p mdata/db (create a directory where we will store our data)
  8. chmod +rwx mdata/db (make sure we have permissions to write and read that directory)
  9. mkdir logmon (create a directory for the log files)
  10. Look up one of your port numbers on our workflow page
  11. mongod --dbpath mdata/db/ --port yourporthere --fork --logpath $HOME/logmon/mongodb.log (this runs the mongo daemon starting a listener at the port we specify, forking the process so it should stay running in the background, and telling the mongo listener where our data will live and where our log files will live)

There are many interesting options you can pass when you start the mongo listener. Check them out to set your imagination afire as you progress.

Now, to interact with the database, type mongo --port yourporthere and you'll be connected.

Mongo Labs Solution

I also recommend going to MongoLab and creating some free mongo databases (always select the sandbox/free choices). For each database, add at least one user. They will give you two methods of connecting (one for use inside your programs, one from a shell).

Cloud9 solution

Cloud9 comes with mongod and mongo on the path and the default data locations all set up for users. The only downside is that the memory restrictions are so tight that you need to force small files. So to use MongoDB on Cloud9, do the following steps:

  1. mongod --smallfiles --fork --syslog
  2. mongo

Shutting Down the Process

When connected via mongo you can type: use admin then db.shutdownServer().

You can also type mongod --shutdown (pass the same --dbpath you started with so that mongod can find the running process).

You could also look up the process ID and do kill PID.

If you wanted to, you could skip forking the mongod service and instead run it from inside of a screen (a very cool trick if you've never used it). In that case reattach the screen and press Ctrl-C, although if you did that you probably don't need the guidance.

High Level Forces in Mongo and post-SQL databases

So Mongo is our first NoSQL database. It is gaining traction in the world at the moment as a very scalable solution for a wide range of projects. There will be some shellshock as we move on, mostly because you cannot do joins. Also there are no transactions spanning multiple documents (as of the 3.0 release we installed). Mongo is made for high availability, multi-locality architectures, and eventually consistent data. The big difference here is that we give up consistency to gain the ability to partition the database.

Please don't get the idea that MongoDB is going to be the database to beat for the next 20 years. The reality is that the relational systems which emerged from the last century just don't hold up to the low-latency, high-volume demands of the modern internet giant (without huge hardware demands). (For comparison, here are some high-level concepts behind Manhattan, Twitter's solution to this problem.) I just don't think that you should ignore the NoSQL movement just because the winner isn't yet clear. So Mongo (and Firebase) will act as our entry point into NoSQL databases. The current trend is using cheap machines that are massively redundant (the cloud) rather than dedicated mainframes.

You can get the official word on what I show you here by visiting Mongo's CRUD documentation.

If you are interested, they have internships for you in NYC 2016.

One more warning: I am not a world-class mongo expert. I have launched only one small production product using mongo (Seaford Bowling Lanes), and I've done 2-3 toy projects in mongo. I have yet to push a mongo project past the basic levels of comfort. So my perspective does not contain practical sharding opinions or seasoned troubleshooter insights.

Basic Structures in MongoDB

OK, so just like with MySQL you can log in to a particular database. For instance, mongo mydb will create or use a database named mydb. From the shell I can switch databases just like in MySQL: use otherdb switches to otherdb. I can see all dbs with the command show dbs.

Each database contains collections which each contain documents. For now think of a collection as a table and a document as a row. This analogy breaks down in the following ways:

So a database is a gathering of collections, which are themselves gatherings of documents. Mongo calls its JSON objects BSON objects, which stands for binary JSON (and is a little bit of a shout-out to B+Trees).

Gettin CRUDy with Mongo

Let's get cranking with Mongo. Here is a student.json file, and here is a takes.json file. Fire up one of the environments where you have mongo running and do the following steps:

That created a uni database with a student collection and started the mongo shell with the uni database active. Then from within the shell it accessed the active "db" collection "student" and ran the method "find()".

By the way, the Mongo shell allows tab completion, up/down arrow command navigation, and the execution of generic JavaScript! One of the appeals of the MEAN stack (Mongo, Express, Angular, Node) is that developers can learn one language, JavaScript, and use it at every layer of the modern app stack. So our interactions with the database will already feel a bit like object-oriented programming / ORM / Active Record coding.

READ: query a collection (Part 1)

So db.collection.find() is the method that will query your collections (findOne also exists, along with a ton of others you can discover with the old db.collection.[tab] trick).

The input to find is a JSON object which is used as a filter. So for instance, to find the student with ID "23121" we would do db.student.find({ID: "23121"}) (note that string vs. integer matters here). If you want to specify multiple conditions, just toss more attributes into the JSON object. For instance, if you import the takes.json file like you did student.json, then you could do db.takes.find({course_id: "CS-319", sec_id:"2"}) to get the records from section 2.
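To make the filter idea concrete, here is a small plain-JavaScript model of how an equality filter picks documents. It is only a sketch of the semantics for simple scalar values, not how Mongo implements find; the function name matchesFilter and the sample rows are made up for illustration. You can run it in Node or paste it into the mongo shell (swap console.log for print there).

```javascript
// Plain-JavaScript model of an equality filter (NOT MongoDB's implementation).
// A document matches when every field named in the filter is strictly equal
// to the same field in the document. Only scalar values are modeled.
function matchesFilter(doc, filter) {
  return Object.keys(filter).every((field) => doc[field] === filter[field]);
}

// Hypothetical sample rows shaped like the takes collection.
const takes = [
  { ID: "76543", course_id: "CS-319", sec_id: "2", year: 2010, grade: "A" },
  { ID: "45678", course_id: "CS-319", sec_id: "1", year: 2010, grade: "B" },
  { ID: "12345", course_id: "CS-101", sec_id: "1", year: 2009, grade: "C" },
];

// Two conditions in one filter object: both must hold.
const sectionTwo = takes.filter((doc) =>
  matchesFilter(doc, { course_id: "CS-319", sec_id: "2" })
);

// Same query with a number instead of a string: nothing matches,
// because the number 2 is not the string "2".
const wrongType = takes.filter((doc) => matchesFilter(doc, { sec_id: 2 }));

console.log(sectionTwo.map((doc) => doc.ID));
console.log(wrongType.length);
```

The second query is the string-vs-integer trap in miniature: the filter is well formed, it just never equals anything stored.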

Mini-Task Get all documents from takes from Spring 2010.

This simple filtering acts like our WHERE clause. To specify which fields (columns) are returned, add a second JSON object, which acts as the "projection" operation. Its values should be true/false or 1/0. If you set the value of a field to 1 then only the fields marked that way are returned; if you set values to 0 it will return everything but those. The one exception is _id, which is included by default unless you explicitly suppress it.

Examples:


db.takes.find({course_id: "CS-319", sec_id:"2"}, {ID: 1, grade: 1})
{ "_id" : ObjectId("559b48b5be479e09bda3e7d8"), "ID" : "76543", "grade" : "A" }


db.takes.find({course_id: "CS-319", sec_id:"2"}, {_id: 0, semester: 0})
{ "ID" : "76543", "course_id" : "CS-319", "sec_id" : "2", "year" : 2010, "grade" : "A" }
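If it helps to see the two projection modes side by side, the following is a plain-JavaScript sketch of the rule. The function name project is made up, the semester value in the sample document is an assumption (the output above hides it), and mixing includes with excludes is deliberately left unhandled so the mini-task below stays interesting.

```javascript
// Plain-JavaScript model of projection (NOT MongoDB's implementation).
// Inclusion mode: some field other than _id is set to 1/true.
// Exclusion mode: fields are set to 0/false and everything else is kept.
// _id is kept unless the projection explicitly turns it off.
function project(doc, projection) {
  const fields = Object.keys(projection).filter((f) => f !== "_id");
  const including = fields.some((f) => Boolean(projection[f]));
  const keepId = !("_id" in projection) || Boolean(projection._id);
  const result = {};
  for (const [key, value] of Object.entries(doc)) {
    if (key === "_id") {
      if (keepId) result._id = value;
    } else if (including ? Boolean(projection[key]) : !(key in projection)) {
      result[key] = value;
    }
  }
  return result;
}

// Sample document; the semester value is an assumption for illustration.
const sampleDoc = {
  _id: "559b48b5be479e09bda3e7d8",
  ID: "76543",
  course_id: "CS-319",
  sec_id: "2",
  semester: "Spring",
  year: 2010,
  grade: "A",
};

console.log(project(sampleDoc, { ID: 1, grade: 1 }));        // _id, ID, grade
console.log(project(sampleDoc, { _id: 0, semester: 0 }));    // all but _id, semester
```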

Mini-Task 2 What happens when I include "ID" and exclude "grade"? Include "ID" and exclude "_id"? Can you make a query that gets only the name and total credits of all Physics majors?

Mongo has more sophisticated querying, of course: there are operators prefixed by $ that do cool things. For instance, $in, $nin, $gt, $gte, $ne, $eq, $or, $not, $regex. Let's select all students with more than 100 credit hours: db.student.find({tot_cred: {$gt: 100}}). If we wanted to further filter to students with names not ending in a vowel we could do: db.student.find({tot_cred : {$gt: 100}, name: {$regex: ".*[^aeiou]$"}}).
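One way to think about the $ operators is as named tests applied to a field's value. The sketch below models a handful of them in plain JavaScript; it is an illustration of the idea, not Mongo's query engine, and the names operators, matchesCondition, matches, and the sample students are all made up for the example.

```javascript
// Plain-JavaScript model of a few $ operators (NOT MongoDB's implementation).
// Each operator is a test on (stored value, operator argument).
const operators = {
  $gt: (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $ne: (value, arg) => value !== arg,
  $in: (value, arg) => arg.includes(value),
  $regex: (value, arg) => typeof value === "string" && new RegExp(arg).test(value),
};

// A condition is either a plain value (equality) or an object of operators,
// all of which must pass.
function matchesCondition(value, condition) {
  const isOperatorObject = condition !== null && typeof condition === "object";
  if (!isOperatorObject) return value === condition;
  return Object.entries(condition).every(([op, arg]) => operators[op](value, arg));
}

function matches(doc, filter) {
  return Object.entries(filter).every(([field, condition]) =>
    matchesCondition(doc[field], condition)
  );
}

// Hypothetical sample students.
const students = [
  { name: "Zhang", tot_cred: 102 },
  { name: "Shankar", tot_cred: 32 },
  { name: "Tanaka", tot_cred: 120 },
];

const seniors = students.filter((s) => matches(s, { tot_cred: { $gt: 100 } }));
const seniorsNoVowel = students.filter((s) =>
  matches(s, { tot_cred: { $gt: 100 }, name: { $regex: ".*[^aeiou]$" } })
);

console.log(seniors.map((s) => s.name));        // Zhang and Tanaka
console.log(seniorsNoVowel.map((s) => s.name)); // Zhang only
```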

Mini-Task 3 Find all students that have taken a 101 level class and got an A in it.

Some documentation that you'll find useful:

Updating Documents

If you think about how we updated in SQL you'll remember that we needed some sort of filtering clause and some sort of action to do. Mongo is similar. The syntax is: db.collection.update(queryPart, updatePart, optionsPart). The queryPart acts just like in the find method. The updatePart is more interesting. If you provide a plain JSON object it will just replace the whole record with what you provide. But like in the find case we have special operators that allow cooler behavior.

The most important operator is $set, which allows you to adjust only the given values. For instance, to change Zhang's department from Comp. Sci. to Biology we would do: db.student.update({name: "Zhang"}, {$set: {dept_name: "Biology"}}). Do so now and observe the output.

The default update changes only the first record found. If you want to change all matching records, set {multi: true} in the optionsPart. So to add a new field, status, to every student we could do: db.student.update({}, {$set: {status: "banana"}}, {multi: true}). Now all students have "banana" as their status.
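The three behaviors just described (whole-record replacement, $set, and the multi option) can be modeled in a few lines of plain JavaScript. This is a sketch of the semantics under simple equality queries, not Mongo's implementation; the function name update mirrors the shell method but everything else, including the sample records, is made up.

```javascript
// Plain-JavaScript model of update semantics (NOT MongoDB's implementation).
// Returns the number of documents modified.
function update(collection, query, updatePart, options = {}) {
  let modified = 0;
  for (let i = 0; i < collection.length; i++) {
    const doc = collection[i];
    const isMatch = Object.keys(query).every((f) => doc[f] === query[f]);
    if (!isMatch) continue;
    if (updatePart.$set) {
      Object.assign(doc, updatePart.$set);             // touch only the named fields
    } else {
      collection[i] = { _id: doc._id, ...updatePart }; // replace the record, keep _id
    }
    modified++;
    if (!options.multi) break;                         // default: first match only
  }
  return modified;
}

// Hypothetical sample records.
const students = [
  { _id: 1, name: "Zhang", dept_name: "Comp. Sci." },
  { _id: 2, name: "Shankar", dept_name: "Comp. Sci." },
];

const changedDept = update(students, { name: "Zhang" }, { $set: { dept_name: "Biology" } });
const firstOnly = update(students, {}, { $set: { status: "banana" } });
const everyone = update(students, {}, { $set: { status: "banana" } }, { multi: true });

console.log(changedDept, firstOnly, everyone);
console.log(students);
```

Note how the empty query {} matches every record, yet without multi only the first one is touched.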

Task A: increment "Aoi"'s total credits by 3.

Task B: Replace the student record for "Snow" to {"name":"Snow", "knows":"nothing"} keep the old _id.

Task C: figure out how to use updating to set a timestamp in a field. Add the field "graduated" to all students with more than 90 total credits and set the value of "graduated" to a current timestamp.

Helpful Documentation:

Destroy: Removing Documents

This is pretty simple since we really just need to select some elements and drop them. So all of the same conditions that worked for db.collection.find() will work for db.collection.remove(). The only difference is that if you want to delete everything you need to specify an empty JSON object: db.student.remove({}).

TASK Z: Delete Williams from the collections: takes and student

Creating Documents

This is also as simple as db.collection.insert({stuff:"here"}). If the collection didn't exist before then this operation will create the collection. You can also do db.createCollection("collectionName") to make an empty collection. The input is BSON/JSON.
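Two things happen quietly on insert: the collection springs into existence if it wasn't there, and a document without an _id gets one. Here is a plain-JavaScript sketch of that behavior, assuming a database is modeled as a Map of arrays. The function name insert mirrors the shell method, but the random hex string is only a stand-in for a real ObjectId, and the course documents are made up.

```javascript
const { randomBytes } = require("crypto");

// Plain-JavaScript model of insert (NOT MongoDB's implementation).
// The database is modeled as a Map from collection name to an array of documents.
function insert(db, collectionName, doc) {
  if (!db.has(collectionName)) db.set(collectionName, []); // created on first insert
  // Stand-in for ObjectId: 12 random bytes as 24 hex characters.
  // A caller-supplied _id overrides the generated one.
  const stored = { _id: randomBytes(12).toString("hex"), ...doc };
  db.get(collectionName).push(stored);
  return stored;
}

const db = new Map();
const first = insert(db, "course", { course_id: "CS-101", credits: 4 });
const second = insert(db, "course", { _id: "my-own-id", course_id: "CS-319" });

console.log(db.has("course"));        // the collection now exists
console.log(first._id.length);        // generated id, 24 hex characters
console.log(second._id);              // the explicit id was kept
```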

Task CC: Create an unusual record in student.

Task NN: Create a new collection with a new document containing an array.

Extra Practice

If we made it this far then let us play with some of these examples.

Next level topics

We haven't played with the reality of nested JSON objects and arrays as values. This means that we can have documents inside of documents and so on. Mongo encourages this: there are special operators for arrays and even "the.dot.notation" to reference attributes of attributes.
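As a rough picture of what "the.dot.notation" means, here is a plain-JavaScript sketch that walks a dotted path through a nested document. The helper name getPath and the sample student (advisor, office, clubs) are made up for illustration, and the model skips one thing real Mongo does: when a path crosses an array of sub-documents, Mongo looks inside every element.

```javascript
// Plain-JavaScript model of dot-notation lookup (NOT MongoDB's implementation).
// Walks one key at a time; returns undefined as soon as the path runs out.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value !== null && typeof value === "object" ? value[key] : undefined),
    doc
  );
}

// Hypothetical nested document.
const student = {
  name: "Zhang",
  advisor: { name: "Katz", office: { building: "Taylor", room: "3128" } },
  clubs: ["chess", "robotics"],
};

console.log(getPath(student, "advisor.office.building")); // nested object field
console.log(getPath(student, "clubs.0"));                 // array position
console.log(getPath(student, "advisor.phone.ext"));       // missing path
```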

Many clauses can be chained. limit, sort, and explain are all possible.
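In the shell, chaining looks like db.student.find().sort({tot_cred: -1}).limit(2). The sketch below models that shape in plain JavaScript with a made-up Cursor class: each method records a setting and returns the cursor so calls can be strung together, and the sort is applied before the limit no matter which was chained first (which is how Mongo treats the two). Only a single sort key is handled, explain is not modeled, and the sample students are invented.

```javascript
// Plain-JavaScript model of a chainable cursor (NOT MongoDB's implementation).
class Cursor {
  constructor(docs) {
    this.docs = docs;
    this.sortSpec = null;
    this.max = Infinity;
  }
  sort(spec) { this.sortSpec = spec; return this; } // e.g. { tot_cred: -1 }
  limit(n) { this.max = n; return this; }
  toArray() {
    const out = [...this.docs];
    if (this.sortSpec) {
      const [[field, direction]] = Object.entries(this.sortSpec); // single key only
      out.sort((a, b) => direction * ((a[field] > b[field]) - (a[field] < b[field])));
    }
    return out.slice(0, this.max); // limit is applied after sorting
  }
}

// Hypothetical sample students.
const students = [
  { name: "Zhang", tot_cred: 102 },
  { name: "Shankar", tot_cred: 32 },
  { name: "Tanaka", tot_cred: 120 },
];

const topTwo = new Cursor(students).sort({ tot_cred: -1 }).limit(2).toArray();
const sameTwo = new Cursor(students).limit(2).sort({ tot_cred: -1 }).toArray();

console.log(topTwo.map((s) => s.name));
console.log(sameTwo.map((s) => s.name));
```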

Distribution of databases via Sharding, Masters, and Slaves.

Accessing Mongo via Programs

I encourage you to use mongoose as a lightweight interface to mongo from NodeJS.