MongoDB++


Today's DB Objectives:


Aggregation

For this lesson I want to work with the list of all zipcodes in the US. Do the following:

Mongo offers three different types of aggregation:

Open up mongo to the codes database: mongo codes (--port blah). For the simple commands try this: db.zips.find({state: "DE"}).count() and db.zips.distinct("state").

MiniTask: How many zipcodes have more than 100,000 people?

Aggregation pipeline. Mongo aggregation works in a set of stages. You can $match, $group, $sort, $limit, and even $project in any order you want.

Let's look at $group, which I think is important. We can specify a field that will group the documents using _id and a set of accumulators that will create new fields for those groups. For instance: db.zips.aggregate([ {$group : { _id: "$state", population: {$sum : "$pop" } }} ]) will create an array of documents, one per state, with _id and population fields. Notice that aggregate takes an array of "stages", we did a single grouping stage. If I wanted to also do some filtering I can use a $match stage.

db.zips.aggregate([ {$match: {pop: {$lt : 100}}}, {$group : { _id: "$state", population: {$sum : "$pop" } }} ]) this command first finds zipcodes with less than 100 people then adds those populations by state. So it is the extremely rural population of each state.

MiniTask: Use matching to find only the states that have more than 8 million citizens.

We can toss additional fields into the _id field and sort. Here is a large example for finding the largest and smallest city in every state:


db.zips.aggregate( [
{ $group:
{
_id: { state: "$state", city: "$city" },
pop: { $sum: "$pop" }
}
},
{ $sort: { pop: 1 } },
{ $group:
{
_id : "$_id.state",
biggestCity: { $last: "$_id.city" },
biggestPop: { $last: "$pop" },
smallestCity: { $first: "$_id.city" },
smallestPop: { $first: "$pop" }
}
},
{ $project:
{ _id: 0,
state: "$_id",
biggestCity: { name: "$biggestCity", pop: "$biggestPop" },
smallestCity: { name: "$smallestCity", pop: "$smallestPop" }
}
}
] )

MiniTask: can you find the state with the largest average city size? smallest average city size?

For more details on mongo aggregation checkout this pdf.

Indexing

So there is always a default index on _id but we can make secondary indices to help speed up lookups. In an internet sized application you cannot afford to do a full lookup. So let's do some performance checking on the zips file:

Voila you've made a mongo secondary index.

Mini Task: allow searching by city to go quickly with an index.

Using NodeJS and MongoDB

Master this and you'll be the coolest kid at the party! Let's connect to a server running mongod. If you used cloud9 before and didn't shut down mongod the correct way you might need to run the command rm /data/db/mongod.lock and then mongod --smallfiles --fork --syslog.

OK, so Mongo will work with many programming languages (check out the list of languages with drivers), but it runs javascript so it feels most natural to use it with NodeJS. Many of you used PHP for your HTTP listening so far, and I wanted to encourage you to try out Node before you left my course. For Node there are many interfaces, and honestly the one closest to your current skill level will be mongojs, but the one which is most supported and likely to pop up in your future is mongoose. From the server you are on run npm install mongoose. I created a sample Mongoose program that does the four basic CRUD operations. Let's play with that now:

The reason Mongoose has gained favor is that it allows some sort of Schema to be defined and utilized for a type of JOIN. It also has as close to the same flavor as the mongo shell as can be in a Node application.

I'm worried about the amount of time I have on these notes so I will walk you through the install steps and meaning in person during class.

Task A: Alter this to have a new schema and a new database. Let's say Robots in a database called "future". Give Robots a battery life, model, and a method for doing dishes.

Task B: Refactor this code to create new robots and save your robots.

Task C: Find the first robot and give it a laser.

A Reddit Clone

Next I want to show you a pretty functional MEAN stack built on this tech. The github code is here and it should run on your cloud9 environment with a quick npm insall. Again we will talk this out in class and make some alterations.

Sample reddit

Task 1: get this running.

Task 2: create some posts in the database directly then refresh your page.

Task 3: create some comments in the database directly.

Task 4: Talk out how you would keep someone from upvoting more than once.

The JOIN problem

The biggest complaint about noSQL is living without JOINs. That is because we want to think in terms of relationships. In particular many to 1, 1 to many, 1 to 1, and many to many. In Mongo we can solve this problem in several ways.

Your Reddit clone used mongoose to populate the comments for a post.

The other options are: embed comments in a post, have comments reference their posts, or have posts reference their comments. I could also make a strong case for doing both. Be aware that the default _id in mongo has a special type: ObjectId so if you are going to reference an _id by string use ObjectId(stringhere).

Let's try to practice building with all three.

In the Mongo shell or as a mongoose program lets make a blogging engine.

Task A1: Make a blog collection where each document contains an array of comments. Find one blog post and all of its comments, do the same for each of the other tasks.

Task A2: Make another blog collection and comment collection where each comment references a blog post.

Task A3: Do it the other way, each blog references an array of comments.

Sharding?

To deal with scaling we can split the database up into many pieces. Our classroom architecture doesn't quite support our need for 6-9 separate servers. But if you have signed up for the GitHub Education pack then you can get a good bit of free digital ocean usage. Digital ocean lets you spawn multiple virtual machines at will. In that case follow these pleasant tutorials on sharding and replication sets.

Connect from PHP

If you want to try mongo from PHP you'll need to install the driver. After that the syntax is pleasant: Here is the SQL to Mongo chart for PHP.