Dirty Users, Dirty Data, Dirty Work


Today's DB Objectives:


Lesson 0: An Angular front-end

I built a quick front-end for your Chinook customers. It doesn't have everything, but it gives you an idea of how your front-end team will lean on your DB work. Here is a zip of the HTML/JS code. And here is a stub of the code working in Plunker.


Lesson 1: Our ECE workflow

This is so important that I'm not going to put it here, but rather on the front page of the website.

By the way, if you are struggling with PHP, check out PHPFiddle; if you are struggling with Node, check out Node-Console.

I will walk you through a complete setup of a JSON-out Node and MySQL workflow:

  1. SSH into mlb1.acad as in the workflow page
  2. bash
  3. mysql -h orioles1 -u USERNAME -p USERNAME
  4. create table bananas (name varchar(20), ripeness int);
  5. insert into bananas values ("Great Banana", 7);
  6. Exit MySQL
  7. git clone https://github.com/cisc437/node-api-boilerplate.git
  8. cd node-api-boilerplate
  9. vi server.js
  10. Change USERNAME and PORT
  11. ZZ (save and quit)
  12. npm install
  13. npm install forever
  14. ./node_modules/forever/bin/forever start server.js
  15. Visit http://mlb1.acad.ece.udel.edu:PORT/ or curl localhost:PORT from CLI
  16. ./node_modules/forever/bin/forever stop server.js

Get this running now to make sure you have access to MySQL, have a bigger, more stable host than Cloud9, and can serve content.
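Once forever says the server is up, you can poke the endpoint from Python instead of curl if you prefer. This is just a sanity check I'm sketching, not part of the boilerplate; PORT is whatever you set in server.js, and I'm assuming the boilerplate really does hand back JSON, since that's the whole point of the JSON-out workflow.

    # Quick sanity check of step 15 from Python (Python 3's urllib.request) rather than curl.
    # PORT is a placeholder: use whatever you set in server.js.
    import json
    import urllib.request

    url = "http://mlb1.acad.ece.udel.edu:PORT/"   # or http://localhost:PORT/ on mlb1 itself
    with urllib.request.urlopen(url) as response:
        body = response.read().decode("utf-8")

    rows = json.loads(body)   # JSON-out, so this should parse into Python lists/dicts
    print(rows)               # expect to see your Great Banana row in here somewhere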

A useful tutorial with a trick for tossing a large load at your server and pooling MySQL connections.
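The pooling idea translates to any language. Here is roughly what it looks like in Python; the driver choice (mysql-connector-python) and the names below are mine, not something the tutorial prescribes.

    # Connection pooling sketched in Python: open a few connections once, then reuse them.
    # Assumes the mysql-connector-python package; host and credentials are placeholders.
    import mysql.connector.pooling

    pool = mysql.connector.pooling.MySQLConnectionPool(
        pool_name="banana_pool",
        pool_size=5,              # reuse a handful of connections instead of opening one per request
        host="orioles1",
        user="USERNAME",
        password="PASSWORD",
        database="USERNAME",
    )

    def ripest_bananas():
        conn = pool.get_connection()   # borrow a connection from the pool
        try:
            cur = conn.cursor()
            cur.execute("SELECT name, ripeness FROM bananas ORDER BY ripeness DESC")
            return cur.fetchall()
        finally:
            conn.close()               # hands the connection back to the pool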


Quick SQL quiz

Christopher was kind enough to collate some SQL statements from our book, using the university database we created. Look at each query and decide what you think it does:


A Sample Project Specification Session

One of the toughest parts of any project is delivering the right thing. Often you need to wireframe early to help end-users imagine themselves using the new product, since they rarely know what they'll need until they can picture it. The Kevin Bacon project is intentionally under-specified, much like a request you might get from a non-technical boss. Today I want to practice specification collection and the beginnings of database design with a sample project.

Social Nutritional Recipe Analyzer

Wouldn't it be cool to let users create recipes and spit out the number of calories and milligrams of sodium? Then they could share the recipes with each other!
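The math behind that is nothing fancy: a recipe's totals are just a weighted sum over its ingredients. A toy sketch of the idea (the per-100g numbers are invented for illustration, not real nutritional data):

    # A recipe's calories/sodium are a weighted sum over its ingredients.
    # The per-100g numbers below are invented for illustration only.
    per_100g = {                      # ingredient -> (calories, sodium in mg)
        "banana":        ( 89,   1),
        "peanut butter": (588, 476),
    }
    recipe = [("banana", 120), ("peanut butter", 30)]   # (ingredient, grams used)

    calories = sum(per_100g[name][0] * grams / 100.0 for name, grams in recipe)
    sodium   = sum(per_100g[name][1] * grams / 100.0 for name, grams in recipe)
    print(round(calories), "kcal and", round(sodium), "mg of sodium per batch")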

Let's start to build it

Gather with a partner and discuss what sort of data you'll need to make this product sing. Together, come up with questions to ask me, the stakeholder, and perhaps even generate user stories.

Imagine which tables will be needed to pull this off. If you add a feature, what tables will be added? If you change a feature, how badly might your tables break?

Note that recipes and ingredients have a many-to-many relationship, that is, each ingredient might be in many recipes and each recipe might have many ingredients. How do you want to solve this? Can you do it in such a way that there is little data redundancy in the table?
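One common answer, in case you want to check your own design against something concrete: keep ingredients in one table, recipes in another, and add a third junction table that stores only (recipe, ingredient, amount) rows, so each ingredient's nutrition facts live in exactly one place. The sketch below is mine, not a required design; the table and column names are placeholders, and it assumes the mysql-connector-python package.

    # One possible shape: two entity tables plus a junction table for the many-to-many link.
    # Names are my placeholders; adjust to whatever your group designs.
    import mysql.connector

    ddl = [
        """CREATE TABLE ingredients (
               ingredient_id  INT AUTO_INCREMENT PRIMARY KEY,
               name           VARCHAR(100),
               calories_100g  INT,
               sodium_mg_100g INT
           )""",
        """CREATE TABLE recipes (
               recipe_id INT AUTO_INCREMENT PRIMARY KEY,
               title     VARCHAR(200)
           )""",
        """CREATE TABLE recipe_ingredients (
               recipe_id     INT,
               ingredient_id INT,
               grams         INT,
               PRIMARY KEY (recipe_id, ingredient_id),
               FOREIGN KEY (recipe_id)     REFERENCES recipes(recipe_id),
               FOREIGN KEY (ingredient_id) REFERENCES ingredients(ingredient_id)
           )""",
    ]

    conn = mysql.connector.connect(host="orioles1", user="USERNAME",
                                   password="PASSWORD", database="USERNAME")
    cur = conn.cursor()
    for statement in ddl:
        cur.execute(statement)   # nutrition facts stored once per ingredient; only skinny junction rows repeat
    conn.commit()
    conn.close()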

Feel free to study some websites with nutritional data to get a feel for what is feasible.


Web-Scraping Dirty Data

When we build data-driven apps we often need to gather data from third-party sources. Today we'll do some data-pirating. Some sites explicitly forbid this, so beware.

For our project I want to scrape CalorieCount.com. Let's begin by looking at the page layouts for a couple of ingredients and a couple of recipes. Inspect elements, source, network requests, etc. Can we get lucky and find a public-facing API?

Now I want to show you my Pythonic way of scraping sites. I like to do it in Python because the syntax is cleanest and I can putz around in the terminal while figuring out my structure before having to write the real program.

I prefer to use IPython for fast poking around (Anaconda has IPython and most of the great packages pre-installed). Also, I like to get the concepts working before going into a harsh environment (our SSH terminals). So fire up a Python/IPython terminal somewhere, maybe on an ECE machine.

I'll code along with you on this, but here are the commands and modules we'll want to use:
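The rough shape of it, as I'd type it into IPython. This is my sketch: I'm assuming the requests and beautifulsoup4 packages, both of which Anaconda ships, and the URL is only a placeholder for the real bananas page you find on the site.

    # Fetch the bananas calories page and parse it into "soup" we can search.
    # Assumes the requests and beautifulsoup4 packages; the URL is a placeholder.
    import requests
    from bs4 import BeautifulSoup

    url = "http://www.caloriecount.com/calories-bananas"   # paste the real page URL here
    response = requests.get(url)
    html = response.text                        # the entire blob of HTML as one string

    soup = BeautifulSoup(html, "html.parser")   # now we can hunt by tag, class, or id
    print(soup.title)                           # quick check that we really got the page
    print(len(soup.find_all("table")))          # poke around: how many tables are on this page?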

OK, at this point we have the entire blob of HTML from the bananas calories page. Also, we have made some soup, whatever that means.

Task 1: Use your R&I (Resourcefulness and Initiative) to extract the number of calories in a serving of bananas.

Task 2: Find the pattern for recipes/ingredients and scrape links to related banana-friendly recipes.

Task 3: Take your process for calories and turn it into a function, then apply it to an ingredient in a different recipe.

MySQL and the Language of your choice

Earlier we showed you Node/MySQL. For Python/MySQL, see http://stackoverflow.com/a/5687856/1347629.
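If you want the flavor of it before reading the answer, something along these lines works. I'm using the mysql-connector-python driver here as an example; the linked answer may use a different driver, but the pattern is the same.

    # Python talking to MySQL: connect, run a parameterized statement, read rows back.
    # Driver here is mysql-connector-python; host and credentials are placeholders.
    import mysql.connector

    conn = mysql.connector.connect(host="orioles1", user="USERNAME",
                                   password="PASSWORD", database="USERNAME")
    cur = conn.cursor()

    # Let the driver handle quoting; never build SQL by pasting strings together.
    cur.execute("INSERT INTO bananas (name, ripeness) VALUES (%s, %s)", ("Plantain", 3))
    conn.commit()

    cur.execute("SELECT name, ripeness FROM bananas")
    for name, ripeness in cur:
        print(name, ripeness)

    conn.close()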

How about C++ to MySQL?

How about PHP to MySQL?

Start a web scraping job to populate your app tables

To put it all together for today's class, let's take our rough data designs from earlier and marry them to the rough web scraping we're doing. Can we now begin to fill out a recipe/ingredient/nutrition database one HTTP request at a time?
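Here's a sketch of that last mile, assuming the ingredients table from my earlier schema sketch and pretending your Task 3 function already produced some (name, calories) pairs; the values below are stand-ins, not real scraped data.

    # Last mile: push scraped (name, calories-per-100g) pairs into the ingredients table.
    # The pairs below stand in for the output of your Task 3 scraping function.
    import mysql.connector

    scraped = [("banana", 89), ("rolled oats", 379)]   # pretend scraper output

    conn = mysql.connector.connect(host="orioles1", user="USERNAME",
                                   password="PASSWORD", database="USERNAME")
    cur = conn.cursor()
    cur.executemany("INSERT INTO ingredients (name, calories_100g) VALUES (%s, %s)", scraped)
    conn.commit()
    conn.close()
    print("loaded", len(scraped), "ingredients")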