Taking Steps

School is for making relations


Today's DB Objectives:


Lesson 1: Relations for Databases

So far we have a notion of databases as a bunch of tables which you can see as collections, with their rows seen as objects, and their columns seen as attributes. That is object-oriented thinking, and it works just fine. But in math we have a better model of they actually work.

The relational model. You might remember, from some distant math class, the notion of a binary relation. It was a pretty generic way of relating two things, and we do it daily: \( (a = b, y = f(x), n \leq m) \) are all examples of binary relations. A binary relation is a set of ordered pairs, like \(\{(0,0), (1,1), (2,0), (0,2), (-1, 1), (1,-1), \cdots \}\) is the set of pairs which define equality modulo 3.

Formally a relation is any subset of \(S_1 \times \cdots S_k\) for any sets \(S_i\). So vectors are relations, tuples are relations, and all of our database tables are relations. Our rows are instances of the relations defined by the table. In databases our sets are limited to "atomic" sets, i.e. things which are pretty simple as data. For instance: values in SQLite databases are limited to Integers, Reals, Text, Blobs (a blob of value stored as given), and the Null value.

Let's see this in action:


CREATE TABLE relation (v1 integer, v2 integer, v3 integer);
INSERT INTO relation VALUES (0, 1, 2)
INSERT INTO relation VALUES (3, 4, 5)
INSERT INTO relation VALUES (6, 7, 8)
INSERT INTO relation VALUES (9, 10, 11)

Mathematically that is saying, let \(\mathcal{R} := \{(0,1,2), (3,4,5), (6,7,8), (9,10,11) \}\). Now you can see most SQL operations as (multi-)set operations and vice-versa. As a quick for instance: \(\mathcal{R} \times \mathcal{R}\) is the cartesian product of the set \(\mathcal{R}\) and itself. It has 16 elements, one for each pair of triples in the original set. We can get that relation in SQL by issuing the command: SELECT * FROM relation r1, relation r2;.

There are a set of common relational operations/functions that transform relations into different relations:

These each of a natural SQL counterpart. \(sigma\) is the WHERE clause, \(Pi\) is the SELECT clause, \( \times, \bowtie\) are the FROM clause (just comma separate or NATURAL JOIN), while \( \cup , \cap , -\) are the select combiners UNION, INTERSECT, and EXCEPT.

This also forms a new mental model of how results are selected by SQL statements. Every SQL statement creates a new relation, which acts like a table. So when you get your results back it is always as an array of rows, so to speak. Another observation is that SELECT results are created (logically, not implementation-wise) in the following order:

  1. Use the FROM statement to create a master relation
  2. Use the WHERE statement to toss out elements from that relation
  3. Use the SELECT statement to decide how to display those elements

To practice this new viewpoint: fire up a DB (maybe chinook, maybe scrabble, maybe your posts). Now imagine each table as a set of tuples. Combine the tuples of two small sets using \(\times\) (select * from table1, table2;) and using \(\bowtie\) (select * from table1 natural join table2;). How are the results different? If you have an error you might need to give the tables aliases table1 t1. Now try UNION or INTERSECT using two different WHERE clauses and try to predict your outcome.

Lesson 2: SELECT 2.0

Now that we can see tables and select queries as relations, let's really explore the cool tricks we can pull off and then start a scavenger hunt.

The set-up, our book uses a university database which you can grab from db-book.com. Grab the DDL file which defines the tables, and then grab the file for creating small relations which populates those tables. You can create a db and run the commands in that db with the following command-line pattern:

sqlite3 the.db < code.sql 
This "pipes" in the file as a set of commands that gets executed in the new db.

So create a db, or download my version and access it with an sqlite terminal (if you have one locally then great, otherwise head up to cloud9). For each task write an SQL statement that solves the question.

First task: find the names of all students that have at least 1 computer science class.

Second task: find name and id of all students that have never taken a class earlier than Fall 2009.

Third task: find the department which has the lowest maximum salary of all departments.

To help you get your gears turning here are some sample queries and what they accomplish:

All instructors that are paid less than the lowest paid computer science professor.


select name from instructor where salary < (
   select min(salary) from instructor where dept_name = 'Comp. Sci.');

All courses taught in both Fall 2009 and in Spring 2010:


select course_id from section as S where semester='Fall' and 
    year=2009 and exists (select * 
       from section as T where 
           semester = 'Spring' and year=2010 and S.course_id = T.course_id);

Which department pays out the most in total salary, and how much?


select dept_name, max(tot_salary) from (
    select dept_name, sum(salary) as tot_salary 
        from instructor group by dept_name) dept_total;

Lessons 3-5: MySQL 101

Step 1: Install MySQL and PHPMyAdmin

I want you to create an instance of MySQL on a cloud9 environment. Create a table, insert a row. I'm up past my bedtime so I'm going to give you docs and a promise of help in class:


        mysql-ctl install
        mysql-ctl cli
        phpmyadmin-ctl install
        

Cloud9 docs.

More simple c9 docs.

Using PHPMyAdmin

Ask me to show you my antiques DB.

Step 2: Create the university database in MySQL

OK, so getting the same University DB to work in MySQL took a little bit of massaging:

In general MySQL is a process which is served on port 3306, you connect to it as a user (with a password). The process will be managing many databases. The general connection command is like this:


        mysql -h localhost -u myname -p mydb
        

On cloud9 I had a bizarre socket problem which was fixed by connecting using a fully specified localhost and port:


        mysql -h 127.0.0.1 -P 3306 -u root -p
        

(Full connection documentation here)

Step: Create a Database
Login and try these:

Show databases; to see the current databases.

Create database dbnamehere; to create a new database.

use dbnamehere; to switch to that database directly.

Create a User

CREATE USER 'testuser'@'localhost' IDENTIFIED BY 'test';

GRANT ALL PRIVILEGES ON *.* TO 'testuser'@'localhost';

FLUSH PRIVILEGES;

Create University Tables

I had to remove the foreign key constraints from the above DDL.sql, here is today's MySQL friendly DDL

Run it like this:

mysql -h 127.0.0.1 -P 3306 -u testuser -p university < DDL.sql 

Populate University Tables

Populate it like this:

mysql -h 127.0.0.1 -P 3306 -u testuser -p university < smallRelationsInsertFile.sql 

NEXT TASK: now create a query which calculates each student's GPA.

Comparing MySQL to SQLite

What makes MySQL different than SQLite?

A great starter guide to MySQL, note the differences

Next task: Please sign up for an ECE/CIS account.

Connect to your mysql database using PHP and NodeJS? I don't have time to give sample code, let's try it together if we have time in class.

Last Topic: Intro to the next project

I want you to build a six degrees (of Kevin Bacon) engine. Use IMDB's data, an ECE workspace, MySQL, and some ingenuity. I require that you find connections between two actors/actresses respecting some restraints: the app should allow specifying acceptable decades of movies, a list of possible directors, a list of possible screen writers, and/or popularity of the stars.

Collaboration and project expectations

I've created a Piazza site, if you did not get an email to join it then let me know and I'll get you in there. You can ask questions through that and other students can answer too.

Here are some guidelines for making a GitHub repo out of any folder, which might help with sharing code or saving cloud9 code.

Here are my project submission guidelines, follow them for all projects including the chinook data project.