Michal Rjaško

Contact:

rjasko (at) dcs.fmph.uniba.sk

Databases practicum, winter 2023/2024

Michal Rjaško, rjasko at dcs.fmph.uniba.sk
Ján Mazák, M255, mazak at dcs.fmph.uniba.sk

Homeworks

Three homework assignments will be published here. The first one will be published after the 4th lesson.

Homework 1

assignment
test database
Please send your solution by email to rjasko (at) dcs.fmph.uniba.sk no later than 10th November 2024, 23:59:59.

Homework 2

assignment
Send your solution by 22nd December 2024.

Homework 3

assignment
Send your solution by 10th January 2025.

evaluation of exercises

The last lesson, on 16th December 2024, is cancelled.

We wish you happy holidays

Lesson 11

NoSQL

presentation
MongoDB:
- Online environment: https://onecompiler.com/mongodb/
- Documentation: https://www.mongodb.com/docs/manual/crud/
Tasks:
- Create a database with the following collections:
  - students - we record: name, surname, date of birth, address, what kind of disability the student has, names and addresses of legal guardians, relationship of legal guardian to student (father, mother, other)
  - classes - we record: name, classroom number, grade
- List students who are at least 10 years old
- List students who have some kind of disability and have only one legal guardian
- Try the update operation: set the end of compulsory school attendance for students aged 16.
- For classes with grades 1-5, set the field "Primary school 1st grade" and, in the same update operation, also store the time of the last modification of the record
- Delete students without legal guardians
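Listing students who are at least 10 years old is easiest if the query compares the stored date of birth with a cutoff date instead of computing ages. The following Java sketch (class and method names are hypothetical) shows the cutoff logic with java.time; in MongoDB the cutoff would then be used in a filter such as { dateOfBirth: { $lte: ... } }, assuming the date of birth is stored as a date in a field named dateOfBirth.

```java
import java.time.LocalDate;

class AgeCutoff {
    // Latest date of birth for which a person is at least `years` old on `today`.
    static LocalDate latestBirthDate(int years, LocalDate today) {
        return today.minusYears(years);
    }

    // True if someone born on `birth` is at least `years` old on `today`.
    static boolean isAtLeast(LocalDate birth, int years, LocalDate today) {
        return !birth.isAfter(latestBirthDate(years, today));
    }

    public static void main(String[] args) {
        LocalDate cutoff = latestBirthDate(10, LocalDate.now());
        // The query then only compares dates, e.g. { dateOfBirth: { $lte: <cutoff as a date> } }
        System.out.println("At least 10 years old: born on or before " + cutoff);
    }
}
```

Comparing dates rather than subtracting birth years avoids off-by-one errors for students whose birthday has not yet come this year.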

Java (continued) - Transactions and Isolation

See the documentation:

BEGIN, COMMIT, ABORT / ROLLBACK
JDBC Transactions: conn.setAutoCommit(false); conn.commit(); conn.rollback();
Transaction isolation
Transaction isolation in JDBC

In your Java application, create two connections to the database (i.e. two objects of type Connection: Connection c1; Connection c2).
Create a table population_changes(country text, year int, population_in int, population_out int): population_in is how many people were added to the given country (births + immigration) and population_out is how many people it lost (deaths + emigration).
Let's try out how concurrent transactions behave. Create two Java functions:
1. covidPandemic(Connection c, String country): the function prints the name and current population of the given country to the console, reduces the population by 0.1%, writes the change to the table population_changes and then prints the new population to the console (using a SELECT query). Do not call setAutoCommit(false) or commit() in the function; we will call them elsewhere in the code.
2. migrationCrisis(Connection c, String srcCountry, String dstCountry): the function prints the name and current population of both countries to the console, decreases the population of srcCountry by 1% and increases the population of dstCountry by the same number of people. It writes the changes to the table population_changes and then prints the new populations of both countries to the console (using a SELECT query). Do not call setAutoCommit(false) or commit() in the function; we will call them elsewhere in the code.
Then, in the main part of the program, call these functions so that they run in two overlapping transactions.
- e.g. call c1.setAutoCommit(false); c2.setAutoCommit(false); covidPandemic(c1, "Slovakia"); migrationCrisis(c2, "Ukraine", "Slovakia"); c1.commit(); c2.commit(); and try it for several countries in a row
- observe how the transactions behave, in particular when changes made by one transaction become visible to the other
- Also try different transaction isolation levels, e.g. BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE (in Java: con.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);)
Try using several threads to execute transactions in parallel - some threads will call the function covidPandemic(c1, country), others migrationCrisis(c2, srcCountry, dstCountry)
- Alternatively, you can also try incorporating "SELECT pg_sleep(random())" into the transactions to make the transactions last longer.
- Watch whether individual transactions "wait" for each other when they access the same rows (see also https://www.citusdata.com/blog/2018/02/15/when-postgresql-blocks/)
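A minimal sketch of the experiment, assuming the populations live in world_countries(name, population, ...) from Lesson 10 and that PostgreSQL is reachable through the SSH tunnel from Lesson 9 (jdbc:postgresql://127.0.0.1:15432/test); both are assumptions, adapt them to your setup. Only covidPandemic is shown; migrationCrisis follows the same pattern. The function never calls setAutoCommit or commit, so the caller decides where each transaction ends.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.time.Year;

class TransactionDemo {
    // Human-readable names of the JDBC isolation constants, useful when logging experiments.
    static String isolationName(int level) {
        switch (level) {
            case Connection.TRANSACTION_READ_UNCOMMITTED: return "READ UNCOMMITTED";
            case Connection.TRANSACTION_READ_COMMITTED:   return "READ COMMITTED";
            case Connection.TRANSACTION_REPEATABLE_READ:  return "REPEATABLE READ";
            case Connection.TRANSACTION_SERIALIZABLE:     return "SERIALIZABLE";
            default:                                      return "NONE";
        }
    }

    // Assumed table: world_countries(name, population, ...) from Lesson 10.
    static long population(Connection c, String country) throws SQLException {
        try (PreparedStatement ps = c.prepareStatement(
                "SELECT population FROM world_countries WHERE name = ?")) {
            ps.setString(1, country);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) throw new SQLException("Unknown country: " + country);
                return rs.getLong(1);
            }
        }
    }

    // Runs inside whatever transaction the caller has opened: no setAutoCommit/commit here.
    static void covidPandemic(Connection c, String country) throws SQLException {
        long before = population(c, country);
        System.out.println(country + ": " + before);
        int died = (int) Math.round(before * 0.001);   // 0.1 % of the population
        try (PreparedStatement upd = c.prepareStatement(
                "UPDATE world_countries SET population = population - ? WHERE name = ?")) {
            upd.setInt(1, died);
            upd.setString(2, country);
            upd.executeUpdate();
        }
        try (PreparedStatement log = c.prepareStatement(
                "INSERT INTO population_changes(country, year, population_in, population_out) VALUES (?, ?, 0, ?)")) {
            log.setString(1, country);
            log.setInt(2, Year.now().getValue());
            log.setInt(3, died);
            log.executeUpdate();
        }
        System.out.println(country + " after: " + population(c, country));
    }

    public static void main(String[] args) throws SQLException {
        String url = "jdbc:postgresql://127.0.0.1:15432/test";   // assumption: SSH tunnel from Lesson 9
        try (Connection c1 = DriverManager.getConnection(url, args[0], args[1]);
             Connection c2 = DriverManager.getConnection(url, args[0], args[1])) {
            c1.setAutoCommit(false);
            c2.setAutoCommit(false);
            c2.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
            System.out.println("c2 runs at " + isolationName(c2.getTransactionIsolation()));
            covidPandemic(c1, "Slovakia");
            // c1 has not committed, so c2 cannot see its change yet (PostgreSQL has no dirty reads).
            System.out.println("c2 sees: " + population(c2, "Slovakia"));
            c1.commit();
            // Under SERIALIZABLE c2 keeps its snapshot; rerun with READ COMMITTED and compare.
            System.out.println("c2 sees after c1.commit(): " + population(c2, "Slovakia"));
            c2.commit();
        }
    }
}
```

Swapping the isolation level of c2 and the order of the commit() calls is enough to observe the different visibility rules; for the multi-threaded variant, each thread should work with its own Connection.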

Lesson 10

Java continued + Inserting large amounts of data

See the PostgreSQL COPY command: https://www.postgresql.org/docs/current/sql-copy.html
Download the CSV file containing all the cities data: https://simplemaps.com/data/world-cities
Create a Java application that fills the table world_cities(city,city_ascii,lat,lng,country,iso2,iso3,admin_name,capital,population,id) in PostgreSQL with the data from the file above.

First try inserting the data using the COPY command

For COPY with a file path (i.e. COPY world_cities FROM 'world_cities.csv') to work, the file has to be on the database server and readable by the PostgreSQL server process, and you need database superuser rights (or membership in the pg_read_server_files role). In our setting this won't work.
However, we can use CopyManager from the PostgreSQL JDBC driver, which sends the data from the client (COPY ... FROM STDIN): https://stackoverflow.com/questions/46988855/correct-way-to-use-copy-postgres-jdbc

Next, try inserting the data via bulk INSERT (several rows per statement, say 100, or find out how many rows you can fit into one statement). Several ways of inserting data are described at https://www.postgresqltutorial.com/postgresql-jdbc/insert/; try e.g. PreparedStatement.addBatch().
Create indexes in the table that allow you to search for a city by its name, country or latitude and longitude (e.g. via the BETWEEN operator).
Compare the speed of filling the table via COPY and via INSERT. Also try filling the table without indexes (creating them only after all rows are inserted) and compare the speed with the case where the indexes are created before the data is inserted.
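For the INSERT variant you have to parse the CSV yourself. The sketch below is one possible approach, not the only one: a small parser for lines whose fields may be wrapped in double quotes (as in the simplemaps file), a builder for a multi-row INSERT with ? placeholders, and a loop that sends the rows in chunks. It assumes the connection URL contains the PostgreSQL JDBC option stringtype=unspecified, so that values bound with setString are converted by the server to the column types; without it you would parse the numbers yourself and use setDouble/setLong. Class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BulkInsert {
    // Splits one CSV line; fields may be quoted, and "" inside quotes means a literal quote.
    static List<String> parseCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            if (inQuotes) {
                if (ch != '"') {
                    cur.append(ch);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    cur.append('"');   // escaped quote
                    i++;
                } else {
                    inQuotes = false;  // closing quote
                }
            } else if (ch == '"') {
                inQuotes = true;
            } else if (ch == ',') {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(ch);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    // "INSERT INTO t (a, b) VALUES (?, ?), (?, ?)" for the given number of rows.
    static String multiRowInsert(String table, List<String> columns, int rows) {
        String row = "(" + String.join(", ", Collections.nCopies(columns.size(), "?")) + ")";
        return "INSERT INTO " + table + " (" + String.join(", ", columns) + ") VALUES "
                + String.join(", ", Collections.nCopies(rows, row));
    }

    // Sends the rows in chunks of chunkSize, one multi-row INSERT per chunk.
    static void insertAll(Connection c, String table, List<String> columns,
                          List<List<String>> rows, int chunkSize) throws SQLException {
        for (int from = 0; from < rows.size(); from += chunkSize) {
            List<List<String>> chunk = rows.subList(from, Math.min(from + chunkSize, rows.size()));
            try (PreparedStatement ps = c.prepareStatement(multiRowInsert(table, columns, chunk.size()))) {
                int p = 1;
                for (List<String> r : chunk) {
                    for (String v : r) {
                        ps.setString(p++, v.isEmpty() ? null : v);   // empty CSV field -> NULL
                    }
                }
                ps.executeUpdate();
            }
        }
    }
}
```

Remember to skip the header line of the file. Wrapping insertAll (and, separately, the CopyManager call) in System.nanoTime() measurements gives the speed comparison; running everything inside one transaction is a further variant worth timing.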

Create function / trigger

See the documentation:

Create a function that creates the table world_countries(name, iso2, iso3, population, lat, lng) from the table world_cities, where name is the country name, population is the sum of the populations of all cities in the country and lat, lng is the midpoint of all city positions.
Create a trigger that, when the population of a city changes, also adjusts the population of the corresponding country (and likewise when a city is inserted or deleted).
From your Java application, try changing the population of some random cities (or read the city and its new population from the console; beware of SQL injection) and check (or program a check in Java) whether the trigger computed the country population correctly.
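One way to program the check in Java is to read all cities (e.g. SELECT country, population FROM world_cities WHERE population IS NOT NULL) and all stored country populations, and compare them with sums computed on the client. The sketch below shows only the comparison part; the City class and the method names are hypothetical, and reading the rows via JDBC is left out. Skipping cities with a NULL population mirrors how SUM ignores NULL values.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class PopulationCheck {
    // Hypothetical holder for one row of SELECT country, population FROM world_cities.
    static final class City {
        final String country;
        final long population;

        City(String country, long population) {
            this.country = country;
            this.population = population;
        }
    }

    // Expected population of each country: the sum over its cities (sorted by country name).
    static Map<String, Long> expectedCountryPopulation(List<City> cities) {
        return cities.stream().collect(Collectors.groupingBy(
                (City c) -> c.country, TreeMap::new, Collectors.summingLong((City c) -> c.population)));
    }

    // Countries whose stored population (e.g. from world_countries) differs from the expected sum.
    static List<String> mismatches(Map<String, Long> expected, Map<String, Long> stored) {
        return expected.entrySet().stream()
                .filter(e -> !e.getValue().equals(stored.get(e.getKey())))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

An empty result of mismatches after a series of random updates is a good sign that the trigger handles UPDATE, INSERT and DELETE consistently.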

Lesson 9

SQLite

SQLite is a database system that does not require a separate server --- it runs within the process of your application. If you want to use a relational database in your application and do not need a centralized repository (that would collect data from multiple applications), SQLite is usually a suitable solution. It is also widely used for testing, prototyping, and early development, even in cases where a different database system will be used in the production version of the software. It is also suitable for devices with limited computing or memory capacity (commonly used, for example, in Android applications).
presentation

Connecting to a database from the Java programming language environment

presentation
Create a new project. Download the SQLite JDBC driver and add it to your project.
See the tutorial on using SQLite in Java.
For this exercise, we will use the grading database that we created in the previous exercises.
Assignments:
1. Write a program that connects to the database and prints the names of the students to the console.
2. Fill the tables with random data. Create a set of several first names and several last names (e.g. 20-30 first names and 20-30 last names). Don't spend too much time coming up with names (use "AAA", "BBB",...)
  - Fill the tables so that you have about 100 teachers, 600 students, 20 subjects, and each student has about 200 grades (about 10-15 from each subject, i.e. about 120,000 grades in total).
  - Measure how long it takes to fill the grades table with data (print to the console how many milliseconds the operation took).
3. When populating tables, try different optimization methods and compare their speeds (see http://www.postgresql.org/docs/9.1/static/populate.html):
  1. Insert each row with a separate INSERT, without using a prepared statement
  2. Insert each row with a separate INSERT, using a prepared statement
  3. Combine several rows into one INSERT. You will probably not be able to insert all rows with a single INSERT because of limits on the maximum query size (several MB), so it is advisable to combine 100 to 1000 rows into one INSERT query; try how many rows the system allows
  4. Perform all INSERT queries in one transaction
  5. Drop all table indexes before inserting and create them after inserting
  6. Drop all table CONSTRAINTs before inserting and create them after inserting
4. Write a program that reads a student's first name, last name and class from the console and prints their grades (subject name: grades from the subject separated by commas). If several students match the given name and class, print the first one. Use a prepared statement. Make sure the name matching is case-insensitive.
5. Modify the program from the previous task so that it finds the student even if the user types only part of the first or last name. If several students match the search criteria, let the user choose (e.g. by listing them and asking the user to enter the ID of the student they are looking for).
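For tasks 4 and 5, here is a sketch of the two queries, assuming a hypothetical table student(id, name, surname, class_name); adapt the names to your schema. User input only ever goes into bind parameters, and for the partial search the LIKE wildcards % and _ typed by the user are escaped so that they match literally. The form lower(...) LIKE lower(?) with an explicit ESCAPE clause works in both SQLite and PostgreSQL (note that SQLite's built-in lower() only converts ASCII letters, so names with diacritics may need extra care).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class StudentSearch {
    // Task 4: exact, case-insensitive match; ORDER BY id LIMIT 1 picks "the first" student.
    static final String EXACT_SQL =
            "SELECT id, name, surname FROM student"
            + " WHERE lower(name) = lower(?) AND lower(surname) = lower(?) AND class_name = ?"
            + " ORDER BY id LIMIT 1";

    // Task 5: substring match on first and last name.
    static final String PARTIAL_SQL =
            "SELECT id, name, surname FROM student"
            + " WHERE lower(name) LIKE lower(?) ESCAPE '\\' AND lower(surname) LIKE lower(?) ESCAPE '\\'"
            + " AND class_name = ? ORDER BY id";

    // Escapes LIKE wildcards in user input and wraps it in % so that it matches a substring.
    static String containsPattern(String input) {
        String escaped = input.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_");
        return "%" + escaped + "%";
    }

    // Prints all candidates for task 5 so the user can pick one by id.
    static void printCandidates(Connection c, String name, String surname, String cls) throws SQLException {
        try (PreparedStatement ps = c.prepareStatement(PARTIAL_SQL)) {
            ps.setString(1, containsPattern(name));
            ps.setString(2, containsPattern(surname));
            ps.setString(3, cls);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getInt("id") + ": " + rs.getString("name") + " " + rs.getString("surname"));
                }
            }
        }
    }
}
```

Escaping the backslash first matters: otherwise the backslashes added for % and _ would themselves be escaped again.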
Try connecting to a PostgreSQL database (via TCP/IP) instead of SQLite. If you don't have it running locally, you can use the cvika.dcs.fmph.uniba.sk server, but access needs to be set up (we'll show you how to use an SSH tunnel, i.e. port forwarding):
1. Connect via SSH to cvika.dcs.fmph.uniba.sk and run psql.
2. Set your password using
```
ALTER USER {your_login_name} WITH PASSWORD '{your_new_password}';
```
  You can test the new password with the command
```
psql -h 127.0.0.1 test {your_login_name}
```
3. In order to connect to Postgres running on cvika.dcs.fmph.uniba.sk from a remote computer, we will create an SSH tunnel:
  - LINUX: ssh -L15432:127.0.0.1:5432 username@cvika.dcs.fmph.uniba.sk
  - WINDOWS: in Putty you need to set "local port forwarding" from port 15432 to 127.0.0.1:5432
4. Download the PostgreSQL JDBC driver and add it to the project.
5. Connect to the database according to the tutorial.
Test the query execution speed and compare it with the results for SQLite.
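A small sketch that runs the same query against either database and measures the elapsed time, so the SQLite and PostgreSQL results can be compared. The file name grades.db, the table name grade and the database name test are assumptions; the PostgreSQL URL points to the local end of the SSH tunnel (port 15432) set up above. Both JDBC drivers must be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class ConnectDemo {
    // SQLite: a local file; PostgreSQL: the local end of the SSH tunnel (port 15432).
    static String url(boolean postgres, String database) {
        return postgres ? "jdbc:postgresql://127.0.0.1:15432/" + database
                        : "jdbc:sqlite:" + database;
    }

    // Runs the query, reads every row and returns the elapsed time in milliseconds.
    static long timeQuery(Connection c, String sql) throws SQLException {
        long start = System.nanoTime();
        try (Statement st = c.createStatement(); ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                // consume the whole result so that the transfer is included in the timing
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Usage: java ConnectDemo                 -> SQLite file grades.db
    //        java ConnectDemo login password  -> PostgreSQL database test via the tunnel
    public static void main(String[] args) throws SQLException {
        boolean postgres = args.length >= 2;
        String url = url(postgres, postgres ? "test" : "grades.db");
        try (Connection c = postgres ? DriverManager.getConnection(url, args[0], args[1])
                                     : DriverManager.getConnection(url)) {
            System.out.println(url + ": " + timeQuery(c, "SELECT * FROM grade") + " ms");
        }
    }
}
```

The first run is usually slower (caches are cold), so it is worth repeating each measurement a few times before comparing.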

Lesson 8

Constraints

slides
We continue working with the tables created in the previous exercise.
Tasks:
1. Choose a primary key for each table.
2. Use UNIQUE to ensure that a subject can be assigned to a class only once (to avoid duplicate entries).
3. Disallow NULL in columns where a value must be recorded (e.g. a student does not need to have a gender recorded, but must have a name).
4. Use CHECK to limit the possible values for gender and date of birth (choose some meaningful range). Test that your constraint works for both INSERT and UPDATE.
5. Add foreign keys to the grades table: the values in the columns must refer to an existing student, teacher and subject. Verify the behaviour with an INSERT where the subject reference is non-existent or NULL.
6. Add meaningful ON DELETE actions to all foreign keys: when a student is deleted, their grade records must be deleted too; a teacher or subject cannot be deleted if any grade records refer to it. Verify that your settings work when you try to delete all teachers, all subjects, or individual students.
7. Add meaningful ON UPDATE actions to all foreign keys and make them checked immediately, with the option to defer the check within a transaction (DEFERRABLE INITIALLY IMMEDIATE).
8. Write a query that moves all biology grades for 1.A students from one teacher to another. Verify that the query works correctly.

Views

See VIEW documentation.
We continue working with the tables created in the previous exercise.
Create a VIEW that displays the grade point average for each student and each subject assigned to their class. Records should be sorted by student name, then by subject name, and finally by grade point average (all in ascending order, i.e. ASC).
Verify that the created view is preserved even after logging out and logging in again.
Verify that the created view reflects changes in the underlying data.

Indexes and query planner (EXPLAIN / ANALYZE)

Although queries are formulated in a declarative language (we cannot specify how the result is computed), we can ask the relational database which evaluation procedure it chose. This is useful especially when a query runs significantly slower than we would expect. But remember once and for all: first and foremost, we always strive for queries that humans can quickly read and understand; only then are we interested in execution speed (and even then usually only for critical queries, e.g. those executed very often).
"Premature optimization is the root of all evil." (D. Knuth)
(But not to oversimplify, see also one or two other views.)
PostgreSQL, like some other systems, lets you inspect the query plan and the speed of its execution using EXPLAIN and EXPLAIN ANALYZE.
The query planner puts the main emphasis on disk operations. Its decisions are based on the amount of data in the individual tables and on how the data is stored on disk. The database keeps this information in auxiliary tables that do not reflect the current state exactly, only an approximation of it (updating them every time would make INSERT, UPDATE and DELETE disproportionately slower). We can obtain these values as follows:
```
SELECT relpages, reltuples FROM pg_class WHERE relname = 'ab';
```
Run the following in psql on the server cvika.dcs.fmph.uniba.sk:
```
EXPLAIN SELECT name, deptno, COUNT(empno) OVER (PARTITION BY deptno) FROM emp;
EXPLAIN SELECT emp.name, dept.name, COUNT(empno) OVER (PARTITION BY emp.deptno) FROM emp JOIN dept ON emp.deptno = dept.deptno;
```
and try to understand which operations the database is going to perform. These operations are described by physical operators, which are more "detailed" than the operators of relational algebra (e.g. a join can be computed naively, "each with each", or sped up by sorting or hashing; a query plan in relational algebra does not distinguish between these). Query plans in relational algebra can only be compared once their operators have been mapped to physical operators.

Download the file explain.sql to the server cvika.dcs.fmph.uniba.sk (e.g. using wget) and run psql -f explain.sql. Then run psql, execute the following commands one after another and analyze the plans created by the planner. Try to understand why each plan was chosen.


    explain analyze select * from ab;
    explain analyze select * from ab where b < 4 order by b;
    explain analyze select * from ab where b = 4 order by b;
    create index i1 on ab (b);    -- we hope that adding an index will shorten running time
    explain analyze select * from ab where b < 4 order by b;
    explain analyze select * from ab where b = 4 order by b;
    create index i1h on ab using hash(b);   -- the default index type is btree, we want to try hash index too
    explain analyze select * from ab where b < 4 order by b;
    explain analyze select * from ab where b = 4 order by b;

Execute the last command ten times in a row and watch how the actual execution time changes.


    drop index i1;
    drop index i1h;
    explain select * from ab, bc;    -- materialize stores the result in memory so that we can look at it more than once
    explain select * from ab, bc where ab.b = bc.b;
    
    insert into bc select x.id, x.id + 1 from generate_series(1, 1000000) as x(id);
    explain select * from ab, bc where ab.b = bc.b;
    
    explain select b, count(distinct c) from bc where not exists (select 1 from ab, cd where ab.b = bc.b and cd.c < bc.c) group by b having count(distinct c) < 3;

Be careful with EXPLAIN ANALYZE: unlike plain EXPLAIN, it actually executes the query, which can take a very long time.

Materials for database indexes:

Run psql -f explain.sql. Let's compare the plans for the queries A, B, C, D below in several different situations (they differ in the existence of indexes and the amount of data in the tables); we recommend saving the plans for individual repeated queries in separate files for easy comparison.


    (A) explain select * from ab, bc where ab.b = bc.b;
    (B) explain select * from ab, bc where ab.b = bc.b order by ab.b;
    (C) explain select * from ab, bc where ab.b < bc.b;
    (D) explain select * from ab, bc where ab.b < bc.b order by ab.b;

Now we will gradually create indexes and populate tables.


    create index i1 on ab (b);
    /* run all of A, B, C, D */
    
    create index i2 on bc (b);
    /* run all of A, B, C, D */
    
    insert into bc select x.id, x.id + 1 from generate_series(1, 1000000) as x(id);
    insert into bc select x.id, x.id + 1 from generate_series(1, 1000000) as x(id);
    insert into bc select x.id, x.id + 1 from generate_series(1, 1000000) as x(id);
    /* run all of A, B, C, D */
    
    drop index i2;
    create index i2composite on bc (b, c);
    /* run all of A, B, C, D */
    
    create index i1h on ab using hash(b);
    create index i2h on bc using hash(b);
    /* run all of A, B, C, D */

    drop index i1;
    drop index i2composite;
    /* run all of A, B, C, D */

Notice how the cost estimated by the planner grows if the index contains unnecessary columns.

Run psql -f explain.sql and analyze the plans for the following commands.


    explain select * from ab, bc, cd where ab.b = bc.b and bc.c = cd.c;
    explain select * from ab, bc, cd where ab.b = bc.b and bc.c < cd.c;
    explain select * from ab, bc, cd where ab.b < bc.b and bc.c < cd.c;

    create index i3 on cd(c);
    explain select * from ab, bc, cd where ab.b = bc.b and bc.c = cd.c;
    explain select * from ab, bc, cd where ab.b = bc.b and bc.c < cd.c;
    explain select * from ab, bc, cd where ab.b < bc.b and bc.c < cd.c;

    create index i4 on bc (b, c);
    explain select * from ab, bc, cd where ab.b = bc.b and bc.c = cd.c;
    explain select * from ab, bc, cd where ab.b = bc.b and bc.c < cd.c;
    explain select * from ab, bc, cd where ab.b < bc.b and bc.c < cd.c;
    
    create index i3h on cd using hash(c);
    create index i4h on bc using hash(c);

    explain select * from ab, bc, cd where ab.b = bc.b and bc.c = cd.c;
    explain select * from ab, bc, cd where ab.b = bc.b and bc.c < cd.c;
    explain select * from ab, bc, cd where ab.b < bc.b and bc.c < cd.c;

    insert into ab select x.id, x.id + 1 from generate_series(1, 100000) as x(id);
    insert into bc select x.id, x.id + 1 from generate_series(1, 100000) as x(id);
    insert into cd select x.id, x.id + 1 from generate_series(1, 100000) as x(id);

    explain select * from ab, bc, cd where ab.b = bc.b and bc.c = cd.c;
    explain select * from ab, bc, cd where ab.b = bc.b and bc.c = cd.c order by cd.c;

    explain select cd.c, count(*) from ab, bc, cd where ab.b = bc.b and bc.c = cd.c group by cd.c;

Choose 2 plans that you consider the most interesting and describe what you find interesting about them and why; also add one non-trivial example of your own showing how indexes changed the plan and shortened the computation (be sure to include the query and, where relevant, the database schema/contents).

Lesson 7

DDL, DML

slides
Postgres documentation for DDL / DML:
- CREATE TABLE
- DROP TABLE
- ALTER TABLE
- INSERT
- UPDATE
- DELETE
- CREATE DATABASE
- DROP DATABASE
We want to create a database for recording grades, students and teachers at a secondary school. We need to store the following:
- Student --- name, surname, gender, class, date of birth
- Teacher --- name, surname, gender
- Subject --- subject name, abbreviation
- Grade --- the grade itself (text), student, which teacher assigned it, what subject it is from, time of assignment, what it was from (e.g. homework), grade weight (for averaging)
- Not all classes have all subjects, so we need to record which class has which subject.
Design the structure of the tables of the above database --- create a marks.sql file that will contain the SQL definitions of the tables (CREATE TABLE). Add a DROP TABLE IF EXISTS statement to the beginning of the file so that you can run the marks.sql file multiple times. (The integrity of the database, i.e. things like foreign keys, will be dealt with in the next exercise.)
Add the data to the marks.sql file --- add a few rows to each table using INSERT (again --- we want to have the commands written in a file so that we can execute them repeatedly; it can be the marks.sql file).
Try to use diacritics as well --- pay attention to the encoding of the file marks.sql.
We have decided to make entering and viewing grades available online. Use ALTER TABLE to add columns to the student and teacher tables to record login names (for simplicity we will not use passwords; for security reasons, passwords must never be stored in the database in plain form). Create an index for searching by login name.
- create the index so that the search is case-insensitive
One of the teachers decided to leave the school and we want to delete him from the database. However, the marks must remain, i.e. his grades will be transferred to another teacher.
- Write a query that moves grades from one teacher to another (we know the IDs of both teachers).
- Write a query that deletes the teacher from the database (based on his ID).
(Historical data logging is one of the most vexing practical problems we face in database design. See, albeit briefly, one of the possible solutions.)
Other tasks:
1. Double the weight of all marks whose value starts with "5" --- write a query.
2. List the student's name, the subject, the student's number of grades in that subject, and a comma-separated list of those grades (try using the function array_agg, or array_to_string).
3. List the students and, for each of them, the names of subjects in which the student has no grade but should have one (the subject is in the list of subjects of the student's class). Add the average number of marks in that subject for the student's class.
4. For each subject, calculate the total number and the average number of marks per student. Sort the result by total marks and display only the first 10 rows.
5. For each teacher, calculate the average of the natural-number marks they entered. Note that the AVG function needs a number as input, so you need to use CAST(... AS INTEGER). However, if the mark is not a number, the cast throws an error. Filter out non-numeric marks, e.g. using a regular expression (e.g. the construction WHERE mark ~ '^[0-9]+$'; with * instead of +, an empty mark would also match).
6. We often need to record very specific data about students that is recorded only for a small number of them. Create a "moredata" JSON field in the student table in which such data can be recorded. Record a visual impairment for some students and then write a query that displays the students with visual impairments.
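The filtering idea from task 5 can be tried out in Java before writing the SQL. The sketch below (class and method names are hypothetical) keeps only marks consisting of one or more digits, the same condition as mark ~ '^[0-9]+$', and averages them; Matcher.matches() requires the whole string to match, which plays the role of the ^ and $ anchors.

```java
import java.util.List;
import java.util.OptionalDouble;
import java.util.regex.Pattern;

class MarkAverage {
    // Full match of one or more digits, like WHERE mark ~ '^[0-9]+$'.
    // With * instead of +, the empty string would also match and Integer.parseInt("") would fail.
    static final Pattern NATURAL = Pattern.compile("[0-9]+");

    // Average of the marks that are natural numbers; empty if there are none.
    static OptionalDouble averageOfNumericMarks(List<String> marks) {
        return marks.stream()
                .filter(m -> NATURAL.matcher(m).matches())
                .mapToInt(Integer::parseInt)
                .average();
    }
}
```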

Lesson 6

Window functions

slides
Create a local database (SQLite/PostgreSQL) or connect to cvika.dcs.fmph.uniba.sk (see previous exercises for instructions).
We work with the employee database (emp.sql). Write (e.g. in a new file window_emp.sql) the following SQL queries using window functions:
1. For each employee, find their rank according to their salary (salary column). Employees with the same salary should have the same rank.
2. For each employee, write the difference between their salary and the average salary in their department.
3. For each employee, list their rank according to salary among the employees from the same city, the difference from the average salary in that city and the number of employees in this city.
4. The company needs to save 8000 USD per month on salaries. Display the longest possible list of the lowest-paid employees whose total salaries are less than 8000 (i.e., start with the lowest-paid employee and end with the employee at which the running total of salaries is still below 8000 and closest to it).
5. Similar to the previous exercise, but we want to end just above 8000 (i.e. the list ends as soon as the sum of salaries exceeds 8000).
6. Find the median salary for each department. (How-to; before the introduction of PERCENTILE_CONT in PostgreSQL 9.4 in 2014, computing the median was highly non-trivial.)
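To make the difference between tasks 4 and 5 concrete, here is a Java sketch of the two stopping rules over salaries sorted in ascending order (class and method names are hypothetical). In SQL the running total would typically come from SUM(salary) OVER (ORDER BY salary ROWS UNBOUNDED PRECEDING); with the default RANGE frame, employees with equal salaries would share the same running total.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SalaryCutoff {
    // Task 4: take the lowest salaries while the running total (including the current one) stays below limit.
    static List<Integer> below(int[] salaries, int limit) {
        int[] sorted = salaries.clone();
        Arrays.sort(sorted);
        List<Integer> chosen = new ArrayList<>();
        int running = 0;
        for (int s : sorted) {
            running += s;
            if (running >= limit) break;
            chosen.add(s);
        }
        return chosen;
    }

    // Task 5: keep adding while the total of the previous employees has not exceeded limit,
    // so the last chosen employee is the one whose salary pushes the total over it.
    static List<Integer> justAbove(int[] salaries, int limit) {
        int[] sorted = salaries.clone();
        Arrays.sort(sorted);
        List<Integer> chosen = new ArrayList<>();
        int running = 0;
        for (int s : sorted) {
            if (running > limit) break;
            running += s;
            chosen.add(s);
        }
        return chosen;
    }
}
```

In SQL terms, task 4 keeps rows whose running total is below 8000, while task 5 keeps rows whose running total minus their own salary is at most 8000.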

Access rights

documentation: GRANT, REVOKE
Log in to cvika.dcs.fmph.uniba.sk (see previous exercises for instructions) and connect to the "test" database (psql test).

Useful commands in psql:

 \du
 \z tablename
 SELECT grantee, privilege_type FROM information_schema.role_table_grants WHERE table_name='test';

Create a new table with the command

CREATE TABLE test_name (i INTEGER, t TEXT);

and add a few lines to it:

 INSERT INTO test_name VALUES (1, 'a');
 INSERT INTO test_name VALUES (2, 'b');

Grant the SELECT privilege on this table to the role "test" and verify with \z that it is really granted. Run psql -U test test, try to view the contents of the test_name table and try to insert a new row into it.
Modify the SELECT privilege of the test role so that it can view only the contents of column t in the table test_name.
Allow a classmate to insert rows into the test_name table with the right to grant this permission to others. Try it: ask them to grant this permission to the test role. Then use REVOKE GRANT OPTION FOR to take away only their right to grant the permission, so that they can still insert rows themselves. Ask them to check that it works.
Try cascading permission revocation (see CASCADE at http://www.postgresql.org/docs/9.1/static/sql-revoke.html).
Revoke all privileges you have granted to classmates and to the test role (REVOKE ALL PRIVILEGES ON ... FROM ...).
Take a quick look at the authentication options when logging into a database to get an idea of the current technologies. (No, we will not require this on any exam.)

The lesson on 28th October 2024 is cancelled

The next lesson will be on 4th November 2024

Lesson 5

Recursion in SQL

slides
examples of valid and invalid use
Create your local database (SQLite/PostgreSQL) or connect to cvika.dcs.fmph.uniba.sk (see previous lessons).
Write a recursive query that calculates the value of n! (factorial) for n = 10.
Write a recursive query that calculates the value of the 30th Fibonacci number.
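A recursive query is evaluated iteratively: the anchor part produces the first rows, and the recursive part is applied repeatedly to the rows produced in the previous step until it returns no new rows. The Java sketch below mimics that loop for the two tasks; the SQL in the comments is one possible formulation, not the only one, and the class and method names are hypothetical.

```java
class RecursiveCte {
    // WITH RECURSIVE f(n, fact) AS (
    //     SELECT 1, 1
    //     UNION ALL
    //     SELECT n + 1, fact * (n + 1) FROM f WHERE n < 10
    // ) SELECT fact FROM f WHERE n = 10;
    static long factorial(int target) {
        long n = 1, fact = 1;             // anchor row
        while (n < target) {              // recursive step runs while WHERE n < target holds
            n = n + 1;
            fact = fact * n;
        }
        return fact;
    }

    // WITH RECURSIVE fib(n, a, b) AS (
    //     SELECT 1, 1, 1
    //     UNION ALL
    //     SELECT n + 1, b, a + b FROM fib WHERE n < 30
    // ) SELECT a FROM fib WHERE n = 30;
    static long fibonacci(int target) {
        long n = 1, a = 1, b = 1;         // anchor row: a = F(1), b = F(2)
        while (n < target) {
            long next = a + b;            // each step shifts the pair (F(n), F(n+1)) forward
            a = b;
            b = next;
            n++;
        }
        return a;
    }
}
```

Each loop iteration corresponds to one evaluation of the recursive part; the WHERE condition is what eventually makes it produce no new row and stop.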
Consider the employee database that we used in the previous exercises. For each manager, find a list of his subordinates and indicate for each subordinate whether he is direct or indirect.
Create a database of routes between cities (roads.sql) and write a query:
1. from which cities we can get to Rome
2. from which cities we can get to Rome traveling at most 1000 km
3. A truck driver travels at most 720 km in one day. Create a table of triples [city, number_of_days, number_of_places], where number_of_places is the number of places that can be reached from the city city within number_of_days days.
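Tasks 1 and 2 are about reachability over sequences of roads. In SQL you would build the paths with WITH RECURSIVE; cycles need care (e.g. UNION instead of UNION ALL in task 1, and the 1000 km bound in task 2, assuming positive road lengths). To check your query's results, the Java sketch below computes the shortest distance from every city to Rome with Dijkstra's algorithm on reversed roads; a city qualifies for task 2 if that distance is at most 1000 km. It assumes hypothetical directed roads (from, to, km); if roads.sql stores a two-way road only once, add both directions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.TreeSet;

class ReachRome {
    // Hypothetical directed road: from -> to, length in km.
    static final class Road {
        final String from, to;
        final int km;

        Road(String from, String to, int km) {
            this.from = from;
            this.to = to;
            this.km = km;
        }
    }

    // Shortest distance from every city to `target` (Dijkstra on reversed roads).
    static Map<String, Integer> distancesTo(String target, List<Road> roads) {
        Map<String, List<Road>> incoming = new HashMap<>();
        for (Road r : roads) incoming.computeIfAbsent(r.to, k -> new ArrayList<>()).add(r);
        Map<String, Integer> dist = new HashMap<>();
        PriorityQueue<Map.Entry<String, Integer>> pq = new PriorityQueue<>(Map.Entry.comparingByValue());
        pq.add(Map.entry(target, 0));
        while (!pq.isEmpty()) {
            Map.Entry<String, Integer> e = pq.poll();
            if (dist.containsKey(e.getKey())) continue;   // already settled with a shorter distance
            dist.put(e.getKey(), e.getValue());
            for (Road r : incoming.getOrDefault(e.getKey(), List.of())) {
                if (!dist.containsKey(r.from)) pq.add(Map.entry(r.from, e.getValue() + r.km));
            }
        }
        return dist;
    }

    // Cities (other than target) from which target can be reached within maxKm.
    static Set<String> within(String target, List<Road> roads, int maxKm) {
        Set<String> out = new TreeSet<>();
        distancesTo(target, roads).forEach((city, d) -> {
            if (!city.equals(target) && d <= maxKm) out.add(city);
        });
        return out;
    }
}
```

Calling within("Rome", roads, Integer.MAX_VALUE) answers task 1, and a limit of 1000 answers task 2.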

Aggregation in SQL

Create your local database (SQLite/PostgreSQL) or connect to cvika.dcs.fmph.uniba.sk (see previous lessons).
Download the files:
- world.sql --- SQL commands to create a database of world cities
- queries_world.sql --- list of queries to solve
Import the world database:
```
psql -f world.sql
```
Solve all the exercises.

Lesson 4

Aggregation in SQL

slides
Create your local database (SQLite/PostgreSQL) or connect to cvika.dcs.fmph.uniba.sk (see previous lesson).
Download the file:
- queries_agg.sql --- list of queries to solve; you will edit this file
Solve all the exercises.

Aggregation in Datalog (voluntary)

Download the following files:
- emp.pl --- emp database
- subtotal.pl --- file containing an implementation of subtotal in SWI-Prolog
- query.pl --- auxiliary file containing the definition of q(_) used for output formatting
- queries_agg.pl --- list of queries to solve
Solve all the exercises.

Lesson 3

SQL

slides
Download the following files into one directory:
- emp.sql --- SQL commands to create the EMP database
- queries.sql --- list of queries to write, edit this file

SQLite

First, we will work with SQLite, which stores the whole database in a single file and needs no configuration.
1. Create a new database using
```
sqlite3 --init emp.sql emp.db
```
2. Verify that the database is OK:
```
sqlitebrowser emp.db &
```
  or by
```
sqlite3 emp.db
```
```
SELECT * FROM emp;
```
3. Execute queries in the file queries.sql:
```
sqlite3 emp.db < queries.sql
```
  (you can also run queries in the Execute SQL tab of sqlitebrowser.)
4. Work on the queries in queries.sql
After completing several queries try to use PostgreSQL on cvika.dcs.fmph.uniba.sk (see the slides).

Datalog - more voluntary exercises

Download the following files into one directory:
- queries_library.pl --- list of queries you need to create; you will edit this file
- query.pl --- auxiliary file containing the definition of the q(_) command for formatting query results
Follow the instructions in the file queries_library.pl. Start by creating the database library.pl (use emp.pl from the previous exercise as a model). Add rows to the database to allow you to test the correctness of your solutions for individual queries.
You can also have multiple versions of the database --- the database in use is specified in the line
:- consult('library.pl').
We recommend that you concentrate on solving tasks 1, 3, 6, 7, 8 already during the exercise and tasks 10, 11, 15, 16 during the exercise or at home.

Lesson 2

Datalog

presentation
Download the following files into the same directory:
- emp.pl --- EMP employee database
- queries.pl --- A file containing a list of exercises. Edit this file.
- query.pl --- utility file containing the definition of the q(_) command for formatting query results
According to the instructions from the previous exercise (or "Working with datalog" in the presentation), solve the tasks in queries.pl.

Lesson 1

Prolog

For the first attempts, we will use the online environment for SWI-Prolog.
presentation (if you want, check also this light intro to prolog)
Exercises:
1. Define some facts about family relationships using predicates male/1, female/1, parent/2 (the number indicates arity).
2. Write down the rules for predicates expressing father, sister, grandmother, cousin
3. Create the predicate ancestor/2 for the relation "to be an ancestor".
4. Create the predicate related/2 for the relation "being a blood relative".

Datalog on server cvika

Log in to cvika.dcs.fmph.uniba.sk using ssh (linux, windows (PuTTY)).
Using wget, download the following files:
- query.pl --- utility file containing the definition of the q(_) command for formatting query results
- emp.pl --- EMP employee database
- queries.pl --- A file containing a list of exercises. Edit this file.
We will use SWI-prolog to interpret datalog queries: run
swipl -s queries.pl
In another window, edit the file queries.pl; we recommend the editors vim (for console environment) or kwrite (graphical mode if you work on a local computer).
We recommend having the database (the emp.pl file with the list of facts) open in the next window to check the query results.
After each modification of the queries.pl file (don't forget to save the changes), run the
make.
command in the running SWI-Prolog (including the dot at the end; commands are typed after the ?- prompt). Check the compiler output for errors.
To evaluate a query (in this example for the predicate job), type:
q(job(J)).
Variable names do not matter (you can also use _ instead of J), but the arity (number of arguments) must match. The q() predicate is used for "nice" formatting of the output and for eliminating apparent duplicates.
Practical tips for SWI-prolog:
- Identifiers starting with an uppercase letter are treated as variables by the system; constants start with lowercase letters. If you mix them up, you will get strange, wrong results.
- The is operator is used to evaluate arithmetic expressions, i.e. X is 2+3, not X = 2+3 (in the latter case the symbol = will be interpreted as the unification of terms and no arithmetic operation will occur).
- The following operators are used to compare numbers <, =<, >, >=.

Evaluation

Each of the 3 homework assignments is worth at most 30 points.

Additional points are awarded for solving the exercises from lessons: 0 or 1 point for each of the 12 lessons. To pass the course, you need at least 9 of the points awarded for lesson exercises.

Solutions must be sent by e-mail to rjasko (at) dcs.fmph.uniba.sk within 3 days after each lesson. The solutions will then be briefly evaluated: if a solution is sufficient, 1 point is awarded; if it is insufficient (but contains at least something meaningful), the student will be asked to complete it within another 3 days, after which it receives a final evaluation.

Not all exercises need to be solved; you need to have solved (at least reasonably correctly) about 40% of the exercises from each topic. If a set of exercises is larger, the exercises are usually ordered at least approximately by difficulty; if you want to solve only the minimum, choose some of the more demanding ones from the second half. The recommended approach is to go in order and skip exercises you clearly already know how to solve (e.g. they are similar to ones you have already solved and you can see what the difference is and how to deal with it).

A --- 92 and more points

B --- 84 - 91 points

C --- 76 - 83 points

D --- 68 - 75 points

E --- 60 - 67 points

Fx --- less than 60 points