Data Management on Modern Computer Architectures

An advanced, in-depth course on data-oriented design and shared-memory parallelism in which you will learn how to optimise software to run closer to “bare metal speed.” Through a semester-long group project, you will learn how to leverage knowledge of computer architecture to tackle the memory wall, optimise data structures for locality of reference, exploit “single-core parallelism” (i.e., superscalar cores and SIMD), expose data-level parallelism for multi-core machines, and turn graphics processing units (GPUs) into general-purpose compute devices using CUDA. This knowledge will help you write software that delivers more performance per watt (saving mobile battery life or reducing data centre emissions) and that scales into the future.

(Outdated) materials for this course are available online. Offered as a graduate-level course (CSC 586C) and as an undergraduate-level topics course (CSC 485C).

[This course has now been deprecated.]

Algorithms and Data Models

A data model is “a notation for describing data or information” that typically consists of a description of the structure of data, operations that are permitted on that data, and constraints that exist on the data (Garcia-Molina et al., 2009). In this advanced course, we will develop theoretical foundations for NoSQL and NewSQL data models so that we can better understand modern systems of data management. Critically, we will also develop and implement small practical applications to apply that theory within a Data Science context.

(Very outdated) materials for this course are available online. Offered as a graduate-level course (CSC 501) as part of the Master’s of Applied Data Science (MADS) program.

Database Systems

It is cliché by now to claim that “data is the new oil”: it powers the Information Age and is arguably the most valuable asset of many high-growth, high-tech start-ups. But the value of data can only be realised if it is managed well. This course gives an overview of relational database management systems (RDBMSs), a foundational tool in data management, including how to model the world with relational schemata, extract and analyse data with declarative query languages, and design data-powered applications.

Assignment materials for this course are available online.

Directed Studies

I regularly run directed studies courses with students in which we focus on a more individualised topic than is available in the standard curriculum. These are run at both the graduate and undergraduate levels and often provide an opportunity to engage in undergraduate research. Examples of courses that I have co-designed with students include:

  • Lock-free, GPU, & Vector Programming (Fall 2023)
  • Data-level Parallelism (Summer 2023)
  • Applied Natural Language Processing (Summer 2023)
  • Conversational AI (Spring 2023)
  • Lock-free Programming (Fall 2022)
  • Machine Learning for Databases (Summer 2022)
  • Data Lakes (Fall 2021)
  • Spatial Data Processing (Spring 2021)
  • Research Methods in Large-Scale Social Network Analysis (Fall 2020)
  • Adaptive Optics (Summer 2020)
  • Data Privacy (Spring 2020)
  • Advanced Topics in Blockchain Research (Spring 2020)