CC5062 - Data Engineering (2022/23)

Module specification: Module approved to run in 2022/23
Module title: Data Engineering
Module level: Intermediate (05)
Credit rating for module: 15
School: School of Computing and Digital Media
Total study hours: 150
- 36 hours: Assessment preparation / delivery
- 69 hours: Guided independent study
- 45 hours: Scheduled learning & teaching activities
Assessment components
Type | Weighting | Qualifying mark | Description
Coursework | 100% | | Group coursework - apply data engineering techniques to a real-world business problem (4000 words development report + ar
Running in 2022/23

(Please note that module timeslots are subject to change)
Period | Campus | Day | Time | Module Leader
Spring semester | North | Tuesday | Morning |

Module summary

This module provides an understanding of data engineering concepts, techniques and tools. It covers the basics of data modelling, storage, retrieval and processing for data analysis. The module aims to provide a set of building blocks from which a complete architecture for modelling, storing and processing data can be constructed, and to enable students to apply data engineering techniques in practical, real-world settings.

The aims of this module are to:
• provide students with an understanding of data engineering concepts and techniques
• enable students to appreciate various modern data engineering tools
• enable students to acquire fundamental knowledge and skills of data modelling, storage, retrieval, and processing for data analysis
• develop students' practical skills in applying tools and techniques to solve real-world problems

Syllabus

• Concepts and fundamentals of data engineering

• Data engineering key skills and tools: 
- Linux
- SQL vs NoSQL; graph databases
- Data warehouse vs data lake
- Star schema
- Python DataFrames
- Stream processing with Kafka
- Apache Spark
- Big data and the Hadoop platform
- Cloud computing
- DAMA (Data Management Association) data management framework
• Data engineering workflow with ETL (extract, transform, load) processing
- Collecting raw data
- Transforming data for data analysis needs
- Loading data and scheduling tasks
• Work through case studies
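As a rough illustration of how the ETL workflow and star schema topics above fit together, the sketch below collects raw data, transforms it, and loads it into a minimal star schema. It is a teaching sketch under stated assumptions, not part of the module materials: it uses only the Python standard library (csv and sqlite3) in place of the pandas, Spark or Kafka tools named in the syllabus, and the sales data, table names (dim_product, fact_sales) and cleaning rules are all hypothetical. Task scheduling is not shown; in practice it would typically be handled by an orchestrator such as Apache Airflow.

```python
import csv
import io
import sqlite3

# Hypothetical raw sales extract; in practice this might come from a file, API or stream.
RAW_CSV = """order_id,order_date,product,category,quantity,unit_price
1001,2023-01-05,Laptop,Electronics,2,799.99
1002,2023-01-05,Desk,Furniture,1,149.50
1003,2023-01-06,laptop ,Electronics,1,799.99
1004,2023-01-06,Chair,Furniture,,89.00
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dicts (one per row)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalise product names, cast types, drop rows missing a quantity."""
    clean = []
    for row in rows:
        if not row["quantity"]:
            continue  # skip incomplete records
        qty = int(row["quantity"])
        clean.append({
            "order_id": int(row["order_id"]),
            "order_date": row["order_date"],
            "product": row["product"].strip().title(),
            "category": row["category"].strip(),
            "quantity": qty,
            "revenue": round(qty * float(row["unit_price"]), 2),
        })
    return clean


def load(conn, rows):
    """Load: write rows into a minimal star schema (one dimension, one fact table)."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS dim_product (
            product_key INTEGER PRIMARY KEY AUTOINCREMENT,
            product TEXT UNIQUE,
            category TEXT
        );
        CREATE TABLE IF NOT EXISTS fact_sales (
            order_id INTEGER PRIMARY KEY,
            order_date TEXT,
            product_key INTEGER REFERENCES dim_product(product_key),
            quantity INTEGER,
            revenue REAL
        );
    """)
    for r in rows:
        # Insert each dimension member once, then look up its surrogate key.
        conn.execute(
            "INSERT OR IGNORE INTO dim_product (product, category) VALUES (?, ?)",
            (r["product"], r["category"]),
        )
        (key,) = conn.execute(
            "SELECT product_key FROM dim_product WHERE product = ?", (r["product"],)
        ).fetchone()
        conn.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
            (r["order_id"], r["order_date"], key, r["quantity"], r["revenue"]),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))

# A typical analytical query: join the fact table to its dimension and aggregate.
for category, total in conn.execute("""
    SELECT d.category, ROUND(SUM(f.revenue), 2)
    FROM fact_sales f JOIN dim_product d ON f.product_key = d.product_key
    GROUP BY d.category ORDER BY d.category
"""):
    print(category, total)
```

Keeping descriptive attributes (category) in the dimension table and numeric measures (quantity, revenue) in the fact table is the core idea of dimensional modelling; analytical queries then follow the same join-and-aggregate pattern regardless of how many facts are loaded.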

Learning Outcomes LO1 - LO5

Balance of independent study and scheduled teaching activity

Topics will be introduced through formal lectures, supported by tutorial and workshop sessions and by blended learning, as follows:
- Lecture (1 hour / week):
Introduces the major topics identified in the syllabus, together with practical exercises, directed reading and further study
- Workshop (2 hours / week):
Data engineering technical skills will be further developed through lab-based workshops. Practical exercises are set to help students develop their skills with the relevant software packages.
- Blended learning:
Using the University’s VLE and online tools to deliver content, assessment and feedback, to encourage active learning, and to enhance student engagement and the learning experience.

Students will be expected and encouraged to produce reflective commentaries on the learning activities and tasks that they carry out to complete their work.

Learning outcomes

On successful completion of this module the student should be able to:
[LO1] demonstrate a clear understanding of the key concepts and frameworks of data engineering
[LO2] gain practical knowledge of various modern data engineering tools and techniques
[LO3] gain knowledge and skills of data modelling, storage, retrieval, and processing for data analysis
[LO4] develop an awareness of the latest developments in data engineering
[LO5] apply data engineering techniques and tools to solve real-world problems as part of a team

Assessment strategy

The module will be assessed by a practical piece of coursework (100%).

The coursework is designed to assess the knowledge and practical skills covered in the module. It gives students the opportunity to research current issues and practical techniques in data engineering and their effective application. It also enables students to work as members of a development team, applying their knowledge to a practical business problem and demonstrating their problem-solving, critical thinking and evaluation skills. Learning outcomes assessed: LO1, LO2, LO3, LO4, LO5.

Bibliography

Reading list available at: https://rl.talis.com/3/londonmet/lists/26C76F68-8052-DD45-3680-17BD2FC893D8.html?lang=en-US


1. Ralph Kimball and Margy Ross (2013). The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling. 3rd edition. John Wiley & Sons. (Core)
2. Bas P. Harenslak and Julian Rutger de Ruiter (2020). Data Pipelines with Apache Airflow. Manning Publications. (Core)
3. Bill Bejeck (2018). Kafka Streams in Action. Manning Publications. (Core)
4. Wes McKinney (2017). Python for Data Analysis. O'Reilly Media.
5. Sandy Ryza, Uri Laserson, Sean Owen and Josh Wills (2017). Advanced Analytics with Spark. O'Reilly Media.


Online resources
1. The Data Engineering Cookbook - https://aimlcommunity.com/wp-content/uploads/2019/09/Data-Engineering.pdf
2. Data Engineering - KDnuggets - https://www.kdnuggets.com/tag/data-engineering
3. A Beginner’s Guide to Data Engineering - https://medium.com/@rchang/a-beginners-guide-to-data-engineering-part-i-4227c5c457d7
4. Become a Data Engineer with this Complete List of Resources - https://www.analyticsvidhya.com/blog/2018/11/data-engineer-comprehensive-list-resources-get-started/