CS7079 - Data Warehousing and Big Data (2024/25)
Module specification | Module approved to run in 2024/25
Module title | Data Warehousing and Big Data
Module level | Masters (07)
Credit rating for module | 20
School | School of Computing and Digital Media
Total study hours | 200
Assessment components
Running in 2024/25 (Please note that module timeslots are subject to change)
Module summary
The module aims to strengthen your skills in data technologies ranging from databases and data warehousing to Big Data. First, it will provide you with a good understanding of database concepts and database management systems in the context of modern enterprise-level database development. Once you have gained good skills in database development, you will study and gain an in-depth understanding of data warehousing, including its concepts and analytical foundations as well as data warehouse development. Through intensive hands-on sessions, you will become familiar with related technological trends and developments in the field. The module will use a portfolio of SQL Server tools, such as SQL Server Management Studio (SSMS) and Azure Data Studio, to provide hands-on experience in implementing a reporting solution through a combination of assignments and lab exercises.
The module also introduces the foundations of Big Data management based on the Apache Hadoop platform and provides you with a broad introduction to Big Data technologies. This will involve hands-on sessions designed for data analysts, business intelligence specialists, developers, administrators, or anyone who wishes to learn how to process and manage massive and complex data and to infer knowledge from it. Topics include Hadoop, HDFS and MapReduce, with tools such as Hive, Pig and Zeppelin used for hands-on experience.
Prior learning requirements
N/A
Syllabus
• Introduction to database models and system architecture [LO1, LO5]
• Multidimensional data modelling (Data warehouse, Dimensional model concepts, Dimensional modelling process, Dimension Normalization) [LO4]
• Data processing using DBMS; data definition and manipulation using SQL [LO1]
• Introduction to data warehouse concepts and analytical foundations [LO1]
• Data warehouse development; system architecture and data transformation; investigation of techniques for distributing and mining data [LO1]
• An introduction to the Big Data technology stack, emerging trends and use cases where Big Data outperforms a traditional data warehouse [LO2, LO5]
• An overview of the functional components of the Big Data technology stack, including open-source tools such as Hadoop, HDFS, MapReduce, YARN, Spark, Storm and Hive for massively parallel on-disk data processing [LO3]
• An overview of batch and real-time data ingestion patterns using Apache Flume and Kafka; data transformation techniques and generation of summary statistics using Apache Spark [LO2, LO3]
• Data analytics on the Hadoop platform using Apache Spark for data analysis on HDFS [LO3]
Balance of independent study and scheduled teaching activity
The material will be taught through a mixture of lectures, workshops, laboratories and tutorial sessions, using the following strategy. The first hour of each lecture will introduce the concepts and principles of the module’s topics. The second hour will be run as a workshop to explain approaches further through real-life examples. Each lecture will be followed by either a laboratory or a seminar. Seminar time may be used to facilitate group meetings to cultivate research-oriented skills or to introduce you to the state of the art not covered in the syllabus. For the self-study exercises and assessment, you are expected to spend time on unsupervised work in the computer laboratories, searching primary sources of information in the library, and in private study. You are also expected to dedicate hours to coursework, case study implementation and group assessment. The teaching and learning methods will encourage open and self-directed learning, deepen your understanding and stress analytical skills.
Blended learning: the university’s VLE and online tools are used to provide and deliver content, assessment and feedback, to encourage active learning and to enhance your engagement and learning experience.
Learning outcomes
On successful completion of this module, you should be able to:
LO1 Demonstrate competence in the process of developing, configuring, utilising, and managing data warehouse applications in a variety of contexts using DBMS tools.
LO2 Demonstrate a comprehensive understanding of the principles of organising, validating, transforming and analysing large volumes of data on specialised platforms (Big Data) from various data sources – files, databases, server logs, etc.
LO3 Demonstrate a comprehensive understanding of the advantages and limitations of Big Data technologies, including predictive analytics, and build the confidence to interpret data as insights to drive organisational success.
LO4 Demonstrate competence in SQL.
LO5 Understand, appraise, and participate in the legal, social, ethical and professional framework for developing data-intensive systems while working in an agile team environment.
Bibliography
https://rl.talis.com/3/londonmet/lists/EEAB6EE8-5F11-574C-5F41-C1256066947F.html?embed=1&lang=en-GB&login=1
Core Textbook:
1. Kimball, R., Ross, M. (2013), The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition, Wiley; ISBN-10: 1118530802
2. White, T. (2015), Hadoop: The Definitive Guide. Sebastopol: O’Reilly & Associates; ISBN-10: 1491901632
Other Texts:
3. Inmon, W.H. (2005), Building the Data Warehouse, 4th Edition, John Wiley; ISBN: 978-0-764-59944-6
4. Rainardi, V. (2007), Building a Data Warehouse: With Examples in SQL Server, Apress; ISBN-13: 978-1-59059-931-0
5. Grover, M., Malaska, T., Seidman, J., Shapira, G. (2015), Hadoop Application Architectures: Designing Real-World Big Data Applications, O’Reilly; ISBN: 1491900083
Journals:
1. International Journal of Data Warehousing and Mining; ISSN: 1548-3924
2. International Journal of Data Warehousing and Mining, DOI:10.4018/IJDWM.2018010102
Websites:
1. Kimball Techniques - Kimball Group
2. https://www.sas.com/en_gb/insights/big-data/what-is-big-data.html
Electronic Databases:
https://www.sas.com/en_us/insights/data-management/data-warehouse.html
Social Media Sources:
Data Warehouse for Beginners | What is Data Warehouse (analyticsvidhya.com)
Other