CS7079 - Data Warehousing and Big Data (2022/23)
|Module specification||Module approved to run in 2022/23|
|Module title||Data Warehousing and Big Data|
|Module level||Masters (07)|
|Credit rating for module||20|
|School||School of Computing and Digital Media|
|Total study hours||200|
|Running in 2022/23 (Please note that module timeslots are subject to change)||
The module aims to strengthen students’ skills in data technologies ranging from databases and data warehousing to Big Data. First, it will provide students with a good understanding of database concepts and database management systems with reference to modern enterprise-level database development. Having gained good skills in database development, students will go on to gain an in-depth understanding of data warehousing, which includes concepts and analytical foundations as well as data warehouse development. Through intensive hands-on sessions, students will become familiar with related technological trends and developments in the field. The module will leverage a portfolio of SQL Server tools, including the SQL Server DBMS, SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS) and SQL Server Analysis Services (SSAS), to provide hands-on experience in implementing a reporting solution through a combination of assignments and lab exercises.
The module also introduces the foundations of Big Data management based on the Apache Hadoop platform and provides students with a broad introduction to Big Data technologies. This will involve hands-on sessions designed for data analysts, business intelligence specialists, developers, administrators or anyone who wishes to learn how to process and manage massive and complex data and to infer knowledge from it. Topics include Hadoop, HDFS, MapReduce, Spark, Sqoop, Hive, Pig and MLlib.
Prior learning requirements
• Introduction to database models and system architecture [LO1, LO6]
• Multidimensional data modelling (Data warehouse, Dimensional model concepts, Dimensional modelling process, Dimension Normalization) [LO5]
• SQL data manipulation and OLAP operations [LO5]
• Data processing using DBMS; data definition and manipulation using SQL [LO1]
• Introduction to data warehouse concepts and analytical foundations [LO2]
• Data warehouse development; system architecture and data transformation; investigation of techniques for distributing and mining data [LO2]
• An introduction to the Big Data technology stack, emerging trends and use cases where Big Data outperforms a traditional data warehouse [LO3, LO6]
• An overview of the functional components of the Big Data technology stack, including open-source tools such as Hadoop, HDFS, MapReduce, YARN, Spark, Storm and Hive, for massively parallel on-disk data processing [LO4]
• An overview of batch and real-time data ingestion patterns using Apache Flume and Kafka, data transformation techniques and generation of summary statistics using Apache Spark [LO3, LO4]
• Data analytics on the Hadoop platform using Apache Spark for data analysis on HDFS [LO4]
Balance of independent study and scheduled teaching activity
The module will be delivered through a mixture of lectures, workshops, and laboratory and tutorial sessions, under the following strategy: the first hour of each lecture will introduce the concepts and principles of the module’s topics. The second hour will be run as a workshop to further explain approaches through real-life examples. Each lecture will be followed by either a laboratory or a seminar. Seminar time may be used to facilitate group meetings to cultivate research-oriented skills or to introduce students to the state of the art not covered in the specific syllabus. For the self-study exercises and assessment, students are expected to spend time on unsupervised work in the computer laboratories, searching primary sources of information in the library, and in private study. Students are also expected to dedicate hours to coursework and case study implementation and to preparation for the summative assessment (final exam). The teaching and learning methods will encourage open and self-directed learning, deepen students’ understanding and stress analytical skills.
Blended learning: the university’s VLE and online tools will be used to provide and deliver content, assessment and feedback, to encourage active learning and to enhance students’ engagement and learning experience.
On successful completion of this module the student should be able to:
LO1 Understand, appraise and demonstrate familiarity with the operation of DBMS systems and appreciate the complexity of developing real-life applications.
LO2 Demonstrate competence in the process of developing, configuring, utilising and managing data warehouse applications in a variety of contexts.
LO3 Demonstrate a comprehensive understanding of the principles of organising, validating, transforming and analysing large volumes of data on specialised platforms (Big Data) from various data sources – files, databases, server logs, etc.
LO4 Demonstrate a comprehensive understanding of the advantages and limitations of Big Data technologies, including predictive analytics, and build the confidence to interpret data as insights to drive organisational success.
LO5 Demonstrate competence in advanced SQL and OLAP operations.
LO6 Understand, appraise and participate in the legal, social, ethical and professional framework for developing data-intensive systems.
The assessment will consist of one coursework and an unseen examination. The examination will test students’ retention, understanding and insight drawn from the entire course (LO1, LO2, LO3, LO4, LO5, LO6). The coursework will consist of one assignment that assesses part of the practical aspects of the module: students will be given a case study that is a scaled-down version of a real-life Big Data application. In the coursework, students will also be required to demonstrate their awareness of recent research developments and current Big Data technology trends by writing an essay in which they contrast new approaches with conventional ones (LO3, LO4). Some aspects of the coursework will also prepare students for their curricular projects.
The module will be passed on the aggregate mark of all assessment items.
1. Ponniah, P. (2001), Data warehousing fundamentals, John Wiley & Sons; ISBN: 0-471-41254-6
2. White, T. (2015), Hadoop: The Definitive Guide. Sebastopol: O’Reilly & Associates. ISBN-10: 1491901632
3. Inmon, W.H. (2005), Building the data warehouse, 4th Edition, John Wiley
4. Rainardi, V. (2007), Building a data warehouse with examples in SQL Server, Apress; ISBN-13: 978-1-59059-931-0
5. Grover, M., Malaska, T., Seidman, J., Shapira, G. (2015), Hadoop Application Architectures: Designing Real-World Big Data Applications, O'Reilly: ISBN 1491900083
1. International Journal of Data Warehousing and Mining; ISSN: 1548-3924
2. International Journal of Data Warehousing and Mining, DOI:10.4018/IJDWM.2018010102