Course Overview - Key Laboratory of Information Storage System, Ministry of Education

File Systems and Distributed Data Management Systems

Building File Systems and Distributed Data Management Systems for Performance and Reliability

Course Description

File systems are a cornerstone of any computer, supporting low-level data management, and distributed file systems play a critical role in providing file services to all kinds of applications. In recent years, key-value (KV) stores, which manage data in a flat name space, have become widely used for their simplicity and efficiency. These systems are often critical layers of the software stack in a large-scale data center supporting Internet-wide, data-driven services, in particular big data services. Provisioning services such as search, advertising, email, maps, video, chat, and blogging entails the collection, storage, and access of data as well as computation on the data. A unique challenge for the software infrastructure is that it runs on a very large number of mostly off-the-shelf hardware parts, including processors, network adapters and routers, and disks. This has substantially changed the landscape of research and practice in large-scale distributed computing, which must now assume that failure is the norm rather than the exception. Accordingly, fault tolerance must take first priority in the design. Further, because of huge and ever-increasing data sets and system scale, many other issues must also be re-examined to meet the system’s requirements for reliability, scalability, availability, and efficiency. Understanding the design challenges, issues, scope, and state of the art is essential not only for system researchers and practitioners, but also for application developers who access and process (big) data in the cloud.

This course has three sections. It first covers basic concepts and design techniques for file systems, including the data structures and algorithms used in fast file systems and log-structured file systems, as well as journaling and copy-on-write techniques. This is followed by discussions of distributed file systems, covering issues such as replication, consistency, synchronization, and fault tolerance. The case studies discuss well-known systems such as Google’s GFS and Ceph. In the last section, the instructor focuses on key-value stores to show how read and write amplification is addressed. Example stores to be studied include Google’s LevelDB and SILT. He will conclude this section by introducing the LSM-trie KV store, one of his recent works, whose open-source code is available for learning and adoption.
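To make the notion of write amplification mentioned above concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not the design of LevelDB, SILT, or LSM-trie: the class `ToyLSM`, its parameter `memtable_limit`, and the single on-disk level are all hypothetical. Each flush rewrites the whole on-disk level, so the number of entries written to "disk" grows faster than the number of entries the user wrote.

```python
class ToyLSM:
    """Toy key-value store with one in-memory buffer and one on-disk level.

    Hypothetical illustration only: real LSM-based stores use multiple
    levels, sorted files, and far more careful compaction policies.
    """

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit  # entries buffered before a flush
        self.memtable = {}                    # in-memory write buffer
        self.level = {}                       # stands in for one sorted on-disk run
        self.user_writes = 0                  # entries written by the application
        self.disk_writes = 0                  # entries written to "disk"

    def put(self, key, value):
        self.user_writes += 1
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Merge the buffer into the on-disk level. The whole level is
        # rewritten, which is the source of write amplification here.
        merged = {**self.level, **self.memtable}
        self.level = dict(sorted(merged.items()))
        self.disk_writes += len(self.level)
        self.memtable = {}

    def get(self, key):
        # Newest data first: check the buffer, then the on-disk level.
        if key in self.memtable:
            return self.memtable[key]
        return self.level.get(key)

    def write_amplification(self):
        # Ratio of entries written to disk to entries written by the user.
        return self.disk_writes / self.user_writes


store = ToyLSM(memtable_limit=4)
for i in range(16):
    store.put(f"k{i:02d}", i)
print(store.write_amplification())
```

With 16 distinct keys and a buffer of 4 entries, the four flushes write 4, 8, 12, and 16 entries, so 40 entries reach disk for 16 user writes. The ratio keeps growing as the level grows, which is the kind of cost that multi-level designs and stores such as those studied in the last section aim to reduce.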

This course focuses on understanding issues, design choices, and problem-solving skills, rather than simply sharing concepts and facts about the systems. Students will immerse themselves in knowledge and skills that are highly relevant to today’s IT practices around big data.

The course has three key objectives for the students:

1. To become aware of specific challenges and issues facing today’s big data processing systems from both system and workload perspectives.

2. To understand why and how some well-known systems address the issues they were designed to tackle, as well as their relative weaknesses.

3. To gain hands-on experience with manipulating a data management system.

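Contact

Email:
dragonstar2015@126.com

Phone: 027-87792450

Address: Information Storage and Application Laboratory, Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Hongshan District, Wuhan, Hubei Province

Postal code: 430074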
