Song Jiang (江松), ECE Department, Wayne State University
Title: Co-locating Metadata and Data to Improve Efficiency of Virtual Disks
Virtual block devices are widely used to provide a block interface to virtual machines (VMs). A virtual block device manages an indirection mapping from the virtual address space presented to a VM to a storage image hosted on a file system or storage volume. This indirection is recorded as metadata on the image, which must be updated immediately upon each space allocation for data safety. Though each update involves only a few bytes of metadata, it demands a random write of an entire block. Furthermore, data consistency demands that the correct order of metadata and data writes be enforced, usually by inserting expensive FLUSH commands between them. These metadata operations compromise the efficiency of virtual devices.
This talk will introduce Selfie, a virtual disk format that eliminates frequent metadata writes by embedding metadata into data blocks. Selfie completes the write of a data block and its associated metadata in one atomic block operation. This is made possible by opportunistically compressing the data in a block to make room for the metadata. Experiments show that Selfie achieves up to a 5x performance improvement over existing mainstream virtual disks. It delivers near-raw performance with impressive scalability for concurrent I/O workloads.
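The idea of compressing a block to make room for its own metadata can be illustrated with a minimal sketch. The header layout, the use of zlib, and the names `pack_block` and `unpack_block` are assumptions made for illustration only; they are not Selfie's actual on-disk format. The abstract also does not say how incompressible blocks are handled, so the sketch simply reports that case to the caller.

```python
import struct
import zlib

BLOCK_SIZE = 4096
# Hypothetical header: 1-byte flag, 2-byte payload length, 8-byte virtual block address.
HEADER = struct.Struct("<BHQ")


def pack_block(data: bytes, virtual_addr: int):
    """Return one BLOCK_SIZE block holding compressed data plus its metadata.

    Returns None when the data does not compress enough to leave room for
    the header (the caller would then need some other metadata path).
    """
    if len(data) != BLOCK_SIZE:
        raise ValueError("data must be exactly one block")
    payload = zlib.compress(data)
    if HEADER.size + len(payload) > BLOCK_SIZE:
        return None
    block = HEADER.pack(1, len(payload), virtual_addr) + payload
    # Pad to a full block so data and metadata go out in a single block write.
    return block.ljust(BLOCK_SIZE, b"\x00")


def unpack_block(block: bytes):
    """Recover (virtual address, original data) from a packed block."""
    _flag, length, virtual_addr = HEADER.unpack_from(block)
    start = HEADER.size
    data = zlib.decompress(block[start:start + length])
    return virtual_addr, data
```

Because the mapping information travels inside the data block, a single block write carries both, so no separate metadata write or ordering FLUSH is needed for blocks that compress.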
Bio of the speaker:
Dr. Song Jiang is an associate professor in the ECE department at Wayne State University. He received his B.S. and M.S. from the University of Science and Technology of China, and his Ph.D. in computer science from the College of William and Mary in 2004. He is a recipient of the 2009 US National Science Foundation (NSF) CAREER Award. He has served on many conference program committees and proposal review panels. He has collaborated on projects at Facebook and Baidu on providing high-quality Internet-wide services based on big data, resulting in significant publications at top-tier conferences.
Dr. Jiang’s research has had substantial impact on industry. Several of his proposed algorithms for memory and storage management have been officially adopted in mainstream systems, including the Linux kernel, the NetBSD kernel, and the MySQL storage engine.
Title: Distributed Data Storage and Query Processing in Large RFID-enabled Supply Chains
Radio Frequency Identification (RFID) has dramatically streamlined supply chain management by automatically monitoring and tracking commodities. Given the rapid growth of RFID data volume, distributed storage is more applicable and scalable than centralized storage for distributed query processing. Traditional distributed RFID data storage requires each distribution center to store raw RFID data locally, leading to data redundancy and to storage and query inefficiency. In this talk, we will present an efficient distributed storage model that leverages Bloom filters to save storage space and improve query efficiency. We also establish corresponding query processing schemes that locally support existence queries and path queries, two of the most popular query types in supply chain management. A local query can be completed with constant time complexity regardless of data volume. Experiments demonstrate that our storage model outperforms the traditional one in terms of both space and time efficiency.
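As a rough illustration of why a Bloom filter can answer a local existence query in constant time, here is a minimal sketch. The class name, the filter size, the number of hash functions, and the SHA-256 based hashing are all assumptions chosen for the example; the talk's actual storage model and its path-query scheme are not reproduced here.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # Cost depends only on num_hashes, not on how many items were added.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

A distribution center could call `add` with each tag ID it observes and later answer "did this tag pass through here?" with `might_contain`, storing a fixed-size bit array instead of the raw readings. A positive answer may be a false positive, which is the usual trade-off for the space saving.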
Dr. Bin Xiao received the B.Sc. and M.Sc. degrees in Electronics Engineering from Fudan University, China, and the Ph.D. degree in computer science from the University of Texas at Dallas, USA. After receiving his Ph.D., he joined the Department of Computing of the Hong Kong Polytechnic University as an assistant professor, and he is now an associate professor. His research focuses on mobile cloud computing, smartphone technology, network security, and wireless networks. He is the editor of 3 books and has published more than 100 technical papers in international journals and conferences. His publications have appeared in IEEE ToN, TC, TMC, TPDS, TSP, ACM TOSN, and INFOCOM. He is currently an associate editor of the International Journal of Parallel, Emergent and Distributed Systems and an editor of the Journal of Computer Applications. He is an IEEE Senior Member and the recipient of the best paper award at the IEEE/IFIP EUC-2011 international conference.
Title: Datacenter Network Innovation Driven by Application-Demand
Network research projects should focus on meeting the requirements of datacenter applications. In this talk, two recent datacenter network papers will be introduced. The first, published at INFOCOM 2015, provides integrated routing and priority coflow scheduling. The second, published at EuroSys 2015, provides deadline guarantees for inter-datacenter flows.
Chen Tian is an associate professor in the School of Electronics Information and Communications, Huazhong University of Science and Technology, China. He received the BS, MS, and PhD degrees from the Department of Electronics and Information Engineering, Huazhong University of Science and Technology, China, in 2000, 2003, and 2008, respectively. From 2012 to 2013, he was a postdoctoral researcher with the Department of Computer Science, Yale University. His research interests include network function virtualization, data center networks, distributed systems, Internet streaming, and big data processing for smart cities.
Title: HAS: Heterogeneity-Aware Selective Layout Scheme for Parallel File Systems on Hybrid Servers
Hybrid parallel file systems (PFSs) provide a promising design for data-intensive applications. The efficiency of a hybrid PFS relies on the file's data layout. However, most current layout strategies are designed and optimized for homogeneous servers. Using them directly in a hybrid PFS addresses neither the heterogeneity of servers nor the varying access patterns of applications, making hybrid PFSs disappointingly inefficient.
In this talk, we propose HAS, a novel heterogeneity-aware selective data layout scheme for hybrid PFSs. HAS alleviates inter-server load imbalance by skewing the data distribution across heterogeneous servers according to their storage performance. To improve the I/O efficiency of the whole system, HAS adaptively selects the optimal data layout from three typical candidates according to the application's data access patterns, based on a newly developed selection and distribution algorithm. Experimental results show that HAS significantly increases the I/O throughput of hybrid PFSs.
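To make "skewing data distribution based on storage performance" concrete, the following sketch splits one stripe across servers in proportion to an assumed per-server bandwidth figure. This is a simple proportional split written for illustration, not the selection and distribution algorithm of HAS; the function name `skewed_stripe_sizes`, the 4 KB allocation unit, and the bandwidth inputs are all hypothetical.

```python
def skewed_stripe_sizes(stripe_width, server_bandwidths, unit=4096):
    """Split stripe_width bytes across servers in proportion to bandwidth.

    Sizes are multiples of `unit`; stripe_width is assumed to be a
    multiple of `unit` as well.
    """
    total_bw = sum(server_bandwidths)
    units = stripe_width // unit
    # Integer share for each server, rounded down.
    shares = [units * bw // total_bw for bw in server_bandwidths]
    # Hand the units lost to rounding to the fastest servers first.
    leftover = units - sum(shares)
    fastest_first = sorted(
        range(len(shares)), key=lambda i: server_bandwidths[i], reverse=True
    )
    for i in fastest_first[:leftover]:
        shares[i] += 1
    return [share * unit for share in shares]
```

With two slow servers and two servers four times as fast, for example `skewed_stripe_sizes(64 * 1024, [100, 100, 400, 400])`, the fast servers receive most of each stripe, so all servers finish their portion at closer to the same time than under an even split.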
Shuibing He received his PhD degree in computer science and technology from Huazhong University of Science and Technology, China, in 2009. He joined the Computer School of Wuhan University in 2010. From 2012 to 2015, he was a postdoctoral researcher at the Department of Computer Science, Illinois Institute of Technology. His current research areas include parallel I/O systems, file and storage systems, high-performance computing, and distributed computing. Dr. He has published around 30 international conference and journal papers, in venues including ICDCS, IPDPS, ICPP, CLUSTER, and ICA3PP.
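Bo Mao (毛波), Ph.D., is an assistant professor in the Software School of Xiamen University. He received his bachelor's degree in Computer Science and Technology from Northeastern University, China, in July 2005 and his Ph.D. in Computer Architecture from Huazhong University of Science and Technology in July 2010. From October 2010 to January 2013, he was a postdoctoral researcher at the University of Nebraska-Lincoln, USA, and he joined the Software School of Xiamen University in April 2013. His main research interests include cloud storage, data deduplication, solid-state disk arrays, and emerging memory technologies. He has published more than 10 papers in conferences and journals ranked B or above by the China Computer Federation (CCF), including IEEE TC, ACM TOS, FAST, IPDPS, and LISA, and serves as a reviewer for several international conferences and journals.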
Title: Edelta: A Word-Enlarging Based Fast Delta Compression Approach
Delta compression, a promising data reduction approach capable of finding the small differences (i.e., delta) among very similar files and chunks, is widely used for optimizing replicate synchronization, backup/archival storage, cache compression, etc. However, delta compression is costly because of its time-consuming word-matching operations for delta calculation. Our in-depth examination suggests that there exists strong word-content locality for delta compression, which means that contiguous duplicate words appear in approximately the same order in their similar versions. This observation motivates us to propose Edelta, a fast delta compression approach based on a word-enlarging process that exploits word-content locality. Our evaluation based on two case studies shows that Edelta achieves an encoding speedup of 3-10X over the state-of-the-art Ddelta, Xdelta, and Zdelta approaches without noticeably sacrificing the compression ratio.
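The intuition behind enlarging a matched word can be shown with a small greedy delta encoder: once one word of the target is found in the base, the match is extended byte by byte into the neighboring region, so the following words need no further lookups. This is a loose illustration of that idea under assumed parameters (an 8-byte word, a plain dictionary index, and a `copy`/`insert` operation list); it is not the Edelta implementation or its encoding format.

```python
WORD = 8  # hypothetical word size in bytes


def delta_encode(base: bytes, target: bytes):
    """Encode target as a list of ("copy", offset, length) / ("insert", bytes) ops."""
    # Index the first occurrence of every WORD-byte window in the base.
    index = {}
    for off in range(len(base) - WORD + 1):
        index.setdefault(base[off:off + WORD], off)

    ops, literal, pos = [], bytearray(), 0
    while pos < len(target):
        src = index.get(target[pos:pos + WORD]) if pos + WORD <= len(target) else None
        if src is None:
            literal.append(target[pos])
            pos += 1
            continue
        # Enlarge the matched word while the following bytes keep agreeing.
        length = WORD
        while (src + length < len(base) and pos + length < len(target)
               and base[src + length] == target[pos + length]):
            length += 1
        if literal:
            ops.append(("insert", bytes(literal)))
            literal.clear()
        ops.append(("copy", src, length))
        pos += length
    if literal:
        ops.append(("insert", bytes(literal)))
    return ops


def delta_decode(base: bytes, ops) -> bytes:
    """Rebuild the target from the base and the operation list."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += base[op[1]:op[1] + op[2]]
        else:
            out += op[1]
    return bytes(out)
```

When two versions share long runs in the same order, which is the word-content locality the abstract describes, most of the target is covered by a few long `copy` operations and only the changed bytes are stored as `insert` data.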
Wen Xia received his Ph.D. degree in Computer Science from Huazhong University of Science and Technology in June 2014. His current research interests include data deduplication, data compression, storage systems, and cloud storage. He has published several papers in major journals and conferences, including IEEE TC, USENIX ATC, FAST, INFOCOM, Performance, MSST, IEEE DCC, and HotStorage. He is a member of IEEE.