Invited Talks

(Listed in no particular order; for the detailed schedule, see the program booklet provided at the conference.)

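Invited Talk 1: Research and Exploration of Storage Systems for High-Performance Computers

Abstract:

High-performance computer systems were created to solve "big" problems. Their accesses to data sets are correlated, which requires the storage system to provide a global view and shared consistency semantics. As systems and computational problems grow in scale, high-performance computers also place demanding requirements on the scalability and robustness of the storage system. In storage system design, however, shared-storage consistency semantics increase system coupling, which conflicts with scalability and robustness. Every generation of the Sunway (神威) high-performance storage system has worked to resolve this tension. From storage shared with compute to independently built storage systems, from a single system to a multi-tier architecture, and from a global system to local buffering systems, the Sunway high-performance storage system has evolved from decoupling at the architecture level toward decoupling of storage software components. This talk describes that process and explains the efforts and explorations made so that the Sunway storage system better fits the needs of high-performance computing applications. In addition, over the history of high-performance computers, it has become increasingly evident that large-scale computation is accompanied by large-scale data. The talk also presents some ideas on how high-performance computer systems can better support data-intensive applications.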

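Speaker bio:

Chen Zuoning (陈左宁) is an academician of the Chinese Academy of Engineering, a computer scientist, and chief engineer of the National Research Center of Parallel Computer Engineering and Technology. She currently serves as vice president and member of the Leading Party Members' Group of the Chinese Academy of Engineering, member of the National Leading Group for Building a Manufacturing Power, and vice chair of the Ninth National Committee of the China Association for Science and Technology. Academician Chen has long devoted herself to China's computing field. She has taken part in or led the development of several domestically built high-performance computer systems and directed the development of several large domestic system software projects. In operating systems, she co-led the design of the first domestic parallel operating system compatible with international industry standards and offered original insights into the theory and implementation of parallel operating systems. She proposed and implemented a hierarchical operating system structure based on "multiple virtual spaces with multiple mappings", which resolved the principal tension between efficiency and scalability in distributed systems, and she proposed and implemented system-level fault tolerance, which addressed the availability problem of supercomputers. She has twice received the Special Prize and twice the First Prize of the State Science and Technology Progress Award.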

Invited Talk 2: Challenges of Coping with Increasing Volumes of Data in the Era of Big Data

Abstract:

We are living in a rapidly changing digital world, inundated by an ocean of data that is being generated at a rate of 2.5 billion GB every single day! It is projected that by 2017 humans will have generated 16 trillion GB of digital data. This phenomenal growth and ubiquity of data has ushered in an era of “Big Data”, which brings with it new challenges as well as opportunities. In this talk, I will first discuss the challenges and opportunities that big data presents for computer and storage systems research, with an emphasis on the challenges and research questions raised by these almost unfathomable volumes of data. I will then present some recent solutions proposed by my research group that seek to address the performance and space issues facing big data and data-intensive applications by means of data reduction.

Speaker bio:

Hong Jiang received the B.Sc. degree in Computer Engineering in 1982 from Huazhong University of Science and Technology, Wuhan, China; the M.A.Sc. degree in Computer Engineering in 1987 from the University of Toronto, Toronto, Canada; and the Ph.D. degree in Computer Science in 1991 from Texas A&M University, College Station, Texas, USA. He is currently Chair and Wendell H. Nedderman Endowed Professor of the Computer Science and Engineering Department at the University of Texas at Arlington. Prior to joining UTA, he served as a Program Director at the National Science Foundation (January 2013 to August 2015), and he had been at the University of Nebraska-Lincoln since 1991, where he was Willa Cather Professor of Computer Science and Engineering. His present research interests include computer architecture, computer storage systems and parallel I/O, high-performance computing, big data computing, cloud computing, and performance evaluation. He has graduated 16 Ph.D. students who now work in either major IT companies or academia. He has over 200 publications in major journals and international conferences in these areas, and his research has been supported by NSF, DOD, the State of Texas, and the State of Nebraska. Dr. Jiang is a Fellow of IEEE and a Member of ACM.

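Invited Talk 3: Intelligent Cloud Computing Infrastructure

Abstract:

Many of the most intelligent big data applications run on the "dumbest" cloud computing infrastructure. Although today's cloud computing has brought a degree of automation to data center operations, such rule-based operations can no longer meet expectations for reliability, flexibility, and efficiency. Through several recent research projects, covering intelligent power delivery for data centers, software-defined optical switching networks, and data center knowledge mining, we will present our team's work on intelligent data centers and our outlook for the future.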

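Speaker bio:

Xu Wei (徐葳) is an assistant professor and assistant dean of the Institute for Interdisciplinary Information Sciences at Tsinghua University, a member of the management committee of the Institute for Data Science, and director of international cooperation at the Ministry of Education's Research Center for Online Education. His research focuses on distributed systems and machine learning. He received his master's and doctoral degrees in computer science from the University of California, Berkeley, and his bachelor's degree in computer science from the University of Pennsylvania (after two years of undergraduate study in Tsinghua's computer science department). He was selected for the "Young Thousand Talents Program" in 2013 and has received faculty research awards from Google, Microsoft, and IBM. He has published more than 20 papers at top conferences in systems, networking, machine learning, and optical communications, including SOSP, SIGCOMM, EuroSys, ICML, and OFC, with more than 1,300 total citations. He previously worked at Google headquarters on infrastructure research and development.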

Invited Talk 4: Opportunities and Challenges Faced by DRAM/Flash/NVRAM Technologies

Abstract:

Big data applications demand high-speed, reliable, and energy-efficient data storage systems. Traditional storage architectures have fundamental limitations because of legacy systems that have centered on spinning hard disk drives. With rapid advances in DRAM and nonvolatile memories such as NAND-gate flash, phase change memory, memristors, and magnetic RAM, a great opportunity exists for optimizing storage architectures. This talk will summarize recent developments in semiconductor memory technologies and their potential roles in storage systems. From a system architecture perspective, we analyze the respective significance of each memory technology. To build optimal storage systems, storage hierarchy is the way to go. We introduce a new concept, "Content Locality", for storage hierarchy design. Based on this new concept, various memory/storage architectures will be presented and compared in terms of how they exploit the physical properties of various memory technologies as well as memory access behaviors at customer sites. Quantitative evaluations of memory performance and market trends will be discussed and explored as well.

Speaker bio:

Qing Yang is Distinguished Engineering Professor in the Department of Electrical, Computer, and Biomedical Engineering at the University of Rhode Island. He is a director of the High Performance Computing Lab (HPCL) at URI and is a recipient of many accomplishment awards at URI. His research interests include computer architecture, memory and storage systems, computer networks, embedded computer systems, and applications in neural-machine interfaces and biomedical engineering. He has published over 100 technical articles in these areas and holds over a dozen issued patents and over a dozen pending applications. The majority of his patents have been licensed to the computer industry with significant practical impact. Four high-tech startup companies have been formed based on his patents. His latest startup, VeloBit, was based on the I-CASH architecture and was successfully acquired by Western Digital in July 2013. Yang is a Fellow of IEEE. He has served the professional society in various capacities, including general chair of the ACM/IEEE International Symposium on Computer Architecture (ISCA 2011), the IEEE International Conference on Network, Architecture, and Storage (NAS), and the IEEE Workshop on Storage Network Architecture and Parallel I/Os (SNAPI); IEEE Distinguished Speaker; Editor of IEEE Transactions; and Program Committee member of numerous international conferences. Besides being a principal investigator of many academic research projects, Yang has also done collaborative research with IBM, Intel, EMC, Freescale, and several startup companies in the Boston area.

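Invited Talk 5: Memristor Research for the Fusion of Information Storage and Processing in the Big Data Era

Abstract:

The challenge of the big data era lies not only in the explosive growth of information volume but also in a fundamental transformation of how information is stored and processed. The von Neumann bottleneck of traditional computer architectures, in which storage and computation are separated, severely limits the capacity for information storage and processing. Developing new memory devices that fuse storage and processing would therefore lay the device foundation for new, disruptive computer architectures. The talk will present research progress on memristors and demonstrations of their functions in analog brain-inspired neuromorphic computing and digital nonvolatile logic operations.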

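Speaker bio:

Miao Xiangshui (缪向水) is a Distinguished Professor under the Ministry of Education's "Changjiang Scholars" program, vice dean of the Wuhan International School of Microelectronics at Huazhong University of Science and Technology, and director of the Institute of Information Storage Materials and Devices at the Wuhan National Laboratory for Optoelectronics. Miao is also vice president of the China Instrument Functional Materials Society, an executive council member of the Chinese Society of Micro-Nano Technology, and a member of the Thin Film Committee of the Chinese Vacuum Society, and serves as an editor of Scientific Reports and a technical editor of IEEE Nanotechnology Magazine. Miao's long-term research covers information storage materials and devices, including phase-change memory, memristors, magnetic memory, and optical storage.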

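Young Scholars Forum Talks

(Listed in no particular order; for the detailed schedule, see the program booklet provided at the conference.)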

Young Scholars Forum Talk 1: I/O Scheduling with Mapping Cache Awareness for Flash Based Storage Systems

Abstract:

NAND flash memory is the default storage component in mobile systems. One of the key technologies for NAND flash storage systems is the address mapping scheme between logical and physical addresses, which is designed to work around flash memory's inability to update data in place. A demand-based page mapping scheme is often applied because it can effectively balance mapping cache size against the performance requirements of mobile storage systems. However, recent studies have shown that demand-based page mapping is sensitive to host I/O patterns, especially when the mapping cache is limited in size, which leads to performance degradation. In this talk, I will introduce a novel I/O scheduling scheme, MAP, to solve this issue. The main idea of the proposed scheduling approach is to reorder I/O requests for performance improvement using two schemes: first, prioritize the requests that will hit in the mapping cache; second, group requests with related logical addresses into large batches. A series of experiments was conducted to evaluate MAP, and the results show that MAP improves I/O performance by 29% for read requests and 8% for write requests on average compared with a traditional I/O scheduler.
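
As a rough illustration of the two reordering rules named in this abstract, the Python sketch below sorts a pending queue so that requests whose mapping entries are already cached come first and requests that share a mapping page sit next to each other. It is not the MAP implementation: the `Request` type, the `reorder` function, the set of cached mapping pages, and the constant `ENTRIES_PER_TRANSLATION_PAGE` are hypothetical names and values chosen for this example only.

```python
from dataclasses import dataclass

# Hypothetical: number of logical-to-physical entries stored in one mapping page.
ENTRIES_PER_TRANSLATION_PAGE = 4


@dataclass(frozen=True)
class Request:
    lpn: int  # logical page number addressed by the request
    op: str   # "R" for read, "W" for write


def translation_page(lpn, entries_per_page=ENTRIES_PER_TRANSLATION_PAGE):
    """Index of the mapping page that holds the entry for this logical page."""
    return lpn // entries_per_page


def reorder(requests, cached_pages, entries_per_page=ENTRIES_PER_TRANSLATION_PAGE):
    """Return requests with mapping-cache hits first, grouped by mapping page.

    sorted() is stable, so requests that share a mapping page keep their
    arrival order, including repeated accesses to the same logical page.
    """
    def key(req):
        page = translation_page(req.lpn, entries_per_page)
        return (page not in cached_pages, page)

    return sorted(requests, key=key)


queue = [Request(9, "R"), Request(1, "W"), Request(8, "R"),
         Request(2, "R"), Request(17, "W")]
scheduled = reorder(queue, cached_pages={0})
```

In this toy queue only mapping page 0 is cached, so the requests for logical pages 1 and 2 move to the front, and the two requests that fall in mapping page 2 (logical pages 9 and 8) stay adjacent. A real scheduler would also have to bound how long a non-hitting request can be delayed, which this sketch does not attempt.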

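Speaker bio:

Shi Liang (石亮) is an associate professor in the College of Computer Science at Chongqing University. In 2013 Shi received a Ph.D. from the University of Science and Technology of China together with a jointly supervised Ph.D. from City University of Hong Kong. Research interests are embedded systems, real-time systems, and storage systems. Shi currently leads one Young Scientists Fund project of the National Natural Science Foundation of China and several Fundamental Research Funds for the Central Universities projects, and participates as a core member in several National Natural Science Foundation projects, national key research programs such as the 863 Program, and industry collaborations. In recent years Shi has published more than 50 papers in major international journals and conferences, including nearly 18 papers in IEEE Transactions and ACM Transactions and several papers at major storage and embedded systems conferences; in the past three years these include papers at high-level international conferences such as ICCD, DAC, MSST, and FAST. Dr. Shi has long served international conferences and journals, including as a technical program committee member for LCTES, NVMSA, ASPDAC, VLSI, RTCSA, and DAC, and as a reviewer for IEEE TVLSI, IEEE ESL, IEEE TCAD, IEEE TC, ACM TECS, IEEE MSCS, and IEEE TPDS. Dr. Shi is a member of CCF, IEEE, and ACM.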

Young Scholars Forum Talk 2: SSD Friendly Caching towards Data Center Workloads

Abstract:

Flash-based SSDs bridge the performance gap between DRAM and hard disks, and SSD caching is therefore widely deployed in data center servers. However, inherent features of SSDs, such as limited write endurance and garbage collection, pose challenges for the system design of SSD caching. In this talk, I will present two SSD-friendly caching designs. One is a re-adding based SSD cache policy for data center workloads. The other is a software-hardware co-designed SSD caching system.

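Speaker bio:

Jiang Dejun (蒋德钧), Ph.D., is an associate researcher at the Institute of Computing Technology, Chinese Academy of Sciences. Main research interests include emerging storage media, storage systems, and virtualization. Dr. Jiang has published more than ten papers at international conferences including PACT, ICS, SYSTOR, and ICCD, with over a hundred cumulative citations, and has filed more than ten NVM-related patent applications. Dr. Jiang has led or participated in research projects including a national Young Scientists Fund project, an outstanding project under the Ministry of Human Resources and Social Security's science and technology program for returned overseas scholars, and sub-projects of the 973 and 863 Programs. Dr. Jiang received a bachelor's degree from Beihang University, a master's degree from Tsinghua University, and a Ph.D. in computer science from Vrije Universiteit Amsterdam, the Netherlands. Dr. Jiang is a member of CCF, IEEE, and ACM.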

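Young Scholars Forum Talk 3: Research on Garbage-Collection-Aware Flash Storage Systems

Abstract:

The talk first briefly introduces the ASTL lab's research on flash storage systems, data deduplication, and cloud-of-clouds storage systems, and then focuses on garbage-collection-aware flash storage systems. Garbage collection degrades the performance of flash storage systems. Most current work mitigates its impact by optimizing the garbage collection process or by reducing writes to flash storage. In addition, existing cache management policies improve cache efficiency only by raising the hit ratio, overlooking the fact that the miss penalty during garbage collection differs from the miss penalty outside garbage collection. To address this, we propose a garbage-collection-aware cache management policy that preferentially keeps in the cache the write data and hot read data destined for flash chips currently undergoing garbage collection, so as to minimize the miss penalty on cache misses and reduce resource contention between user requests and the internal requests generated by garbage collection. Results from a prototype system show that the garbage-collection-aware cache management policy outperforms existing flash cache management policies. Finally, we introduce our work on garbage-collection-aware SSD arrays.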
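
The idea of keeping data bound for chips that are busy with garbage collection can be pictured with a small Python sketch. It is a simplified illustration, not the policy presented in the talk: it is a plain LRU cache whose eviction skips pages that map to a chip currently in garbage collection, and it does not distinguish write data from hot read data. The class name `GCAwareCache`, the `chip_of` mapping function, and the `gc_chips` set are assumptions made for this example.

```python
from collections import OrderedDict


class GCAwareCache:
    """LRU cache whose eviction avoids pages bound for chips that are in GC."""

    def __init__(self, capacity, chip_of):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.chip_of = chip_of      # hypothetical: maps a page number to its flash chip
        self.pages = OrderedDict()  # page -> data, least recently used first
        self.gc_chips = set()       # chips currently performing garbage collection

    def access(self, page, data=None):
        """Insert or touch a page; return the evicted page number, or None."""
        if page in self.pages:
            self.pages.move_to_end(page)
            if data is not None:
                self.pages[page] = data
            return None
        victim = None
        if len(self.pages) >= self.capacity:
            victim = self._pick_victim()
            del self.pages[victim]
        self.pages[page] = data
        return victim

    def _pick_victim(self):
        # Prefer the least recently used page whose chip is idle, because
        # flushing or re-fetching it does not have to wait behind GC.
        for page in self.pages:
            if self.chip_of(page) not in self.gc_chips:
                return page
        # Every cached page targets a GC-busy chip: fall back to plain LRU.
        return next(iter(self.pages))


cache = GCAwareCache(capacity=2, chip_of=lambda page: page % 2)
cache.gc_chips.add(0)          # chip 0 is garbage collecting
cache.access(0)
cache.access(1)
evicted = cache.access(2)      # page 0 is older but sits on the busy chip
```

With chip 0 marked as garbage collecting, the cache evicts page 1 even though page 0 is older, which is the behavioural difference from plain LRU that the sketch is meant to show.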

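Speaker bio:

Mao Bo (毛波), Ph.D., is an associate professor in the Software School of Xiamen University. Mao received a bachelor's degree in computer science and technology from Northeastern University in July 2005 and a Ph.D. in computer architecture from Huazhong University of Science and Technology in July 2010, conducted postdoctoral research at the University of Nebraska-Lincoln, USA, from October 2010 to January 2013, and joined the Software School of Xiamen University in April 2013, where Mao founded the university's Advanced Storage Technology Laboratory. Mao is a member of the China Computer Federation (CCF) Technical Committee on Computer Architecture. Main research interests include cloud storage, data deduplication, SSD arrays, and emerging storage technologies. Mao has published more than 20 papers in CCF-recommended conferences and journals, including IEEE TC, IEEE TPDS, ACM TOS, FAST, IPDPS, ICS, and LISA, and serves as a reviewer for several international conferences and journals.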

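Young Scholars Forum Talk 4: Novel Spintronic Devices: From Storage to Computing

Abstract:

Because leakage current rises sharply at smaller process nodes, memory and logic circuits based on conventional CMOS technology face a major power consumption challenge. Thanks to their non-volatility, high density, high speed, and ease of integration, novel spintronic devices are regarded as the most promising frontier technology for overcoming the power barrier. Magnetic memory is representative of these devices: first-generation products, based on magnetic fields, are already used in defense, aviation, aerospace, automotive, and other fields. With the introduction of spin-transfer torque theory, second-generation spintronic devices based on current-induced magnetization switching achieved marked improvements in power consumption, size, and reliability. Meanwhile, emerging effects such as spin-orbit coupling and voltage-controlled magnetic anisotropy are opening the way to further performance gains and future practical applications. This talk centers on current-induced spintronic devices, with emphasis on research progress in perpendicular-magnetic-anisotropy magnetic tunnel junctions and domain-wall-motion racetrack memory. Beyond storage, the talk also looks ahead to the prospects of spintronic devices in logic computing and introduces concepts such as logic-in-memory and all-spin logic, along with related work.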

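Speaker bio:

Zhang Yue (张悦) is an associate professor in the School of Electronic and Information Engineering at Beihang University. Zhang received a bachelor's degree from the Department of Optoelectronic Information Engineering at Huazhong University of Science and Technology in 2009 and a Ph.D. from Université Paris-Sud, France, in 2014, followed by postdoctoral research at the French National Centre for Scientific Research (CNRS) under Nobel laureate Professor Fert (费尔). In 2015 Zhang was selected for Beihang's "Excellence Hundred Talents Program". Zhang works mainly on spintronics, novel memory devices, and ultra-low-power logic circuits, with particular interest in perpendicular-magnetic-anisotropy magnetic tunnel junctions, racetrack memory, and all-spin logic devices. To date Zhang has contributed to 2 books and published more than 60 international journal and conference papers, including 1 "ESI Highly Cited Paper", has given several invited talks at well-known conferences such as DATE and ISCAS, and has won 3 best paper awards at international conferences. In 2013 Zhang received the national scholarship for outstanding students studying abroad. In 2016 Zhang served as Publication Chair of the NANOARCH international conference. Zhang is a reviewer for several international journals, including IEEE Transactions on Circuits and Systems I: Regular Papers, IEEE Transactions on Electron Devices, and IEEE Transactions on Nanotechnology.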

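Young Scholars Forum Talk 5: HiNFS: A Persistent Memory File System with Fine-Grained I/O Handling

Abstract:

Main-memory-class non-volatile memory (NVM) technologies have advanced rapidly in recent years, creating opportunities for building memory-level storage systems. NVM can be attached directly to the memory bus and accessed directly by the CPU, without going through a DRAM cache. Unlike traditional file systems, which access external storage devices through a DRAM cache, current persistent memory file systems designed for NVM (such as Microsoft's BPFS and Intel's PMFS) all use direct access, reading and writing NVM directly. However, main-memory-class NVM still lags DRAM in performance, and NVM reads and writes are asymmetric, with writes much slower than reads. Under some workloads, current direct-access persistent memory file systems even perform worse than traditional file systems. To address this, the talk introduces HiNFS, a persistent memory file system with fine-grained I/O handling. HiNFS classifies I/O requests inside the file system according to their differing needs and, by combining buffered and direct access and adaptively choosing between them, handles I/O requests at a fine granularity. Evaluation results show that HiNFS exploits the comparative advantages of NVM and DRAM effectively, improving performance by 64% to 184% over existing persistent memory file systems.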

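Speaker bio:

Lu Youyou (陆游游) is a postdoctoral researcher and assistant researcher in the Department of Computer Science and Technology at Tsinghua University. Lu received a doctorate in engineering from Tsinghua University in 2015 and, during doctoral study, was a visiting researcher at Carnegie Mellon University in 2013. Main research interests are file systems and non-volatile storage. Lu has published more than ten papers at top international conferences including FAST, USENIX ATC, and EuroSys and in top international journals including ACM ToS and IEEE TC. Among these, Lu published first-author papers in consecutive years at FAST 2013 and FAST 2014, and received the sole Best Paper Award at IEEE NVMSA 2014 and a Best Paper nomination at MSST 2015. Lu has been invited to review for international conferences including FAST and MSST and for international journals and magazines including ACM ToS, IEEE TC, TPDS, and IEEE Computer. In 2015 Lu was selected for the inaugural China Computer Federation "Young Talent Development Program" and for the China Association for Science and Technology's "Young Elite Scientists Sponsorship Program".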

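Young Scholars Forum Talk 6: Exploring User Experience and Storage System Optimization Techniques for Smart Mobile Devices

Abstract:

With advances in mobile microprocessors and mobile operating systems, the performance of smart mobile devices (such as smartphones and tablets) has improved markedly, enabling ever richer application functionality. New kinds of applications such as virtualization and AR/VR are also appearing on smart mobile devices. These feature-rich applications pose new challenges for user experience and for the storage system. The storage system, energy consumption, and performance of smart devices all need to be optimized for different users and their usage behavior. Compared with the DRAM used as memory in conventional smart mobile devices, emerging non-volatile memories offer low static power and high storage density, creating new opportunities to improve user experience and storage system performance. This talk describes the main problems facing storage systems in today's smart mobile devices and discusses how to use emerging non-volatile memory technologies to optimize the storage system and improve user experience.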

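Speaker bio:

Liu Duo (刘铎) is a research professor under Chongqing University's "Hundred Talents Program", a doctoral supervisor, and a recipient of Chongqing's high-level talent recruitment program. Liu currently serves as chair of the China Computer Federation Young Computer Scientists and Engineers Forum in Chongqing (CCF YOCSEF Chongqing) for the 2016-2017 term, is a member of the Technical Committee on Information Storage Technology and the Technical Committee on Pervasive Computing, and is a senior member of the China Computer Federation. Liu received a Ph.D. in computer science from the Department of Computing at The Hong Kong Polytechnic University in 2012. Research areas include emerging storage architectures, optimization of embedded systems and smart devices, and Internet of Things and big data applications. Liu has led or participated in several national research projects funded by the National Natural Science Foundation of China, the Ministry of Education's Doctoral Program Fund, and the national 863 Program, and has collaborated with Huawei Technologies Co., Ltd. on smartphone research. Liu has published more than 60 papers in international journals and conferences, including IEEE/ACM Transactions, with more than 600 citations by others according to Google Scholar, and serves as a reviewer for more than 20 international journals and conferences.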

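Ph.D. Student Forum Talks

(Listed in no particular order; for the detailed schedule, see the program booklet provided at the conference.)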

Ph.D. Student Forum Talk 1: Efficient and Available In-memory KV-Store with Hybrid Erasure Coding and Replication

Abstract:

In-memory key/value store (KV-store) is a key building block for many systems like databases and large websites. Two key requirements for such systems are efficiency and availability, which demand a KV-store to continuously handle millions of requests per second. A common approach to availability is replication, such as primary-backup replication (PBR), which, however, requires M+1 times the memory to tolerate M failures. This leaves scarce memory unavailable for useful user jobs. This paper makes the first case for building a highly available in-memory KV-store by integrating erasure coding to achieve memory efficiency without notably degrading performance. A main challenge is that an in-memory KV-store has a large amount of scattered metadata, so a single KV put may cause excessive coding operations and parity updates due to numerous small updates to metadata. Our approach, named Cocytus, addresses this challenge with a hybrid scheme that uses PBR for small and scattered data (e.g., metadata and keys) and applies erasure coding only to relatively large data (e.g., values). To mitigate well-known issues such as the lengthy recovery of erasure coding, Cocytus uses an online recovery scheme that leverages the replicated metadata to continue serving KV requests. We have applied Cocytus to Memcached. Evaluation using YCSB with different KV configurations shows that Cocytus incurs low overhead for latency and throughput, can tolerate node failures with fast online recovery, and saves 33% to 46% memory compared to PBR when tolerating two failures.
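
The memory argument in this abstract can be made concrete with a little arithmetic. The Python sketch below compares the M+1 memory factor of replication with a hybrid layout in which a share of the data stays replicated and the rest is erasure coded with k data blocks plus M parity blocks. The function names, the choice of k, and the replicated share are illustrative assumptions; they are not Cocytus's actual configuration, and the numbers are not meant to reproduce the reported 33% to 46% savings.

```python
from fractions import Fraction


def replication_factor(failures):
    """Memory per byte stored when keeping failures + 1 full copies."""
    return Fraction(failures + 1)


def hybrid_factor(k, failures, replicated_share):
    """Memory per byte when a share is replicated and the rest is erasure coded.

    k is the number of data blocks per stripe (an assumed parameter), and one
    parity block is added per tolerated failure.
    """
    share = Fraction(replicated_share)
    coded = Fraction(k + failures, k)
    return share * replication_factor(failures) + (1 - share) * coded


def saving(k, failures, replicated_share):
    """Fraction of memory saved by the hybrid layout relative to replication."""
    return 1 - hybrid_factor(k, failures, replicated_share) / replication_factor(failures)


# Tolerating two failures with k = 3 and 10% of bytes kept replicated (both assumed).
example = saving(k=3, failures=2, replicated_share=Fraction(1, 10))
```

Under these assumed parameters the hybrid layout needs 9/5 of the raw data size instead of 3, so `example` equals 2/5. Raising the replicated share moves the result back toward zero saving, which is why a hybrid design applies coding to the large values and replication only to the small metadata.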

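Speaker bio:

Dong Mingkai (董明凯), male, born in 1992, is from Tangshan, Hebei Province. He received his bachelor's degree from Shanghai Jiao Tong University in 2015 and is currently pursuing a master's degree at the Institute of Parallel and Distributed Systems (IPADS), Shanghai Jiao Tong University. His main research areas include distributed systems and file systems.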


Ph.D. Student Forum Talk 2: Access Characteristic Guided Read and Write Cost Regulation for Performance Improvement on Flash Memory

Abstract:

The relatively high cost of write operations has become the performance bottleneck of flash memory. Write cost refers to the time needed to program a flash page using incremental-step pulse programming (ISPP), while read cost refers to the time needed to sense and transfer a page from the storage. If a flash page is written with a higher cost by using a finer step size during the ISPP process, it can be read with a relatively low cost due to the time saved in sensing and transferring, and vice versa. We introduce AGCR, an access characteristic guided cost regulation scheme that exploits this tradeoff to improve flash performance. Based on workload characteristics, logical pages receiving more reads will be written using a finer step size so that their read cost is reduced. Similarly, logical pages receiving more writes will be written using a coarser step size so that their write cost is reduced. Our evaluation shows that AGCR incurs negligible overhead, while improving performance by 15% on average, compared to previous approaches.
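
The tradeoff described here amounts to a per-page choice of programming step size driven by access counts. The Python sketch below shows one way such a rule could look. It is only an illustration of the idea, not the AGCR algorithm: the function name `choose_step_size`, the three labels, and the `ratio` threshold are assumptions introduced for this example.

```python
def choose_step_size(reads, writes, ratio=2.0):
    """Pick an ISPP step size label for a logical page from its access counts.

    ratio is a hypothetical threshold: a page counts as read-dominated
    (or write-dominated) when one count exceeds ratio times the other.
    """
    if reads > ratio * writes:
        return "fine"     # costlier program, cheaper subsequent reads
    if writes > ratio * reads:
        return "coarse"   # cheaper program, costlier subsequent reads
    return "default"      # no clear bias, keep the normal step size


# Hypothetical per-page counters: page number -> (reads, writes).
counters = {10: (40, 3), 11: (2, 25), 12: (7, 6)}
plan = {page: choose_step_size(r, w) for page, (r, w) in counters.items()}
```

In this toy input, page 10 is read-dominated and gets the fine step, page 11 is write-dominated and gets the coarse step, and page 12 keeps the default. How the counters are collected and aged is left open here.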

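Speaker bio:

Li Qiao (李乔) is a master's student in the College of Computer Science at Chongqing University, advised by Associate Professor Shi Liang (石亮) and Professor Zhuge Qingfeng (诸葛晴凤). Li leads one Chongqing graduate research and innovation project. Li's main research area is performance optimization and error control for flash memory; in this area Li has published papers at major international conferences including DATE and FAST, optimizing flash access performance based on an analysis of flash errors.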

Ph.D. Student Forum Talk 3: PSLO: Enforcing the Xth Percentile Latency and Throughput SLOs for Consolidated VM Storage

Abstract:

It is desirable but challenging to simultaneously support a latency SLO at a pre-defined percentile, i.e., the Xth percentile latency SLO, and a throughput SLO for consolidated VM storage. Ensuring the Xth percentile latency contributes to accurately differentiating service levels in terms of application-level latency SLO compliance, especially for applications built on multiple VMs. However, enforcing the Xth percentile latency SLO and enforcing the throughput SLO are opposite sides of the same coin because of their conflicting requirements on the level of IO concurrency. To address this challenge, this paper proposes PSLO, a framework that supports the Xth percentile latency and throughput SLOs in a consolidated VM environment by precisely coordinating the level of IO concurrency and the arrival rate for each VM issue queue. PSLO can take full advantage of the available IO capacity allowed by the SLO constraints to improve throughput or reduce latency on a best-effort basis. We design and implement a PSLO prototype in a real VM consolidation environment created by Xen. Our extensive trace-driven prototype evaluation shows that our system is able to optimize the Xth percentile latency and throughput for consolidated VMs under SLO constraints.
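
For readers unfamiliar with percentile SLOs, the Python sketch below shows what checking an Xth percentile latency target against a window of measurements involves. It uses the nearest-rank definition of a percentile, which is an assumption made here; the abstract does not say which definition PSLO uses, and the function names are hypothetical.

```python
def percentile_latency(latencies, x):
    """Nearest-rank Xth percentile of a non-empty list, for integer x in 1..100."""
    if not latencies:
        raise ValueError("need at least one latency sample")
    if not 0 < x <= 100:
        raise ValueError("x must be in 1..100")
    ordered = sorted(latencies)
    rank = (x * len(ordered) + 99) // 100  # ceil(x * n / 100) in integer arithmetic
    return ordered[rank - 1]


def meets_latency_slo(latencies, x, target):
    """True if the Xth percentile latency does not exceed the target."""
    return percentile_latency(latencies, x) <= target


# Ten hypothetical request latencies in milliseconds.
window_ms = [5, 1, 9, 3, 7, 2, 8, 4, 10, 6]
p90 = percentile_latency(window_ms, 90)
```

For this window the 90th percentile is 9 ms, so a 90th percentile target of 9 ms is met even though the slowest request took 10 ms. This is what makes a percentile SLO looser than a bound on the maximum latency.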

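Speaker bio:

Li Ning (李宁) is a Ph.D. student in the School of Computer Science at Huazhong University of Science and Technology. Main research interests are virtualization, quality-of-service assurance and evaluation, cloud computing, and storage management.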



Ph.D. Student Forum Talk 4: ParaFS: A Log-Structured File System to Exploit the Internal Parallelism of Flash Devices

Abstract:

File system designs are undergoing rapid evolution to exploit the potential of flash memory. However, internal parallelism, a key feature of flash devices, is hard to leverage at the file system level because of the semantic gap caused by the flash translation layer (FTL). We observe that even flash-optimized file systems have serious garbage collection problems, which lead to significant performance degradation for write-intensive workloads on multi-channel flash devices. We propose ParaFS to exploit the internal parallelism while ensuring efficient garbage collection. ParaFS is a log-structured file system over a simplified block-level FTL that exposes the physical layout. With knowledge of the device information, ParaFS first proposes 2-D data allocation to maintain the hot/cold data grouping in flash memory while exploiting channel-level parallelism. ParaFS then coordinates garbage collection at both the FS and FTL levels to make garbage collection more efficient. In addition, ParaFS schedules read/write/erase requests over multiple channels to achieve consistent performance. Evaluations show that ParaFS effectively improves system performance for write-intensive workloads by 1.6X to 3.1X compared to the flash-optimized F2FS file system.

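Speaker bio:

Zhang Jiacheng (张佳程) is a fourth-year Ph.D. student in the Department of Computer Science and Technology at Tsinghua University, advised by Professor Shu Jiwu (舒继武). Current research interests include flash file system design and non-volatile storage technologies; the flash file system work was published at USENIX ATC 2016. Zhang participates as a key member in an 863 Program project and a National Science and Technology Major Project.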

Ph.D. Student Forum Talk 5: Bridging the I/O Performance Gap for Big Data Workloads: A New NVDIMM-based Approach

Abstract:

Long I/O latency poses significant challenges for many data-intensive applications, such as emerging big data workloads. Recently, NVDIMM (Non-Volatile Dual In-line Memory Module) technologies have provided a promising solution to this problem. By employing non-volatile NAND flash memory as storage media and connecting it via DIMM (Dual In-line Memory Module) slots, NVDIMM devices are exposed to the memory bus, so the access latency caused by going through I/O controllers can be significantly reduced. However, placing NVDIMM on the memory bus introduces new challenges. For instance, by mixing I/O and memory traffic, NVDIMM can cause severe performance degradation for memory-intensive applications. In addition, there is a speed mismatch between fast memory access and slow flash read/write operations. Moreover, garbage collection (GC) in NAND flash may cause latencies of up to several milliseconds. This work presents novel, enabling mechanisms that allow NVDIMM to more effectively bridge the I/O performance gap for big data workloads. To address the workload heterogeneity challenge, we develop a scheduling scheme in the memory controller that minimizes the interference between native and I/O-derived memory traffic by exploiting both data access criticality and resource utilization. For the NVDIMM controller, several mechanisms are designed to better orchestrate traffic between the memory controller and NAND flash to alleviate the speed discrepancy. To mitigate the lengthy GC period, we propose a proactive GC scheme in which the NVDIMM controller and flash controller intelligently synchronize and transfer the data involved in forthcoming GC operations. We present detailed evaluation and analysis to quantify how well our techniques fit the NVDIMM design. Our experimental results show that overall the proposed techniques yield 10%~35% performance improvements over state-of-the-art baseline schemes.

Speaker bio:

Renhai Chen received the BE and ME degrees from the Department of Computer Science and Technology, Shandong University, China, in 2009 and 2012, respectively. Since June 2013, he has been working toward the PhD degree in the Department of Computing at the Hong Kong Polytechnic University with Prof. Zili Shao. His research interests include storage management, emerging memory technologies, and hardware/software codesign in both embedded and high performance computing systems.

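Ph.D. Student Forum Talk 6: STT-RAM-Based Optimization Techniques for GPU Register Files

Abstract:

General-purpose graphics processing units (GPUs) have become the main hardware accelerators in application domains such as high-performance computing, data centers, and deep machine learning. As hardware scale keeps growing, sustaining improvements in GPU energy efficiency has become a key issue for future GPU design. We will describe how to use an emerging non-volatile memory technology (STT-RAM) to optimize GPU register file design: a lightweight compression algorithm reduces STT-RAM dynamic power consumption, and, to address the read disturbance problem of STT-RAM, a compiler-based approach reduces the performance and energy overhead of restore operations.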

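Speaker bio:

Zhang Hang (张航) is a Ph.D. student at the National University of Defense Technology. Zhang received a master's degree from the National University of Defense Technology in 2011 and a bachelor's degree from Harbin Institute of Technology in 2009, and was a jointly supervised student at the University of California, Santa Barbara, USA, in 2014-2015. Zhang has published papers at high-level conferences including DAC 2016 and DATE 2016. Current research interests are non-volatile memory technologies and GPU architecture.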