Research: Big Data Systems


One of our major research areas is big data systems for various kinds of data, including relations, graphs, and scientific arrays. For relational data, we mainly focus on query optimization techniques on Hadoop. For graph data, we have proposed a GPU-based graph processing method called GStream, which achieves a processing speed of 1,400 MTEPS (million traversed edges per second) using only a single PC equipped with two GPUs. For scientific data, we are developing a new scalable method for efficient query processing of NASA's satellite data on a distributed system. In terms of operations, we study not only database operations but also data mining operations. Our ultimate goal is to develop a big data system that can process complex operations on a mixture of large-scale, multi-typed data in a highly optimized way, especially on a cluster of computers or an HPC system (e.g., a supercomputer) equipped with GPUs and a high-speed network.


Related Projects

In-situ analysis of big scientific data on distributed systems (PI)

Korea Institute of Science and Technology Information (KISTI), Ministry of Science, Korea
March 2015 ~ Oct. 2015

Efficient Computing of Connected Components using SSD for Big Graphs (PI)

National Research Foundation of Korea (NRF), Ministry of Science, Korea
May 2014 ~ Apr. 2017

From Facebook to Brain Networks: Trillion-scale Big Graph Processing Engine (Co-PI)

Samsung Research Funding Center, Samsung Electronics, Korea
June 2014 ~ May 2018

Self-Organized Software-platform (SOS) for welfare devices (sub-PI)

Ministry of Knowledge Economy, Korea
Dec. 2011 ~ Nov. 2016

Related Papers

DSP-CC: I/O Efficient Parallel Computation of Connected Components in Billion-scale Networks

Kim, M.-S., Lee, S., Han, W.-S., Park, H., and Lee, J.-H.
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2015 (accepted) (ISSN: 1041-4347, SCI, top-20%).

GStream: A Graph Streaming Processing Method for Large-Scale Graphs on GPUs

Seo, H., Kim, J., and Kim, M.-S.
In Proc. 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Feb. 2015 (top conference in the parallel programming area).

Enhanced chained and Cuckoo hashing methods for Multi-core CPUs

Kim, E. and Kim, M.-S.
Cluster Computing, Vol. 17, pp. 665–680, Jan. 2014 (ISSN: 1386-7857, SCIE).

Towards Exploiting GPUs for Fast PageRank Computation of Large-Scale Networks

Kim, M.-S.
In Proc. 5th Int'l Conf. on Emerging Databases: Technologies, Applications, and Theory (EDB), Jeju, Korea, Aug. 2013 (Best Paper Runner-up Award).

MapReduce Framework for a Single Computer with Multi-core CPUs and Many-core GPUs

Song, H. and Kim, M.-S.
In Proc. 5th Int'l Conf. on Emerging Databases: Technologies, Applications, and Theory (EDB), Jeju, Korea, Aug. 2013 (Invited Paper).

TurboGraph: A Fast Parallel Graph Engine Handling Billion-scale Graphs in a Single PC

Han, W.-S., Lee, S., Park, K., Lee, J.-H., Kim, M.-S., Kim, J., Yu, H.
In Proc. 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Chicago, USA, 2013 (top conference in the data mining area, oral presentation).



Kim, M.-S., Seo, H., and Kim, J.
U.S. Patent Appl. No. US 14/658,325, March 12, 2015; Korean Patent Appl. No. 10-2014-0148566, Oct. 29, 2014.

Press Reports

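Speeding Up Big Data Analysis and Processing

Interview with Prof. M.-S. Kim, Information and Communication Engineering, DGIST
The Science Times, Korea Foundation for the Advancement of Science and Creativity (KOFAC), April 3, 2015.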

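DGIST Prof. M.-S. Kim's Team Develops Technology That Makes Big Data Processing More Than 300 Times Faster

Electronic Times (also covered by 11 other media outlets, including Digital Times, Yonhap News, and Money Today), Feb. 11, 2015.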