Research: Big Data Systems

Description

One of our major research areas is big data systems for various kinds of data, including relations, graphs, and scientific arrays. For relational data, we mainly focus on query optimization techniques on Hadoop. For graph data, we have proposed a GPU-based graph processing method called GStream, which reaches a processing speed of 1,400 MTEPS (million traversed edges per second) using only a single PC equipped with two GPUs. For scientific data, we are developing a new scalable method for efficient query processing of NASA's satellite data on a distributed system. In terms of operations, we study not only database operations but also data mining operations. Our ultimate goal is to develop a big data system that can process complex operations on a mixture of large-scale, multi-typed data in a highly optimized way, especially on a cluster of computers or an HPC system (e.g., a supercomputer) equipped with GPUs and a high-speed network.

Related Projects

Financial big data processing system using GPU (PI)

Technology Upgrade R&D Program, Ministry of Science, Korea
Apr. 2017 ~ Dec. 2018

High Performance Distributed in-situ analysis based on GPU (PI)

Korea Institute of Science and Technology Information (KISTI), Korea
Mar. 2017 ~ Sept. 2017

High Performance Big Data Analytics Platform Performance Acceleration Technologies Development

Ministry of Science, Korea
June 2015 ~ May 2019

In-situ analysis of big scientific data on distributed systems (PI)

Korea Institute of Science and Technology Information (KISTI), Ministry of Science, Korea
Mar. 2015 ~ Oct. 2015

Efficient Computing Connected Components using SSD for Big Graphs (PI)

National Research Foundation of Korea (NRF), Ministry of Science, Korea
May 2014 ~ Apr. 2017

From Facebook to Brain Networks: Trillion-scale Big Graph Processing Engine (Co-PI)

Samsung Research Funding Center, Samsung Electronics, Korea
June 2014 ~ May 2018

Self-Organized Software-platform (SOS) for welfare devices (sub-PI)

Ministry of Knowledge Economy, Korea
Dec. 2011 ~ Nov. 2016

Related Papers

A Graph-based Database Partitioning Method for Parallel OLAP Query Processing

Nam, Y.-M., Kim, M.-S., Han, D.
In Proc. 34th IEEE International Conference on Data Engineering (ICDE), Paris, France, Apr. 2018

SSDMiner: A Scalable and Fast Disk-based Frequent Pattern Miner

Chon, K.-W. and Kim, M.-S.
In Proc. Seventh Int'l Conf. on Emerging Databases-Technologies, Applications, and Theory (EDB), Busan, Korea, Aug. 7-9, 2017. (Best Paper Runner-up Award).

TrillionG: A Trillion-scale Synthetic Graph Generator using a Recursive Vector Model

Park, H. and Kim, M.-S.
In Proc. 2017 ACM SIGMOD, Chicago, USA, May 14-19, 2017 (top conference in database area).

A Distributed In-situ Analysis Method for Large-scale Scientific Data

Han, D., Nam, Y., and Kim, M.-S.
In Proc. IEEE International Conference on Big Data and Smart Computing (BigComp) 2017, Jeju, Korea, Feb. 15, 2017.

GTS: A Fast and Scalable Graph Processing Method based on Streaming Topology to GPUs

Kim, M.-S., An, K.-H., Park, H., Seo, H., and Kim, J.
In Proc. 2016 ACM SIGMOD, San Francisco, USA, June 28, 2016 (top conference in database area).

DSP-CC: I/O Efficient Parallel Computation of Connected Components in Billion-scale Networks

Kim, M.-S., Lee, S., Han, W.-S., Park, H., and Lee, J.-H.
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2015 (accepted) (ISSN: 1041-4347, SCI, top-20%).

GStream: A Graph Streaming Processing Method for Large-Scale Graphs on GPUs

Seo, H., Kim, J., and Kim, M.-S.
In Proc. 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Feb. 2015 (top conference in parallel programming area).

Enhanced chained and Cuckoo hashing methods for Multi-core CPUs

Kim, E. and Kim, M.-S.
Cluster Computing, Vol. 17, pp. 665–680, Jan. 2014 (ISSN: 1386-7857, SCIE).

Towards Exploiting GPUs for Fast PageRank Computation of Large-Scale Networks

Kim, M.-S.
In Proc. Fifth Int'l Conf. on Emerging Databases-Technologies, Applications, and Theory (EDB), Jeju, Korea, Aug. 2013 (Best Paper Runner-up Award).

MapReduce Framework for a Single Computer with Multi-core CPUs and Many-core GPUs

Song, H. and Kim, M.-S.
In Proc. Fifth Int'l Conf. on Emerging Databases-Technologies, Applications, and Theory (EDB), Jeju, Korea, Aug. 2013 (Invited Paper).

TurboGraph: A Fast Parallel Graph Engine Handling Billion-scale Graphs in a Single PC

Han, W.-S., Lee, S., Park, K., Lee, J.-H., Kim, M.-S., Kim, J., Yu, H.
In Proc. 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Chicago, USA, 2013 (top conference in data mining area, oral presentation).

Patents

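Relational database storage system and storage method supporting fast query processing with low data redundancy, and method for processing queries based on the relational database storage method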

Kim, M.-S., Nam, Y., Han, D.
Korean Patent, Appl. No. 10-2016-0178421, Dec. 23, 2016.

Distributed file sharing system and method for replicating files in a distributed file sharing system

Kim, M.-S., Kim, S.
Korean Patent, Appl. No. 10-2016-0156785, Nov. 23, 2016.

A relational database storage system for high query performance with low data redundancy

Kim, M.-S., Nam, Y., Han, D.
Korean Patent, Appl. No. 10-2016-0112065, Aug. 31, 2016.

System and method for processing large-scale graphs using GPUs and secondary storage

Kim, M.-S., An, K., Park, H., Oh, S., Kim, J.
Korean Patent, Appl. No. 2016.

Method for processing connected components graph interrogation based on disk

Kim, M.-S., Park, H.
Korean Patent, Appl. No. 10-2015-0050350, Apr. 9, 2015.

System and method for processing large-scale graphs using GPUs

Kim, M.-S., Seo, H., and Kim, J.
U.S. Patent Appl. No. US 14/658,325, March 12, 2015. Korean Patent Appl. No. 10-2014-0148566, Oct. 29, 2014, Reg. No. 10-1620602, May 3, 2016.

Press Reports

DGIST develops core technology for synthesizing ultra-large-scale graph data

Electronic Times (also reported by Yonhap News, Seoul Economic Daily, and other outlets), May 31, 2017.

Processing neural network data 1/400 the size of the human brain

Electronic Times (also reported by 36 other outlets), July 8, 2016.

Big data analysis and processing made faster

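Interview with Min-Soo Kim, Professor, Department of Information and Communication Engineering, DGIST
The Science Times, Korea Foundation for the Advancement of Science and Creativity (KOFAC), April 3, 2015.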

DGIST Professor Min-Soo Kim's team develops technology that processes big data more than 300 times faster

Electronic Times (also reported by 11 other outlets, including Digital Times, Yonhap News, and Money Today), Feb. 11, 2015.