Research: Big Data Systems
Description
One of our major research areas is big data systems for various kinds of data, including relations, graphs, and scientific arrays. For relational data, we mainly focus on query optimization techniques on Hadoop. For graph data, we have proposed a GPU-based graph processing method called GStream, which achieves a processing speed of 1,400 MTEPS (million traversed edges per second) using only a single PC equipped with two GPUs. For scientific data, we are developing a new scalable method for efficient query processing of NASA's satellite data on a distributed system. In terms of operations, we study not only database operations but also data mining operations. Our ultimate goal is to develop a big data system that can process complex operations on a mixture of large-scale, multi-typed data in a highly optimized way, especially on a cluster of computers or an HPC system (e.g., a supercomputer) equipped with GPUs and a high-speed network.
Related Projects
- Financial big data processing system using GPU (PI)
Technology Upgrade R&D Program, Ministry of Science, Korea
Apr. 2017 ~ Dec. 2018
- High Performance Distributed in-situ analysis based on GPU (PI)
Korea Institute of Science and Technology Information (KISTI), Korea
Mar. 2017 ~ Sept. 2017
- High Performance Big Data Analytics Platform Performance Acceleration Technologies Development
Ministry of Science, Korea
June 2015 ~ May 2019
- In-situ analysis of big scientific data on distributed systems (PI)
Korea Institute of Science and Technology Information (KISTI), Ministry of Science, Korea
March 2015 ~ Oct. 2015
- Efficient Computing Connected Components using SSD for Big Graphs (PI)
National Research Foundation of Korea (NRF), Ministry of Science, Korea
May 2014 ~ Apr. 2017
- From Facebook to Brain Networks: Trillion-scale Big Graph Processing Engine (Co-PI)
Samsung Research Funding Center, Samsung Electronics, Korea
June 2014 ~ May 2018
- Self-Organized Software-platform (SOS) for welfare devices (sub-PI)
Ministry of Knowledge Economy, Korea
Dec. 2011 ~ Nov. 2016
Related Papers
- A Graph-based Database Partitioning Method for Parallel OLAP Query Processing
Nam, Y.-M., Kim, M.-S., Han, D.
In Proc. 34th IEEE International Conference on Data Engineering (ICDE), Paris, France, Apr. 2018
- SSDMiner: A Scalable and Fast Disk-based Frequent Pattern Miner
Chon, K.-W. and Kim, M.-S.
In Proc. Seventh Int'l Conf. on Emerging Databases-Technologies, Applications, and Theory (EDB), Busan, Korea, Aug. 7-9, 2017. (Best Paper Runner-up Award).
- TrillionG: A Trillion-scale Synthetic Graph Generator using a Recursive Vector Model
Park, H. and Kim, M.-S.
In Proc. 2017 ACM SIGMOD, Chicago, USA, May 14-19, 2017 (top conference in database area).
- A Distributed In-situ Analysis Method for Large-scale Scientific Data
Han, D., Nam, Y., and Kim, M.-S.
In Proc. IEEE International Conference on Big Data and Smart Computing (BigComp) 2017, Jeju, Korea, Feb. 15, 2017.
- GTS: A Fast and Scalable Graph Processing Method based on Streaming Topology to GPUs (slides)
Kim, M.-S., An, K.-H., Park, H., Seo, H., and Kim, J.
In Proc. 2016 ACM SIGMOD, San Francisco, USA, June 28, 2016 (top conference in database area).
- DSP-CC: I/O Efficient Parallel Computation of Connected Components in Billion-scale Networks
Kim, M.-S., Lee, S., Han, W.-S., Park, H., and Lee, J.-H.
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2015 (accepted) (ISSN: 1041-4347, SCI, top-20%).
- GStream: A Graph Streaming Processing Method for Large-Scale Graphs on GPUs
Seo, H., Kim, J., and Kim, M.-S.
In Proc. 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Feb. 2015 (top conference in parallel programming area).
- Enhanced chained and Cuckoo hashing methods for Multi-core CPUs
Kim, E. and Kim, M.-S.
Cluster Computing, Vol. 17, pp. 665–680, Jan. 2014 (ISSN: 1386-7857, SCIE).
- Towards Exploiting GPUs for Fast PageRank Computation of Large-Scale Networks
Kim, M.-S.
In Proc. Fifth Int'l Conf. on Emerging Databases-Technologies, Applications, and Theory (EDB), Jeju, Korea, Aug. 2013 (Best Paper Runner-up Award).
- MapReduce Framework for a Single Computer with Multi-core CPUs and Many-core GPUs
Song, H. and Kim, M.-S.
In Proc. Fifth Int'l Conf. on Emerging Databases-Technologies, Applications, and Theory (EDB), Jeju, Korea, Aug. 2013 (Invited Paper).
- TurboGraph: A Fast Parallel Graph Engine Handling Billion-scale Graphs in a Single PC
Han, W.-S., Lee, S., Park, K., Lee, J.-H., Kim, M.-S., Kim, J., Yu, H.
In Proc. 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Chicago, USA, 2013 (top conference in data mining area, oral presentation).
Patents
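- Relational database storage system supporting fast query processing with low data redundancy, storage method, and method for processing queries based on the relational database storage method
Kim, M.-S., Nam, Y., Han, D.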
Korean Patent, Appl. No. 10-2016-0178421, Dec. 23, 2016.
- Distributed file sharing system and method for replicating files in a distributed file sharing system
Kim, M.-S., Kim, S.
Korean Patent, Appl. No. 10-2016-0156785, Nov. 23, 2016.
- A relational database storage system for high query performance with low data redundancy
Kim, M.-S., Nam, Y., Han, D.
Korean Patent, Appl. No. 10-2016-0112065, Aug. 31, 2016.
- System and method for processing large-scale graphs using GPUs and secondary storage
Kim, M.-S., An, K., Park, H., Oh, S., Kim, J.
Korean Patent, Appl. No. 2016.
- Method for processing connected components graph interrogation based on disk
Kim, M.-S., Park, H.
Korean Patent, Appl. No. 10-2015-0050350, Apr. 9, 2015.
- System and method for processing large-scale graphs using GPUs
Kim, M.-S., Seo, H., and Kim, J.
U.S. Patent Appl. No. US 14/658,325, March 12, 2015.
Korean Patent Appl. No. 10-2014-0148566, Oct. 29, 2014, Reg. No. 10-1620602, May 3, 2016.
Press Reports
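- DGIST develops core technology for synthesizing ultra-large-scale graph data
Electronic Times (also covered by Yonhap News, Seoul Economic Daily, and other outlets), May 31, 2017.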
- Processing neural network data 1/400 the size of the human brain
Electronic Times (also covered by 36 other media outlets), July 8, 2016.
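- Big data analysis and processing speed increased
Interview with Min-Soo Kim, Professor of Information and Communication Engineering, DGIST
The Science Times, Korea Foundation for the Advancement of Science and Creativity (KOFAC), April 3, 2015.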
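- Team of DGIST Professor Min-Soo Kim develops technology that processes big data more than 300 times faster
Electronic Times (also covered by 11 media outlets including Digital Times, Yonhap News, and Money Today), Feb. 11, 2015.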