[1004jonghee] What is Hadoop?



Jonghee Jeon 2013. 8. 7. 12:22

What is Hadoop?

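• The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.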

- hadoop.apache.org -

- Hadoop is an open-source framework that supports distributed processing, storage, and management of large volumes of data in a cluster environment.

- It implements HDFS (Hadoop Distributed File System), an alternative to the Google File System, together with MapReduce.
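To make the MapReduce model concrete, here is a minimal single-process Java sketch of the classic word-count job. It only mimics the map, shuffle, and reduce phases with plain collections; it does not use Hadoop's `Mapper`/`Reducer` API, and `WordCountSketch` and its methods are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process simulation of the MapReduce word-count flow.
// Hypothetical class: it does NOT use Hadoop's API.
class WordCountSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(Map.entry(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group every emitted value by its key (sorted by key).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs map -> shuffle -> reduce over all lines and returns word counts.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line)); // sequential here; Hadoop runs map tasks in parallel
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(emitted).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {data=2, hadoop=2, processes=1, stores=1}
        System.out.println(run(List.of("Hadoop stores data", "Hadoop processes data")));
    }
}
```

In a real cluster the map calls run as parallel tasks, preferably on the nodes holding the input's HDFS blocks, and the framework performs the shuffle across the network before the reducers run.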

Strengths and weaknesses of Hadoop

Strengths

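- Open source, so licensing costs are low

- Machines can easily be added or removed without shutting the system down

- A failure on some machines has little impact on the availability of the system as a whole

- Low deployment cost and fast data processing for the price

- Because replicas of the data are stored, data can be recovered even when a server fails.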
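The recovery claim above rests on replication. The following is a toy Java model, assuming the HDFS default replication factor of 3 and a simple round-robin placement; `ReplicationSketch`, the node names, and the block ID are hypothetical, and this is not the actual HDFS placement policy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Toy model of block replication; NOT the real HDFS placement policy.
class ReplicationSketch {
    private final List<String> nodes;   // hypothetical DataNode names
    private final int replication;      // number of copies kept per block
    private final Map<String, List<String>> blockLocations = new HashMap<>();
    private final Set<String> failedNodes = new HashSet<>();
    private int nextNode = 0;

    ReplicationSketch(List<String> nodes, int replication) {
        if (replication > nodes.size()) {
            throw new IllegalArgumentException("not enough nodes for the replication factor");
        }
        this.nodes = nodes;
        this.replication = replication;
    }

    // Places `replication` copies of a block on distinct nodes, round-robin.
    void store(String blockId) {
        List<String> targets = new ArrayList<>();
        for (int i = 0; i < replication; i++) {
            targets.add(nodes.get((nextNode + i) % nodes.size()));
        }
        nextNode = (nextNode + 1) % nodes.size();
        blockLocations.put(blockId, targets);
    }

    // Marks a node as failed; its copies become unreadable.
    void fail(String node) {
        failedNodes.add(node);
    }

    // Returns any live node that still holds a copy of the block.
    Optional<String> locate(String blockId) {
        for (String node : blockLocations.getOrDefault(blockId, List.of())) {
            if (!failedNodes.contains(node)) {
                return Optional.of(node);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ReplicationSketch cluster = new ReplicationSketch(List.of("dn1", "dn2", "dn3", "dn4"), 3);
        cluster.store("blk_1");                       // copies on dn1, dn2, dn3
        cluster.fail("dn1");
        System.out.println(cluster.locate("blk_1"));  // Optional[dn2]
    }
}
```

Real HDFS uses a rack-aware placement policy and has the NameNode re-replicate blocks that fall below the target replication factor; both are omitted here to keep the idea visible.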

Weaknesses

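- Data stored in HDFS cannot be modified in place

- Well suited to batch processing of large data sets, but unsuitable for real-time analysis such as streaming, or for jobs that require a fast turnaround.

- A shortage of Hadoop engineers.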

Hadoop core projects

•Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.

•Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.

•Hadoop YARN: A framework for job scheduling and cluster resource management.
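As a rough illustration of what "cluster resource management" means in YARN, where the ResourceManager grants containers on NodeManagers, here is a toy first-fit allocator in Java. It tracks only memory, and `ContainerAllocatorSketch` and the node names are hypothetical; this is not YARN's actual scheduler (such as the Capacity Scheduler or Fair Scheduler).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Toy first-fit container allocator; NOT a real YARN scheduler.
class ContainerAllocatorSketch {
    // Free memory (MB) left on each hypothetical NodeManager, in insertion order.
    private final Map<String, Integer> freeMemoryMb = new LinkedHashMap<>();

    void addNode(String node, int memoryMb) {
        freeMemoryMb.put(node, memoryMb);
    }

    // Grants a container on the first node with enough free memory.
    Optional<String> allocate(int requestMb) {
        for (Map.Entry<String, Integer> entry : freeMemoryMb.entrySet()) {
            if (entry.getValue() >= requestMb) {
                entry.setValue(entry.getValue() - requestMb);
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty(); // no capacity: the request would have to wait
    }

    // Returns a finished container's memory to its node.
    void release(String node, int requestMb) {
        freeMemoryMb.merge(node, requestMb, Integer::sum);
    }

    public static void main(String[] args) {
        ContainerAllocatorSketch rm = new ContainerAllocatorSketch();
        rm.addNode("nm1", 4096);
        rm.addNode("nm2", 2048);
        System.out.println(rm.allocate(3072)); // Optional[nm1]
        System.out.println(rm.allocate(2048)); // Optional[nm2]
        System.out.println(rm.allocate(2048)); // Optional.empty
        rm.release("nm1", 3072);
        System.out.println(rm.allocate(2048)); // Optional[nm1]
    }
}
```

First-fit keeps the example short; YARN's real schedulers also take queues, configured capacities, and data locality into account.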

- hadoop.apache.org -
