Apache Druid Overview

Development/DB 2019. 12. 29. 16:41

At the time of writing, Apache Druid is one of the Apache incubating projects. According to its homepage, it is a database that enables fast analysis of large data sets.

A real-time analytics database designed for fast slice-and-dice analytics ("OLAP" queries) on large data sets

The homepage lists the following common application areas:

- Clickstream analytics (web and mobile analytics)
- Network telemetry analytics (network performance monitoring)
- Server metrics storage
- Supply chain analytics (manufacturing metrics)
- Application performance metrics
- Digital marketing/advertising analytics
- Business intelligence / OLAP

Its key features are as follows.

- Columnar storage format.
- Scalable distributed system.
- Massively parallel processing.
- Realtime or batch ingestion.
- Self-healing, self-balancing, easy to operate.
- Cloud-native, fault-tolerant architecture that won't lose data.
- Indexes for quick filtering.
- Time-based partitioning.
- Approximate algorithms.
- Automatic summarization at ingest time.
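To make "automatic summarization at ingest time" and "time-based partitioning" more concrete, here is a minimal sketch in plain Python. It is not Druid's implementation. It only illustrates the general idea of truncating each event's timestamp to a coarser granularity and pre-aggregating rows that share the same time bucket and dimension values. The function name `rollup`, the hourly granularity, and the field names `timestamp`, `page`, and `bytes` are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime


def rollup(events, dimensions, metric):
    """Pre-aggregate raw events by (hour bucket, dimension values).

    Illustrative only: the names and the hourly granularity are assumptions,
    not Druid's actual ingestion code.
    """
    totals = defaultdict(lambda: {"count": 0, "sum": 0})
    for event in events:
        # Truncate the timestamp to the start of its hour (the time bucket).
        hour = datetime.fromisoformat(event["timestamp"]).replace(
            minute=0, second=0, microsecond=0
        )
        key = (hour.isoformat(),) + tuple(event[d] for d in dimensions)
        # Keep only a row count and a running sum instead of every raw row.
        totals[key]["count"] += 1
        totals[key]["sum"] += event[metric]
    return dict(totals)


# Hypothetical sample events.
events = [
    {"timestamp": "2019-12-29T16:05:00", "page": "/home", "bytes": 100},
    {"timestamp": "2019-12-29T16:41:00", "page": "/home", "bytes": 250},
    {"timestamp": "2019-12-29T17:02:00", "page": "/home", "bytes": 50},
    {"timestamp": "2019-12-29T16:30:00", "page": "/docs", "bytes": 70},
]

rolled = rollup(events, ["page"], "bytes")
for key, value in sorted(rolled.items()):
    print(key, value)
```

Four raw events collapse into three summarized rows here, because the two `/home` events in the 16:00 hour share a bucket. The trade-off of this kind of summarization is that the individual raw events can no longer be recovered from the stored rows.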

Druid is a good fit in the following cases.

- Insert rates are very high, but updates are less common.
- Most of your queries are aggregation and reporting queries ("group by" queries). You may also have searching and scanning queries.
- You are targeting query latencies of 100ms to a few seconds.
- Your data has a time component (Druid includes optimizations and design choices specifically related to time).
- You may have more than one table, but each query hits just one big distributed table. Queries may potentially hit more than one smaller "lookup" table.
- You have high cardinality data columns (e.g. URLs, user IDs) and need fast counting and ranking over them.
- You want to load data from Kafka, HDFS, flat files, or object storage like Amazon S3.
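As a rough illustration of the kind of "group by" query described above, the sketch below builds (but does not send) a request for Druid's SQL HTTP endpoint. The datasource name `clickstream`, the columns `page` and `user_id`, and the address `localhost:8888` are assumptions for this example. `__time` is Druid's timestamp column, and `TIME_FLOOR` and `APPROX_COUNT_DISTINCT` are Druid SQL functions.

```python
import json
import urllib.request

# Assumed address of a locally running Druid Router; adjust for your cluster.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"

# Hourly page views and approximate unique users for a hypothetical
# "clickstream" datasource with "page" and "user_id" columns.
QUERY = """
SELECT
  TIME_FLOOR(__time, 'PT1H') AS time_bucket,
  page,
  COUNT(*) AS views,
  APPROX_COUNT_DISTINCT(user_id) AS unique_users
FROM clickstream
GROUP BY TIME_FLOOR(__time, 'PT1H'), page
ORDER BY views DESC
LIMIT 10
"""


def build_sql_request(query, url=DRUID_SQL_URL):
    """Build a POST request carrying the SQL query as a JSON body."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sql_request(QUERY)
print(request.get_method(), request.full_url)
```

Against a running cluster, passing `request` to `urllib.request.urlopen` would submit the query. The example stops short of that so it can run without a Druid installation.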

When Druid should not be used

- You need low-latency updates of existing records using a primary key. Druid supports streaming inserts, but not streaming updates (updates are done using background batch jobs).
- You are building an offline reporting system where query latency is not very important.
- You want to do "big" joins (joining one big fact table to another big fact table) and you are okay with these queries taking a long time to complete.

Source: Apache Druid homepage
