Apache Druid is one of the Apache incubating projects. According to its homepage, it is a database that enables fast analytics on large data sets.
"A real-time analytics database designed for fast slice-and-dice analytics ("OLAP" queries) on large data sets."
Common application areas include:
- Clickstream analytics (web and mobile analytics)
- Network telemetry analytics (network performance monitoring)
- Server metrics storage
- Supply chain analytics (manufacturing metrics)
- Application performance metrics
- Digital marketing/advertising analytics
- Business intelligence / OLAP
Its key features are as follows.
- Columnar storage format.
- Scalable distributed system.
- Massively parallel processing.
- Realtime or batch ingestion.
- Self-healing, self-balancing, easy to operate.
- Cloud-native, fault-tolerant architecture that won't lose data.
- Indexes for quick filtering.
- Time-based partitioning.
- Approximate algorithms.
- Automatic summarization at ingest time.
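To make "automatic summarization at ingest time" and "time-based partitioning" more concrete, here is a minimal sketch in plain Python. It is not Druid code and does not reflect Druid's internals. It only illustrates the idea of bucketing events by time and pre-aggregating rows that share the same dimension values. The `rollup` function, the hourly bucket size, and the field names (`timestamp`, `page`, `bytes`) are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime


def rollup(events, dimensions, metric):
    """Pre-aggregate raw events into one row per (hour, dimension values).

    Hypothetical helper for illustration only; not part of Druid.
    """
    totals = defaultdict(lambda: {"count": 0, "sum": 0})
    for event in events:
        # Truncate the timestamp to the hour; this acts as the time bucket.
        hour = datetime.fromisoformat(event["timestamp"]).replace(
            minute=0, second=0, microsecond=0
        )
        key = (hour.isoformat(),) + tuple(event[d] for d in dimensions)
        # Keep only the aggregates instead of the raw rows.
        totals[key]["count"] += 1
        totals[key]["sum"] += event[metric]
    return dict(totals)


# Made-up sample events: two fall in the 10:00 hour, one in the 11:00 hour.
events = [
    {"timestamp": "2019-12-29T10:05:00", "page": "/home", "bytes": 100},
    {"timestamp": "2019-12-29T10:40:00", "page": "/home", "bytes": 250},
    {"timestamp": "2019-12-29T11:01:00", "page": "/home", "bytes": 50},
]
rolled = rollup(events, ["page"], "bytes")
for key, agg in sorted(rolled.items()):
    print(key, agg)
```

Three raw events collapse into two stored rows here. The trade-off is that individual events can no longer be recovered after summarization, only their aggregates.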
Druid is a good fit in the following cases.
- Insert rates are very high, but updates are less common.
- Most of your queries are aggregation and reporting queries ("group by" queries). You may also have searching and scanning queries.
- You are targeting query latencies of 100ms to a few seconds.
- Your data has a time component (Druid includes optimizations and design choices specifically related to time).
- You may have more than one table, but each query hits just one big distributed table. Queries may potentially hit more than one smaller "lookup" table.
- You have high cardinality data columns (e.g. URLs, user IDs) and need fast counting and ranking over them.
- You want to load data from Kafka, HDFS, flat files, or object storage like Amazon S3.
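As a rough illustration of the "group by" style of query described above, the sketch below builds a request body for Druid's SQL HTTP API, which accepts a JSON object with a `query` field via POST to `/druid/v2/sql`. The datasource name `clickstream`, the column `url`, and the helper `build_sql_payload` are hypothetical and chosen only for this example. The code builds the payload and does not contact a server.

```python
import json


def build_sql_payload(datasource, dimension, limit=10):
    """Build a JSON body for a top-N style "group by" query.

    Hypothetical helper: names are interpolated without escaping,
    so pass only trusted identifiers.
    """
    sql = (
        f'SELECT "{dimension}", COUNT(*) AS "events" '
        f'FROM "{datasource}" '
        # __time is Druid's primary timestamp column.
        f"WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY "
        f'GROUP BY "{dimension}" '
        f'ORDER BY "events" DESC '
        f"LIMIT {int(limit)}"
    )
    return json.dumps({"query": sql})


# Assumed datasource and column names, for illustration only.
body = build_sql_payload("clickstream", "url", limit=5)
print(body)
```

The query filters on time, groups on a single high-cardinality column, and reads one table, which matches the access pattern the list above says Druid is suited to.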
Druid is not a good fit in the following cases.
- You need low-latency updates of existing records using a primary key. Druid supports streaming inserts, but not streaming updates (updates are done using background batch jobs).
- You are building an offline reporting system where query latency is not very important.
- You want to do "big" joins (joining one big fact table to another big fact table) and you are okay with these queries taking a long time to complete.