
Apache Druid Overview

Development/DB 2019. 12. 29. 16:41

Source: Apache Druid homepage

Apache Druid is one of the Apache incubating projects. According to the description on its homepage, it is a database that enables fast analysis of large data sets.

A real-time analytics database designed for fast slice-and-dice analytics ("OLAP" queries) on large data sets.

Common application areas listed on the homepage:

- Clickstream analytics (web and mobile analytics)
- Network telemetry analytics (network performance monitoring)
- Server metrics storage
- Supply chain analytics (manufacturing metrics)
- Application performance metrics
- Digital marketing/advertising analytics
- Business intelligence / OLAP

 

Its key features are as follows.

  1. Columnar storage format.

  2. Scalable distributed system.

  3. Massively parallel processing.

  4. Realtime or batch ingestion.

  5. Self-healing, self-balancing, easy to operate.

  6. Cloud-native, fault-tolerant architecture that won't lose data.

  7. Indexes for quick filtering.

  8. Time-based partitioning.

  9. Approximate algorithms.

  10. Automatic summarization at ingest time.
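Item 10 (automatic summarization at ingest time, which Druid calls rollup) is easier to picture with a small example. The following is a minimal sketch in plain Python, not Druid's actual implementation: it truncates each event timestamp to the hour and pre-aggregates rows that share the same hour and dimension values. The function name `rollup`, the field names, and the sample events are all made up for illustration.

```python
from collections import defaultdict
from datetime import datetime


def rollup(events, dimensions, metric):
    """Pre-aggregate raw events by (hour bucket, dimension values)."""
    totals = defaultdict(lambda: {"count": 0, "sum": 0})
    for event in events:
        # Truncate the timestamp to the hour; this plays the role of
        # the granularity at which rows are summarized.
        bucket = datetime.fromisoformat(event["timestamp"]).replace(
            minute=0, second=0, microsecond=0
        )
        key = (bucket.isoformat(),) + tuple(event[d] for d in dimensions)
        totals[key]["count"] += 1
        totals[key]["sum"] += event[metric]
    return dict(totals)


# Hypothetical clickstream events (illustrative data only).
raw_events = [
    {"timestamp": "2019-12-29T16:05:00", "page": "/home", "bytes": 100},
    {"timestamp": "2019-12-29T16:41:00", "page": "/home", "bytes": 250},
    {"timestamp": "2019-12-29T16:59:59", "page": "/docs", "bytes": 40},
    {"timestamp": "2019-12-29T17:00:00", "page": "/home", "bytes": 10},
]

rolled = rollup(raw_events, dimensions=["page"], metric="bytes")
for key, value in sorted(rolled.items()):
    print(key, value)
```

Four raw events collapse into three stored rows here. The trade-off is that individual events can no longer be recovered after summarization, which is acceptable when most queries are aggregations anyway.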

 

Druid is a good fit in the following cases.

  • Insert rates are very high, but updates are less common.

  • Most of your queries are aggregation and reporting queries ("group by" queries). You may also have searching and scanning queries.

  • You are targeting query latencies of 100ms to a few seconds.

  • Your data has a time component (Druid includes optimizations and design choices specifically related to time).

  • You may have more than one table, but each query hits just one big distributed table. Queries may potentially hit more than one smaller "lookup" table.

  • You have high cardinality data columns (e.g. URLs, user IDs) and need fast counting and ranking over them.

  • You want to load data from Kafka, HDFS, flat files, or object storage like Amazon S3.
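The kind of query described above (a "group by" over a time range, with counting over a high-cardinality column) could look roughly like the sketch below. It only builds an HTTP request for Druid's SQL endpoint (`/druid/v2/sql`) and does not send it. The host and port, the `clickstream` datasource, and the `page` and `user_id` columns are assumptions for illustration and do not come from the original post.

```python
import json
import urllib.request

# Assumed address of a local Druid router; adjust for a real cluster.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"

# Hypothetical datasource and columns: clickstream(page, user_id).
TOP_PAGES_SQL = """
SELECT page,
       COUNT(*) AS views,
       APPROX_COUNT_DISTINCT(user_id) AS approx_users
FROM clickstream
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY page
ORDER BY views DESC
LIMIT 10
"""


def build_sql_request(sql, url=DRUID_SQL_URL):
    """Build (but do not send) a POST request carrying a Druid SQL query."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sql_request(TOP_PAGES_SQL)

# To run it against a live cluster, uncomment:
# with urllib.request.urlopen(request) as response:
#     rows = json.load(response)
```

The filter on `__time` is what lets Druid use its time-based partitioning, and `APPROX_COUNT_DISTINCT` trades exactness for speed when counting distinct values in a high-cardinality column such as a user ID.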

 

Cases where Druid is not a good fit

  • You need low-latency updates of existing records using a primary key. Druid supports streaming inserts, but not streaming updates (updates are done using background batch jobs).

  • You are building an offline reporting system where query latency is not very important.

  • You want to do "big" joins (joining one big fact table to another big fact table) and you are okay with these queries taking a long time to complete.

Source: Apache Druid homepage

Posted by 나뷜나뷜