Lenovo Big Data Validated Design for Real-time Streaming Analytics with Cloudera Enterprise on ThinkSystem Servers
Reference Architecture
This document describes a real-time streaming reference architecture for Cloudera Enterprise on Lenovo ThinkSystem servers with locally attached storage. It provides a predefined and optimized hardware infrastructure for Cloudera Enterprise, a distribution of Apache Hadoop and Apache Spark from Cloudera with enterprise-ready capabilities.
The reference architecture provides planning guidance, design considerations, and best practices for implementing a real-time streaming Cloudera Enterprise solution with Lenovo products. It enables enterprises and SMBs to deploy a big data platform whose real-time streaming analytics quickly identify threats and opportunities, which reduces the time the business takes to respond.
This architecture uses Lenovo ThinkSystem SR650 servers for Spark Streaming, machine learning, Kafka processing, HBase storage, and the Elasticsearch search engine. Cloudera management nodes are deployed on Lenovo ThinkSystem SR630 servers. Storage for Kafka transactional logging is deployed on 4 TB Intel Solid State Drive Data Center P4500 Series NVMe drives for high throughput and bandwidth. Elasticsearch search data is stored on low-latency Intel Optane SSD DC P4800X Series drives.
The intended audience for this reference architecture includes IT professionals, technical architects, sales engineers, and consultants who plan, design, and implement the big data solution with Lenovo hardware.
Table of Contents
2 Business problem and business value
4 Architectural overview
5 Component model
6 Operational model
7 Deployment considerations
8 Bill of Materials
9 For more information
Changes in the September 23 update:
- Fixed typographical errors
Related product families
Product families related to this document are the following: