
Mastering Apache Iceberg


Mastering Apache Iceberg

Author : Robert Johnson
language : en
Publisher: HiTeX Press
Release Date : 2025-01-05

Mastering Apache Iceberg was written by Robert Johnson and published by HiTeX Press on 2025-01-05. It is available in PDF, TXT, EPUB, Kindle, and other formats and is listed under the Computers category.

"Mastering Apache Iceberg: Managing Big Data in a Modern Data Lake" is an essential guide for data professionals seeking to harness the power of Apache Iceberg in optimizing their data lake strategies. As organizations grapple with ever-growing volumes of structured and unstructured data, the need for efficient, scalable, and reliable data management solutions has never been more critical. Apache Iceberg, an open-source project revered for its robust table format and advanced capabilities, stands out as a formidable tool designed to address the complexities of modern data environments. This comprehensive text delves into the intricacies of Apache Iceberg, offering readers clear guidance on its setup, operation, and optimization. From understanding the foundational architecture of Iceberg tables to implementing effective data partitioning and clustering techniques, the book covers a wide spectrum of key topics necessary for mastering this technology. It provides practical insights into optimizing query performance, ensuring data quality and governance, and integrating with broader big data ecosystems. Rich with case studies, the book illustrates real-world applications across various industries, demonstrating Iceberg's capacity to transform data management approaches and drive decision-making excellence. Designed for data architects, engineers, and IT professionals, "Mastering Apache Iceberg" combines theoretical knowledge with actionable strategies, empowering readers to implement Iceberg effectively within their organizational frameworks. Whether you're new to Apache Iceberg or looking to deepen your expertise, this book serves as a crucial resource for unlocking the full potential of big data management, ensuring that your organization remains at the forefront of innovation and efficiency in the data-driven age.

Mastering Apache Hadoop

Author : Cybellium
language : en
Publisher: Cybellium Ltd
Release Date : 2023-09-26

Mastering Apache Hadoop was written by Cybellium and published by Cybellium Ltd on 2023-09-26. It is available in PDF, TXT, EPUB, Kindle, and other formats and is listed under the Computers category.

Unleash the Power of Big Data Processing with Apache Hadoop Ecosystem

Are you ready to embark on a journey into the world of big data processing and analysis using Apache Hadoop? "Mastering Apache Hadoop" is your comprehensive guide to understanding and harnessing the capabilities of Hadoop for processing and managing massive datasets. Whether you're a data engineer seeking to optimize processing pipelines or a business analyst aiming to extract insights from large data, this book equips you with the knowledge and tools to master the art of Hadoop-based data processing.

Key Features:
1. Deep Dive into Hadoop Ecosystem: Immerse yourself in the core components and concepts of the Apache Hadoop ecosystem. Understand the architecture, components, and functionalities that make Hadoop a powerful platform for big data.
2. Installation and Configuration: Master the art of installing and configuring Hadoop on various platforms. Learn about cluster setup, resource management, and configuration settings for optimal performance.
3. Hadoop Distributed File System (HDFS): Uncover the power of HDFS for distributed storage and data management. Explore concepts like replication, fault tolerance, and data placement to ensure data durability.
4. MapReduce and Data Processing: Delve into MapReduce, the core data processing paradigm in Hadoop. Learn how to write MapReduce jobs, optimize performance, and leverage parallel processing for efficient data analysis.
5. Data Ingestion and ETL: Discover techniques for ingesting and transforming data in Hadoop. Explore tools like Apache Sqoop and Apache Flume for extracting data from various sources and loading it into Hadoop.
6. Data Querying and Analysis: Master querying and analyzing data using Hadoop. Learn about Hive, Pig, and Spark SQL for querying structured and semi-structured data, and uncover insights that drive informed decisions.
7. Data Storage Formats: Explore data storage formats optimized for Hadoop. Learn about Avro, Parquet, and ORC, and understand how to choose the right format for efficient storage and retrieval.
8. Batch and Stream Processing: Uncover strategies for batch and real-time data processing in Hadoop. Learn how to use Apache Spark and Apache Flink to process data in both batch and streaming modes.
9. Data Visualization and Reporting: Discover techniques for visualizing and reporting on Hadoop data. Explore integration with tools like Apache Zeppelin and Tableau to create compelling visualizations.
10. Real-World Applications: Gain insights into real-world use cases of Apache Hadoop across industries. From financial analysis to social media sentiment analysis, explore how organizations are leveraging Hadoop's capabilities for data-driven innovation.

Who This Book Is For: "Mastering Apache Hadoop" is an essential resource for data engineers, analysts, and IT professionals who want to excel in big data processing using Hadoop. Whether you're new to Hadoop or seeking advanced techniques, this book will guide you through the intricacies and empower you to harness the full potential of big data technology.

Mastering Snowflake Platform

Author : Pooja Kelgaonkar
language : en
Publisher: BPB Publications
Release Date : 2024-01-12

Mastering Snowflake Platform was written by Pooja Kelgaonkar and published by BPB Publications on 2024-01-12. It is available in PDF, TXT, EPUB, Kindle, and other formats and is listed under the Computers category.

Embark on the data journey with the ultimate guide to Snowflake mastery

KEY FEATURES
● Learn about Snowflake cloud-based data architecture and its basics.
● Learn and implement Snowflake’s unified features with use cases.
● Design and deploy robust enterprise data architectures with Snowflake.

DESCRIPTION
Handling ever-evolving data for business needs can get complex. Traditional methods create bulky and costly-to-maintain data systems. Here, Snowflake emerges as a cost-effective solution, catering to both traditional and modern data needs with zero or minimal maintenance costs. This book helps you grasp Snowflake, guiding you to create complete solutions from start to finish.

The first part covers Snowflake architecture, key features, native loading and unloading capabilities, ANSI SQL support, and processing of diverse data types and objects. The next part builds on that knowledge to implement data security, governance, and collaboration, using Snowflake's features like data sharing and cloning. The final part explores advanced topics, including streams, tasks, performance optimizations, cost efficiencies, and operationalization with automated monitoring. Real-time use cases and reference architectures are provided to assist readers in implementing data warehouse, data lake, and data mesh solutions with Snowflake.

WHAT YOU WILL LEARN
● Introduction to Snowflake and its three-layered architecture.
● Understand Snowflake’s native features.
● Understand the different types of data workloads and their architecture designs.
● Implement query and cost performance optimization using Snowflake native services.
● Introduction to Snowflake’s advanced features like dynamic and event tables.
● Snowflake’s capabilities with extended support to implement large language models.

WHO THIS BOOK IS FOR
This book is for data practitioners, data engineers, data architects, and any data enthusiast who is keen on learning Snowflake. It does not require any prior experience; however, a basic understanding of cloud computing, data concepts, and basic programming skills is beneficial.

TABLE OF CONTENTS
1. Getting Started with Snowflake
2. Three Layered Architecture
3. Data Types, Data Objects and SQL Commands
4. Data Loading and Unloading
5. Understanding Streams and Tasks
6. Understanding Snowpark
7. Access Control and Managing Users Roles
8. Data Protection and Recovery
9. Snowflake Performance Optimization
10. Understanding Snowflake Costing and Utilizations
11. Implementing Cost Optimizations
12. Data Sharing
13. Data Cloning
14. Understanding Snowsight
15. Programming Connectors and Drivers
16. Workload Patterns with Snowflake
17. Introduction to Snowflake’s Advance Features

Ultimate Big Data Analytics With Apache Hadoop Master Big Data Analytics With Apache Hadoop Using Apache Spark Hive And Python

Author : Simhadri Govindappa
language : en
Publisher: Orange Education Pvt Limited
Release Date : 2024-09-09

Ultimate Big Data Analytics With Apache Hadoop Master Big Data Analytics With Apache Hadoop Using Apache Spark Hive And Python was written by Simhadri Govindappa and published by Orange Education Pvt Limited on 2024-09-09. It is available in PDF, TXT, EPUB, Kindle, and other formats and is listed under the Computers category.

Master the Hadoop Ecosystem and Build Scalable Analytics Systems

Key Features
● Explains Hadoop, YARN, MapReduce, and Tez for understanding distributed data processing and resource management.
● Delves into Apache Hive and Apache Spark for their roles in data warehousing, real-time processing, and advanced analytics.
● Provides hands-on guidance for using Python with Hadoop for business intelligence and data analytics.

Book Description
In a rapidly evolving Big Data job market projected to grow by 28% through 2026, with salaries reaching up to $150,000 annually, mastering big data analytics with the Hadoop ecosystem is highly sought after for career advancement. The Ultimate Big Data Analytics with Apache Hadoop is an indispensable companion offering the in-depth knowledge and practical skills needed to excel in today's data-driven landscape.

The book begins by laying a strong foundation with an overview of data lakes, data warehouses, and related concepts. It then delves into core Hadoop components such as HDFS, YARN, MapReduce, and Apache Tez, offering a blend of theory and practical exercises. You will gain hands-on experience with query engines like Apache Hive and Apache Spark, as well as file and table formats such as ORC, Parquet, Avro, Iceberg, Hudi, and Delta. Detailed instructions on installing and configuring clusters with Docker are included, along with big data visualization and statistical analysis using Python.

Given the growing importance of scalable data pipelines, this book equips data engineers, analysts, and big data professionals with practical skills to set up, manage, and optimize data pipelines, and to apply machine learning techniques effectively. Don’t miss out on the opportunity to become a leader in the big data field and unlock the full potential of big data analytics with Hadoop.

What you will learn
● Gain expertise in building and managing large-scale data pipelines with Hadoop, YARN, and MapReduce.
● Master real-time analytics and data processing with Apache Spark’s powerful features.
● Develop skills in using Apache Hive for efficient data warehousing and complex queries.
● Integrate Python for advanced data analysis, visualization, and business intelligence in the Hadoop ecosystem.
● Learn to enhance data storage and processing performance using formats like ORC, Parquet, and Delta.
● Acquire hands-on experience in deploying and managing Hadoop clusters with Docker and Kubernetes.
● Build and deploy machine learning models with tools integrated into the Hadoop ecosystem.

Table of Contents
1. Introduction to Hadoop and ASF
2. Overview of Big Data Analytics
3. Hadoop and YARN MapReduce and Tez
4. Distributed Query Engines: Apache Hive
5. Distributed Query Engines: Apache Spark
6. File Formats and Table Formats (Apache Iceberg, Hudi, and Delta)
7. Python and the Hadoop Ecosystem for Big Data Analytics - BI
8. Data Science and Machine Learning with Hadoop Ecosystem
9. Introduction to Cloud Computing and Other Apache Projects
Index

Mastering The Modern Data Stack

Author : Nick Jewell, PhD
language : en
Publisher: TinyTechMedia LLC
Release Date : 2023-09-28

Mastering The Modern Data Stack was written by Nick Jewell, PhD and published by TinyTechMedia LLC on 2023-09-28. It is available in PDF, TXT, EPUB, Kindle, and other formats and is listed under the Computers category.

In the age of digital transformation, becoming overwhelmed by the sheer volume of potential data management, analytics, and AI solutions is common. Then it's all too easy to become distracted by glossy vendor marketing, and then chase the latest shiny tool, rather than focusing on building resilient, valuable platforms that will outperform the competition. This book aims to fix a glaring gap for data professionals: a comprehensive guide to the full Modern Data Stack that's rooted in real-world capabilities, not vendor hype. It is full of hard-earned advice on how to get maximum value from your investments through tangible insights, actionable strategies, and proven best practices. It comprehensively explains how the Modern Data Stack is truly utilized by today's data-driven companies. Mastering the Modern Data Stack: An Executive Guide to Unified Business Analytics is crafted for a diverse audience. It's for business and technology leaders who understand the importance and potential value of data, analytics, and AI—but don’t quite see how it all fits together in the big picture. It's for enterprise architects and technology professionals looking for a primer on the data analytics domain, including definitions of essential components and their usage patterns. It's also for individuals early in their data analytics careers who wish to have a practical and jargon-free understanding of how all the gears and pulleys move behind the scenes in a Modern Data Stack to turn data into actual business value. Whether you're starting your data journey with modest resources, or implementing digital transformation in the cloud, you'll find that this isn't just another textbook on data tools or a mere overview of outdated systems. It's a powerful guide to efficient, modern data management and analytics, with a firm focus on emerging technologies such as data science, machine learning, and AI. 
If you want to gain a competitive advantage in today’s fast-paced digital world, this TinyTechGuide™ is for you. Remember, it’s not the tech that’s tiny, just the book!™

Mastering Apache Flink

Author : Cybellium
language : en
Publisher: Cybellium Ltd
Release Date : 2023-09-26

Mastering Apache Flink was written by Cybellium and published by Cybellium Ltd on 2023-09-26. It is available in PDF, TXT, EPUB, Kindle, and other formats and is listed under the Computers category.

Harness the Power of Stream Processing and Batch Data Analytics

Are you ready to dive into the world of stream processing and batch data analytics with Apache Flink? "Mastering Apache Flink" is your comprehensive guide to unlocking the full potential of this cutting-edge framework for real-time data processing. Whether you're a data engineer looking to optimize data flows or a data scientist aiming to derive insights from large datasets, this book equips you with the knowledge and tools to master the art of Flink-based data processing.

Key Features:
1. In-Depth Exploration of Apache Flink: Immerse yourself in the core principles of Apache Flink, understanding its architecture, components, and capabilities. Build a solid foundation that empowers you to process data in both real-time and batch modes.
2. Installation and Configuration: Master the art of installing and configuring Apache Flink on various platforms. Learn about cluster setup, resource management, and configuration tuning for optimal performance.
3. Flink Data Streams: Dive into Flink's data stream processing capabilities. Explore event time processing, windowing, and stateful computations for real-time data analysis.
4. Flink Batch Processing: Uncover the power of Flink for batch data analytics. Learn how to process large datasets using Flink's batch processing mode for efficient analysis.
5. Flink SQL: Delve into Flink's SQL and Table API. Discover how to write SQL queries and perform transformations on structured and semi-structured data for intuitive data manipulation.
6. Flink's State Management: Master Flink's state management mechanisms. Learn how to manage application state for fault tolerance and how to work with savepoints and checkpoints.
7. Complex Event Processing with CEP: Explore Flink's complex event processing capabilities. Learn how to detect patterns, anomalies, and trends in data streams for real-time insights.
8. Machine Learning with FlinkML: Embark on a journey into machine learning with FlinkML. Learn how to implement predictive analytics and machine learning algorithms for data-driven models.
9. Flink Ecosystem and Integrations: Navigate Flink's ecosystem of libraries and integrations. From data ingestion with Apache Kafka to collaborative analytics with Zeppelin, explore tools that enhance Flink's functionalities.
10. Real-World Applications: Gain insights into real-world use cases of Apache Flink across industries. From IoT data processing to fraud detection, explore how organizations leverage Flink for real-time insights.

Who This Book Is For: "Mastering Apache Flink" is an indispensable resource for data engineers, analysts, and IT professionals who want to excel in stream processing and batch data analytics using Flink. Whether you're new to Flink or seeking advanced techniques, this book will guide you through the intricacies and empower you to harness the full potential of this powerful framework.

Mastering Data Engineering And Analytics With Databricks

Author : Manoj Kumar
language : en
Publisher: Orange Education Pvt Ltd
Release Date : 2024-09-30

Mastering Data Engineering And Analytics With Databricks was written by Manoj Kumar and published by Orange Education Pvt Ltd on 2024-09-30. It is available in PDF, TXT, EPUB, Kindle, and other formats and is listed under the Computers category.

TAGLINE
Master Databricks to Transform Data into Strategic Insights for Tomorrow’s Business Challenges

KEY FEATURES
● Combines theory with practical steps to master Databricks, Delta Lake, and MLflow.
● Real-world examples from FMCG and CPG sectors demonstrate Databricks in action.
● Covers real-time data processing, ML integration, and CI/CD for scalable pipelines.
● Offers proven strategies to optimize workflows and avoid common pitfalls.

DESCRIPTION
In today’s data-driven world, mastering data engineering is crucial for driving innovation and delivering real business impact. Databricks is one of the most powerful platforms, unifying the data, analytics, and AI requirements of numerous organizations worldwide. Mastering Data Engineering and Analytics with Databricks goes beyond the basics, offering a hands-on, practical approach tailored for professionals eager to excel in the evolving landscape of data engineering and analytics.

This book uniquely blends foundational knowledge with advanced applications, equipping readers with the expertise to build, optimize, and scale data pipelines that meet real-world business needs. With a focus on actionable learning, it delves into complex workflows, including real-time data processing, advanced optimization with Delta Lake, and seamless ML integration with MLflow: skills critical for today’s data professionals.

Drawing from real-world case studies in FMCG and CPG industries, this book not only teaches you how to implement Databricks solutions but also provides strategic insights into tackling industry-specific challenges. From setting up your environment to deploying CI/CD pipelines, you'll gain a competitive edge by mastering techniques that are directly applicable to your organization’s data strategy. By the end, you’ll not just understand Databricks; you’ll command it, positioning yourself as a leader in the data engineering space.

WHAT WILL YOU LEARN
● Design and implement scalable, high-performance data pipelines using Databricks for various business use cases.
● Optimize query performance and efficiently manage cloud resources for cost-effective data processing.
● Seamlessly integrate machine learning models into your data engineering workflows for smarter automation.
● Build and deploy real-time data processing solutions for timely and actionable insights.
● Develop reliable and fault-tolerant Delta Lake architectures to support efficient data lakes at scale.

WHO IS THIS BOOK FOR?
This book is designed for data engineering students, aspiring data engineers, experienced data professionals, cloud data architects, data scientists and analysts looking to expand their skill sets, as well as IT managers seeking to master data engineering and analytics with Databricks. A basic understanding of data engineering concepts, familiarity with data analytics, and some experience with cloud computing or programming languages such as Python or SQL will help readers fully benefit from the book’s content.

TABLE OF CONTENTS
SECTION 1
1. Introducing Data Engineering with Databricks
2. Setting Up a Databricks Environment for Data Engineering
3. Working with Databricks Utilities and Clusters
SECTION 2
4. Extracting and Loading Data Using Databricks
5. Transforming Data with Databricks
6. Handling Streaming Data with Databricks
7. Creating Delta Live Tables
8. Data Partitioning and Shuffling
9. Performance Tuning and Best Practices
10. Workflow Management
11. Databricks SQL Warehouse
12. Data Storage and Unity Catalog
13. Monitoring Databricks Clusters and Jobs
14. Production Deployment Strategies
15. Maintaining Data Pipelines in Production
16. Managing Data Security and Governance
17. Real-World Data Engineering Use Cases with Databricks
18. AI and ML Essentials
19. Integrating Databricks with External Tools
Index

Mastering Apache Hbase

Author : Cybellium
language : en
Publisher: Cybellium Ltd
Release Date :

Mastering Apache Hbase was written by Cybellium and published by Cybellium Ltd; no release date is listed. It is available in PDF, TXT, EPUB, Kindle, and other formats and is listed under the Computers category.

Unlock the Power of Scalable and Distributed Data Storage with "Mastering Apache HBase"

In the rapidly evolving landscape of data management, the ability to efficiently handle massive amounts of data has become an indispensable skill. "Mastering Apache HBase" serves as your definitive guide to mastering one of the most powerful and flexible distributed NoSQL databases: Apache HBase. Whether you're a seasoned data professional or a newcomer to the world of big data, this book equips you with the knowledge and skills needed to harness the full potential of Apache HBase.

About the Book: "Mastering Apache HBase" takes you on a comprehensive journey through the intricacies of this robust and versatile NoSQL database. From the fundamentals of installation and configuration to advanced topics such as performance tuning and integration with other Big Data tools, this book covers it all. Each chapter is meticulously crafted to provide a deep understanding of the concepts along with practical, real-world applications.

Key Features:
· Solid Foundation: Build a strong understanding by exploring the core concepts of Apache HBase, including its architecture, data model, and storage components.
· Efficient Data Management: Learn how to create tables, insert and retrieve data, and implement effective data modeling strategies that maximize performance and flexibility.
· Scalability and Distribution: Dive into the distributed nature of Apache HBase and discover techniques to scale your cluster horizontally, ensuring seamless growth as your data needs expand.
· Advanced Techniques: Master advanced topics such as data versioning, coprocessors, security, and backup and recovery, enabling you to tackle complex scenarios with confidence.
· Performance Optimization: Uncover strategies and best practices for optimizing the performance of your Apache HBase cluster, ensuring your applications run smoothly even at scale.
· Integration with Ecosystem: Explore how Apache HBase seamlessly integrates with other Big Data tools like Apache Hadoop, Apache Spark, and Apache Hive, opening up possibilities for data analysis and processing.
· Real-World Use Cases: Learn through practical examples and use cases from various industries, including social media, e-commerce, finance, and more, to understand how Apache HBase can solve real-world data challenges.
· Expert Insights: Benefit from the experience of seasoned professionals who provide insights, tips, and recommendations garnered from their years of working with Apache HBase.

Who This Book Is For: "Mastering Apache HBase" is designed for data engineers, database administrators, and anyone involved in managing and analyzing large volumes of data. Whether you're a developer looking to expand your skillset or an experienced professional aiming to deepen your understanding of distributed data storage, this book is your ultimate resource.

© 2023 Cybellium Ltd. All rights reserved. www.cybellium.com

Mastering Apache Kafka

Author : Cybellium
language : en
Publisher: Cybellium Ltd
Release Date : 2023-09-26

Mastering Apache Kafka was written by Cybellium and published by Cybellium Ltd on 2023-09-26. It is available in PDF, TXT, EPUB, Kindle, and other formats and is listed under the Computers category.

Unleash the Power of Distributed Streaming Platform for Real-Time Data

Are you ready to delve into the realm of distributed streaming and real-time data processing with Apache Kafka? "Mastering Apache Kafka" is your definitive guide to harnessing the full potential of this cutting-edge platform for building scalable, fault-tolerant, and high-performance data pipelines. Whether you're a data engineer looking to optimize data flows or a software architect aiming to build robust event-driven systems, this book equips you with the knowledge and tools to master the art of Kafka-based data streaming.

Key Features:
1. Deep Dive into Apache Kafka: Immerse yourself in the core principles of Apache Kafka, comprehending its architecture, components, and dynamic capabilities. Construct a sturdy foundation that empowers you to manage and process real-time data streams with precision.
2. Installation and Configuration: Master the art of installing and configuring Apache Kafka on diverse platforms. Learn about cluster setup, topic creation, and configuration tuning for optimal performance.
3. Publishing and Consuming Data: Uncover the power of Kafka for publishing and consuming data streams. Explore producer and consumer APIs, message serialization, and different messaging patterns for building resilient data pipelines.
4. Data Streams and Processing: Delve into Kafka Streams for real-time data processing. Learn how to perform transformations, aggregations, and enrichments on data streams without the need for external processing engines.
5. Fault Tolerance and Scalability: Master Kafka's inherent fault tolerance and scalability features. Explore replication, partitioning, and high availability mechanisms that ensure data integrity and system reliability.
6. Connectors and Ecosystem: Explore Kafka's rich ecosystem of connectors and integrations. Learn how to connect Kafka with databases, cloud services, and other systems to facilitate seamless data exchange.
7. Security and Authentication: Discover strategies for securing your Kafka cluster. Learn about encryption, access controls, authentication mechanisms, and best practices to safeguard your data streams.
8. Monitoring and Management: Uncover techniques for monitoring and managing Kafka clusters. Explore tools for tracking performance metrics, diagnosing issues, and ensuring optimal system health.
9. Event Sourcing and Stream Processing Architectures: Embark on a journey into event-driven architectures and stream processing. Learn how Kafka can serve as the backbone for building scalable and responsive systems.
10. Real-World Applications: Gain insights into real-world use cases of Apache Kafka across industries. From IoT data integration to real-time analytics, discover how organizations leverage Kafka for innovative data-driven solutions.

Who This Book Is For: "Mastering Apache Kafka" is an indispensable resource for data engineers, software architects, and IT professionals poised to excel in the domain of real-time data streaming with Kafka. Whether you're new to Kafka or seeking advanced techniques, this book will guide you through the intricacies and empower you to harness the full potential of this transformative platform.

Mastering Apache Pulsar

Author : Jowanza Joseph
language : en
Publisher: "O'Reilly Media, Inc."
Release Date : 2021-12-06

Mastering Apache Pulsar was written by Jowanza Joseph and published by O'Reilly Media, Inc. on 2021-12-06. It is available in PDF, TXT, EPUB, Kindle, and other formats and is listed under the Computers category.

Every enterprise application creates data, including log messages, metrics, user activity, and outgoing messages. Learning how to move these items is almost as important as the data itself. If you're an application architect, developer, or production engineer new to Apache Pulsar, this practical guide shows you how to use this open source event streaming platform to handle real-time data feeds. Jowanza Joseph, staff software engineer at Finicity, explains how to deploy production Pulsar clusters, write reliable event streaming applications, and build scalable real-time data pipelines with this platform. Through detailed examples, you'll learn Pulsar's design principles, reliability guarantees, key APIs, and architecture details, including the replication protocol, the load manager, and the storage layer.

This book helps you:
● Understand how event streaming fits in the big data ecosystem
● Explore Pulsar producers, consumers, and readers for writing and reading events
● Build scalable data pipelines by connecting Pulsar with external systems
● Simplify event-streaming application building with Pulsar Functions
● Manage Pulsar to perform monitoring, tuning, and maintenance tasks
● Use Pulsar's operational measurements to secure a production cluster
● Process event streams using Flink and query event streams using Presto
