PySpark-Boilerplate

A boilerplate for writing PySpark jobs

python, boilerplate, apache-spark, pyspark

Free, Repo

Preview

PySpark-Boilerplate preview

Overview

PySpark-Boilerplate is a template for building production-ready PySpark jobs with structured code organization and best practices. It provides a foundation for data-processing workflows, including configuration management, logging, and testing patterns, making it suitable for teams building scalable batch-processing and ETL pipelines on Apache Spark.
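Boilerplates of this kind typically keep each job in its own module behind a single entry point that dispatches by job name, and keep pure transform functions separate from Spark wiring so they can be unit-tested without a cluster. A minimal sketch of that pattern (the `--job` flag, the `JOBS` registry, and the job body are illustrative assumptions, not this repository's exact API):

```python
import argparse


def tokenize(line):
    """Pure transform kept free of Spark dependencies so it is unit-testable."""
    return [w.lower() for w in line.split() if w.isalpha()]


def run_wordcount(spark, lines):
    # Hypothetical job body: count words in an RDD built from text lines.
    return (spark.sparkContext.parallelize(lines)
            .flatMap(tokenize)
            .map(lambda w: (w, 1))
            .reduceByKey(lambda a, b: a + b)
            .collect())


# Registry mapping job names to callables; real boilerplates often
# discover these by importing a module per job instead.
JOBS = {"wordcount": run_wordcount}


def main():
    parser = argparse.ArgumentParser(description="Dispatch a PySpark job by name")
    parser.add_argument("--job", choices=JOBS, required=True)
    args = parser.parse_args()
    # Imported lazily so the pure helpers above stay importable without Spark.
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName(args.job).getOrCreate()
    try:
        print(JOBS[args.job](spark, ["Hello Spark", "hello again"]))
    finally:
        spark.stop()


if __name__ == "__main__":
    main()
```

With this split, `tokenize` can be tested in plain pytest, while `run_wordcount` needs a SparkSession only at submit time (e.g. via `spark-submit main.py --job wordcount`).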

Features

pyspark-jobs, production-grade-setup, best-practices

Feature Flags

blog, jobs, Queue

Recommended Use Cases

data-processing, big-data-analytics, spark-jobs

Frontend

None

Backend

apache-spark, pyspark

Auth Providers

None

Deployment Targets

None

Payment Providers

None

Quick Facts

โญ Stars
394
๐Ÿด Forks
154
๐Ÿ”„ Active
Unknown
๐Ÿ•’ Last Commit
2024-01-21T06:57:52.000Z

Stack

Framework
apache-spark
Language
python

Data Layer

UI Stack

Developer Experience

Docker
No
Tests
No
Quickstart
No
env.example
No

Pricing

Classification
free
Selected
—
Notes
No clear pricing signals