Green Data Pipelines: AI-Driven Energy Optimization for Sustainable Cloud Workloads

Session

Computer Science and Communication Engineering

Description

The rapid proliferation of cloud computing workloads has intensified concerns about the energy demand and environmental impact of large-scale data processing. As infrastructures expand, data centers account for a growing share of global electricity consumption and carbon emissions. This study presents a green data-pipeline architecture that integrates AI-driven scheduling with resource-management techniques to reduce energy use in cloud-native environments. The architecture combines widely used data-engineering platforms: Apache Airflow for workflow orchestration, Databricks (Spark) for computation, PostgreSQL as an analytical warehouse, and dbt for data transformation. A reinforcement-learning agent coordinates these components, dynamically optimizing workload placement and resource allocation. Using real-time monitoring and predictive modeling, the AI scheduler aligns task execution with renewable-energy availability and workload fluctuations. Simulation results show reductions in energy consumption and carbon emissions compared with conventional static scheduling, while maintaining performance and operational stability. Although further research is required to validate scalability and generalizability across heterogeneous cloud settings, the proposed framework shows potential to improve the sustainability of data pipelines and to promote environmentally responsible computing practices across the industry.
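The idea of aligning task execution with renewable-energy availability can be illustrated with a minimal sketch. This is not the study's reinforcement-learning agent: it is a simple greedy baseline that picks the start hour with the lowest total forecast carbon intensity before a deadline. The `Task` fields, the `best_start_hour` function, and the forecast values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    duration_hours: int  # consecutive hours the task needs
    deadline_hour: int   # the task must finish by this hour index


def best_start_hour(task, carbon_forecast):
    """Return the start hour that minimizes total forecast carbon intensity.

    Greedy baseline (an assumption for illustration, not the RL scheduler
    described in the abstract). Ties resolve to the earliest start hour.
    """
    horizon = min(task.deadline_hour, len(carbon_forecast))
    latest_start = horizon - task.duration_hours
    if latest_start < 0:
        raise ValueError(f"{task.name} cannot finish before its deadline")

    def window_cost(start):
        # Sum of forecast intensity over the hours the task would occupy.
        return sum(carbon_forecast[start:start + task.duration_hours])

    return min(range(latest_start + 1), key=window_cost)


# Hypothetical hourly grid carbon intensity (gCO2/kWh); lower values stand in
# for hours with more renewable supply.
forecast = [420, 390, 310, 180, 150, 170, 260, 400]
task = Task(name="nightly_transform", duration_hours=2, deadline_hour=8)
start = best_start_hour(task, forecast)
print(f"Run {task.name} starting at hour {start}")
```

A learned policy would replace the fixed forecast and exhaustive search with decisions conditioned on live monitoring data, but the objective it optimizes has the same shape: shift flexible work into lower-carbon windows without missing deadlines.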

Keywords:

Sustainable computing, green data pipelines, cloud energy efficiency, reinforcement learning, dynamic scheduling

Proceedings Editor

Edmond Hajrizi

ISBN

978-9951-982-41-2

Location

UBT Lipjan, Kosovo

Start Date

25-10-2025 9:00 AM

End Date

26-10-2025 6:00 PM

DOI

10.33107/ubt-ic.2025.94
