Forget the Cloud - Building Lean Batch Pipelines from TCP Streams with Python and DuckDB

I gave this talk at PyCon & PyData DE 2025.

  1. Conference
  2. Video & Material
  3. Description

Conference

  • PyData Berlin 2025

Video & Material

Description

Many industrial and legacy systems still push critical data over TCP streams. Instead of reaching for heavyweight cloud platforms, you can build fast, lean batch pipelines on-prem using Python and DuckDB.

In this talk, you’ll learn how to turn raw TCP streams into structured data sets ready for analysis, all running on-premises. We’ll cover key patterns for batch processing, practical architecture examples, and real-world lessons from industrial projects.
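
To make the idea concrete, here is a minimal sketch of one possible shape for such a pipeline. It is not the code from the talk. It assumes a hypothetical wire format of newline-delimited `timestamp,sensor,value` records, and the function names, column names, and file naming scheme are all made up for illustration. Lines are read from a socket, grouped into fixed-size batches, written as CSV files, and then queried with DuckDB.

```python
import csv
import socket
from pathlib import Path


def read_lines(sock: socket.socket):
    """Yield non-empty decoded lines from a TCP socket until the peer closes it."""
    with sock.makefile("r", encoding="utf-8", newline="\n") as stream:
        for line in stream:
            line = line.strip()
            if line:
                yield line


def batches(lines, size):
    """Group an iterable of lines into lists of at most `size` items."""
    batch = []
    for line in lines:
        batch.append(line)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch


def write_batch(batch, directory: Path, index: int) -> Path:
    """Write one batch as a CSV file.

    Assumption: every line looks like 'timestamp,sensor,value'.
    """
    path = directory / f"batch_{index:06d}.csv"
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["ts", "sensor", "value"])  # hypothetical column names
        for line in batch:
            writer.writerow(line.split(","))
    return path


def summarize(directory: Path):
    """Aggregate all batch files with DuckDB (needs `pip install duckdb`)."""
    import duckdb  # imported lazily so the batching code runs without it

    pattern = (directory / "batch_*.csv").as_posix()
    con = duckdb.connect()  # in-memory database
    try:
        return con.execute(
            "SELECT sensor, count(*) AS n, avg(value) AS mean_value "
            f"FROM read_csv_auto('{pattern}') "
            "GROUP BY sensor ORDER BY sensor"
        ).fetchall()
    finally:
        con.close()
```

In practice the socket would come from something like `socket.create_connection((host, port))`, with each batch passed to `write_batch` as it fills up and `summarize` run on a schedule. A real deployment would also need reconnect handling, time-based flushing, and validation of malformed lines, none of which is shown here.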

If you work with sensor data, logs, or telemetry, and you value simplicity, speed, and control, this talk is for you.