#data-engineering

Artificial intelligence
from InfoQ
3 weeks ago

Mandy Gu on Generative AI (GenAI) Implementation, User Profiles and Adoption of LLMs

Generative AI and large language models are being implemented in real-world projects to enhance organizational capabilities.
from Hackernoon
2 years ago

The HackerNoon Newsletter: A Data Engineer's Guide to PyIceberg (7/6/2025) | HackerNoon

The arrival of truly intelligent, always-on, AI-native revenue engines is dismantling the way we've structured go-to-market motions for 20 years.
Tech industry
from InfoWorld
1 month ago

Databricks targets AI bottlenecks with Lakeflow Designer

Lakeflow and Openflow reflect two philosophies: Databricks integrates data engineering into a Spark-native, open orchestration fabric, while Snowflake's Openflow offers declarative workflow control.
Software development
from The Register
1 month ago

Industry reacts to DuckDB's Lakehouse architecture reorg

Databricks' acquisition of Tabular is revitalizing the table formats landscape, especially with DuckDB's innovative offerings.
from InfoWorld
1 month ago

Snowflake launches Openflow to tackle AI-era data ingestion challenges

Openflow simplifies data ingestion, transformation, and observability for enterprises engaging with AI use cases.
from Techzine Global
1 month ago

Fivetran expands Connector SDK for custom data sources

Fivetran's Connector SDK empowers developers to create custom connectors easily, addressing data gaps and enabling centralized data management without the need for extensive DevOps support.
Data science
#e-commerce
from Hackernoon
3 months ago
Data science

Rajesh Sura: Revolutionizing Global Selection Strategy with Data, AI, and Automation | HackerNoon

#apache-spark
from Medium
2 months ago
Data science

Day 6: Sessionization of Web Logs using Time Difference | Apache Spark Interview Problem.

from Medium
2 months ago
Data science

Understanding the load() Function in Apache Spark: Syntax, Examples, and Best Practices

from awstip.com
3 months ago
Data science

Spark Scala Exercise 5: Column Operations with DataFrames: A Complete Guide for Data Engineers

from Hackernoon
3 months ago

How Bharath Rajasekaran Scaled a Global Data Pipeline in 3 Months | HackerNoon

Annalect's data pipeline migration exemplifies outstanding engineering leadership and has transformed the company's approach to cloud data management.
from Medium
2 months ago

Day 3: Revenue Aggregation per Region and Category | Spark Interview Problem.

To create a daily aggregated revenue dashboard, we need to sum the total revenue for each product category by region, helping managers make informed business decisions.
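
As a rough illustration of the aggregation that summary describes, the sketch below sums revenue per (region, category) pair in plain Python. The sample rows and the field names `region`, `category`, and `revenue` are assumptions made up for illustration and are not taken from the linked article. In Spark, the same idea would typically be written as a `groupBy` on the two columns followed by a `sum` aggregate.

```python
from collections import defaultdict

# Hypothetical sample orders; field names and values are illustrative only.
orders = [
    {"region": "EU", "category": "books", "revenue": 120.0},
    {"region": "EU", "category": "books", "revenue": 30.0},
    {"region": "EU", "category": "toys", "revenue": 45.5},
    {"region": "US", "category": "books", "revenue": 80.0},
]


def revenue_by_region_and_category(rows):
    """Sum revenue for each (region, category) pair."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["region"], row["category"])] += row["revenue"]
    return dict(totals)


totals = revenue_by_region_and_category(orders)

# Print one line per group, sorted by region and then category.
for (region, category), revenue in sorted(totals.items()):
    print(region, category, revenue)
```

Keying the totals on a tuple keeps the grouping explicit and mirrors a two-column group-by; a daily dashboard would add the order date to that key.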
Data science
from Hackernoon
5 years ago

Traditional Monitoring Is Dead. Long Live Data Observability | HackerNoon

Traditional monitoring fails to meet the needs of complex data organizations; instead, engineers must develop interactive observability frameworks to quickly identify anomalies.
Data science
from Hackernoon
3 years ago

Building a Real-Time Change Data Capture Pipeline with Debezium, Kafka, and PostgreSQL | HackerNoon

The article provides a step-by-step guide to setting up a Change Data Capture (CDC) pipeline using PostgreSQL, Debezium, Apache Kafka, and Python.
Data science
Scala
from Medium
2 months ago

Data Quality Verification with Deequ: A Practical Approach Using Scala

Utilizing Deequ and Scala for efficient and automated data validation is highly effective for managing large datasets.
Data science
from Hackernoon
6 months ago

LLMs in Data Engineering: Not Just Hype, Here's What's Real | HackerNoon

Large Language Models are transforming data engineering by enhancing performance and operational efficiencies.
DevOps
from Medium
3 months ago

Evolvability: It's Mostly About Data Contracts

Data Contracts can mitigate complexity in analytic systems by fostering loose coupling and enhancing adaptability.
from Hackernoon
3 months ago

Tired of Copy-Pasting Hive Output? This PySpark Hack Fixes It | HackerNoon

Automating CSV export from Hive or Impala output is essential for efficient data engineering tasks.
Women in technology
from Business Insider
3 months ago

I became a director at Ford after pivoting careers in the last recession. Here are 3 ways to recession-proof your job.

Continuous learning through online courses is key to job security in recessionary times.
from ChannelPro
3 months ago

Datatonic expands global services with Syntio acquisition

We are thrilled to welcome Syntio to the Datatonic family. This acquisition is a key step in our strategy to expand our global reach and enhance our service capabilities.
Data science
from Techzine Global
3 months ago

Datatonic acquires Syntio and strengthens expertise in data engineering

Datatonic's acquisition of Syntio enhances its data consultancy with increased capabilities in data engineering and expanded service offerings.
from awstip.com
3 months ago

Spark Scala Exercise 23: Working with Delta Lake in Spark Scala: ACID, Time Travel, and Upserts

Delta Lake enhances data reliability and governance for data lakes by integrating warehouse features.
from medium.com
3 months ago

Spark Scala Exercise 10: Handling Nulls and Data Cleaning: From Raw Data to Analytics-Ready

Effective data cleaning is essential in data engineering to prevent downstream issues caused by nulls.
from medium.com
3 months ago

Spark Scala Exercise 9: Joining Two Datasets in Spark: Mastering Inner, Left, Right, and Outer

Joining datasets in Spark Scala allows for effective data analysis and relationship understanding.
#spark
Data science
from Medium
4 months ago

100 Days of Data Engineering on Databricks Day 44: PySpark vs. Scala:

The choice between PySpark and Scala significantly affects performance and maintainability in Spark development.
Artificial intelligence
from Medium
4 months ago

These AI & Data Engineering Sessions Are a Must-Attend at ODSC East 2025

Organizations are focusing on efficiently and securely integrating advanced AI models at scale.
Practical strategies and real-world insights are essential for navigating AI and data engineering challenges.
Scala
from Medium
5 months ago

Scala Vs. Python: What Data Engineers Need To Know

Scala improves upon Java while remaining JVM-compatible, making it attractive for organizations.
#data-serving
Business intelligence
from Medium
6 months ago

Serving Data in the Data Engineering Lifecycle: A Comprehensive Guide

Data engineering culminates in serving data for analytics, ML, and operations.
Data quality and trust are critical in serving data effectively.
from faun.pub
6 months ago
Business intelligence

Serving Data in the Data Engineering Lifecycle: A Comprehensive Guide

Data serving is the culmination of data engineering, delivering value to users through analytics and applications.
Information security
from Medium
6 months ago

The Future of Data Engineering: Security, Privacy, and the Path Ahead

Security and privacy are essential to data engineering, integral to ethics and resilience amid evolving challenges.
#data-architecture
Data science
from Medium
6 months ago

Can Your Data Architecture Handle Tomorrow? Building for Flexibility and Lasting Impact

Good data architecture is essential for effective data engineering and organizational competitiveness.
Data science
from Medium
6 months ago

Understanding Data Generation in Source Systems: How It Works and Real-Time Applications

Data generation is crucial in the data engineering lifecycle for reliable processing and transformation.
from Hackernoon
4 years ago

The Two Types of Data Engineers You Meet at Work | HackerNoon

Data engineers are categorized into two archetypes: business-oriented and tech-oriented, each with distinct roles and responsibilities.
from ComputerWeekly.com
5 months ago

A path to better data engineering | Computer Weekly

While conventional ETL data pipelines excel at processing structured data, they falter when confronting the ambiguity and variability of real-world information.
Data science
Artificial intelligence
from Medium
9 months ago

Networking, Hackathons, Meetups, and Other Extra Events Coming to ODSC West 2024

The conference provides hands-on AI learning and immersive networking opportunities.
Participants can engage in various thematic events including hackathons and summits.
ODSC West fosters connections among AI professionals and enthusiasts.
from Techzine Global
9 months ago

With Databricks Apps, business users get more out of data

Databricks Apps empower business users by simplifying data access, allowing them to create applications without heavy reliance on data engineering, thus facilitating quick and informed decision-making.
Data science
from Medium
10 months ago

The Importance of Data Structures and Algorithms in the Life of a Data Engineer

Mastering Data Structures and Algorithms is crucial for optimizing data engineering tasks.