Review of Data Engineer with Python – Datacamp Course

I recently wrapped up the Data Engineer with Python course on Datacamp and decided to write up my thoughts on the course and whether it was worth the massive amount of time I invested in it. The course promises a lot: “Start your journey to becoming a data engineer and gain the in-demand data […]

I asked ChatGPT what Data Engineers do. Here’s what it said.

My problem with ChatGPT is that it never says “Perhaps”, “I think”, “I could be wrong” or “Maybe”. It speaks with the same level of absolute confidence whether it’s right or wrong. It just confidently presents its answers with a level of surety and authority, leaving it up to you to weed out the garbage. AI […]

Four types of databases out there

Databases have been around almost as long as computers have. There are many flavours of databases out in the wild, and as a Data Engineer you will run into some of them, if not all of them, throughout your career. In this post I will look at the 4 most common types […]

Query S3 using S3 Select and SQL

S3 Select is a highly valuable and, in my opinion, one of the most underappreciated features within AWS S3. As a Data Engineer, it is a must-have in your toolkit. What is S3 Select? A feature within S3 that allows you, the Data Engineer, to run simple SQL queries on objects in S3 buckets. For […]

Upstream or downstream the battle of task dependencies

Task dependencies are a useful and popular feature in Airflow. Simply put, they define the order of task execution: which tasks to run and in what order. While it’s not required, task dependencies are almost always set. What if a task dependency is not defined? Well then Airflow takes matters into its own hands […]
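To make the idea of "an order of task execution" concrete, here is a minimal plain-Python sketch. It does not use Airflow's API; it only illustrates how declaring which tasks come before which others determines a run order. The task names `extract`, `transform` and `load` are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tasks: each key maps to the set of tasks that must finish first
# (its "upstream" tasks).
dependencies = {
    "transform": {"extract"},  # transform runs after extract
    "load": {"transform"},     # load runs after transform
}

# Resolve the declared dependencies into a valid execution order.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)  # ['extract', 'transform', 'load']
```

A scheduler has to perform some equivalent of this ordering step before it can decide which task is allowed to start next.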

Which is Right for You: A Database, Data Lake, or Data Warehouse?

As a data engineer, it’s your responsibility to handle and process data effectively. The type of data you encounter in your work can vary, but you will no doubt encounter databases, data lakes, and data warehouses at some stage along your journey as a Data Engineer. In this blog post, I briefly highlight the differences […]

Getting Started with Python Exception Handling

The errors that occur during the execution of a Python program are called exceptions. Examples of exceptions include dividing by zero, combining objects of incompatible types, and many others. Some exceptions have specific names, such as ZeroDivisionError and TypeError. If exceptions are not handled properly, they can halt the entire execution of the program. This […]
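As a minimal sketch of handling the two exceptions named above, the hypothetical helper `safe_divide` below catches `ZeroDivisionError` and `TypeError` so that the program keeps running instead of halting. Returning `None` on failure is an assumption made for illustration, not a recommendation for every case.

```python
def safe_divide(numerator, denominator):
    """Return numerator / denominator, or None if the division fails."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        # Raised when the denominator is zero.
        return None
    except TypeError:
        # Raised when the operands have incompatible types, e.g. str / int.
        return None


print(safe_divide(10, 2))    # 5.0
print(safe_divide(10, 0))    # None
print(safe_divide("10", 2))  # None
```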

The 5 Verbs of REST APIs: A Beginner’s Guide

The data pipelines you build as a Data Engineer will move data from one location to another, often from various sources such as databases and APIs to places like data warehouses or data lakes. You will no doubt have to deal with REST APIs at some stage, so let’s have a look at what REST […]

The Three Vs of Big Data: A Beginner’s Guide

The three Vs of big data refer to the three characteristics that make managing and analysing large datasets particularly challenging. The three Vs are: Volume: the sheer amount of data that needs to be processed and analysed. Big data sets are often too large to be stored and processed on a single computer, and may […]