Why Every Data Engineer Needs Jupyter Notebooks in Their Toolkit

Jupyter Notebooks are an important part of my daily routine, whether for work or personal projects. They have been incredibly useful to me, and I can’t imagine being a Data Engineer without them. Best of all, it’s easy to get started. You can use them by downloading JupyterLab or accessing it through your […]
The Key Files for a Smooth-Running Python Project

The three key files in a Python project are the Dockerfile, the Makefile, and requirements.txt. Makefile: The Makefile is particularly important because it lets you automate various steps in your project, such as installation, deployment, and linting. Essentially, the Makefile acts like a set of recipes that help you streamline your work […]
Amazon Athena for Apache Spark

Christmas has come early for us, and we have the good folks at AWS to thank for it. What is it, I hear you say? A new feature that I believe is going to change the way we use Athena going forward. Well, the way I use it going forward, that is […]
How to use a Jupyter notebook in AWS Cloud9 IDE

First and foremost, what is AWS Cloud9 IDE? According to the good people at AWS, it is a cloud-based integrated development environment (IDE) that lets you write, run, and debug your code with a browser. In my opinion, AWS Cloud9 IDE offers a great code-editing experience with support for several programming languages and runtime debuggers, […]
Spark’s map() vs flatMap(): What’s the difference?

I’ve been messing around with Spark for a few months and have dabbled in it on a few work projects, but I recently decided to really get stuck in and understand it. While testing out Spark’s map() and flatMap() transformation operations, I thought I’d post some of my findings here to save myself having to look […]
