A Data Engineer’s switch from AWS to GCP: first thoughts

I’ve always been an AWS kind of guy, not by choice, but just because everywhere I worked used AWS. Sure, there were a few places that had Azure (cue the developers in the room grumbling about Microsoft). Azure was doing something or other, but I just didn’t have any exposure to Azure […]
AWS Made Easy with Boto3

If you’re starting out as a Data Engineer and using AWS, then life gets a whole lot easier with Boto3, the AWS SDK for Python. Boto3 simplifies the integration of your Python applications, libraries, or scripts with AWS services like Amazon S3, EC2, DynamoDB and more. Well, that’s what the documentation says. […]
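
As a small taste of what that simplification looks like, here is a minimal sketch (not taken from the full post) that lists every object key under an S3 prefix. It leans on Boto3’s built-in paginator for `list_objects_v2`, so you don’t have to juggle continuation tokens yourself. The helper name `list_keys` is hypothetical, and the client is passed in rather than created inside the function, which keeps the helper easy to test without AWS credentials.

```python
from typing import Iterator


def list_keys(s3_client, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every object key under `prefix` in `bucket`.

    `s3_client` is expected to be a Boto3 S3 client, e.g. boto3.client("s3").
    The paginator follows continuation tokens across result pages for us.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is missing from a page when nothing matches the prefix.
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

Calling it would look something like `list_keys(boto3.client("s3"), "my-example-bucket", "raw/")`, where the bucket and prefix are placeholders and credentials come from your usual AWS configuration.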
Comparing Amazon S3 Storage Options: s3n, s3a, and s3

When I’m building pipelines, it is common to access S3 at some point in the process. In some articles and tutorials, s3n or s3a appears as the scheme in the S3 path (for example, s3a://bucket/key). What is the difference? I look into the differences here. In a nutshell, S3N and S3A are storage options provided by […]
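
One practical consequence: the s3n:// connector was removed in Hadoop 3, so older tutorial code that uses s3n:// paths usually needs moving to s3a:// when run on a recent open-source Spark/Hadoop setup. The sketch below is a hypothetical helper (`to_s3a` is not part of any library) that rewrites the scheme of an S3 path using only the standard library. It assumes you actually want S3A for both s3n:// and s3:// paths, which is a choice to make per environment, since on Amazon EMR s3:// is served by EMRFS.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical set of schemes we choose to rewrite. s3n:// is the old
# NativeS3FileSystem connector (removed in Hadoop 3); what s3:// maps to
# depends on the environment (EMRFS on Amazon EMR).
LEGACY_S3_SCHEMES = {"s3", "s3n"}


def to_s3a(uri: str) -> str:
    """Rewrite an s3:// or s3n:// URI to use the s3a:// scheme."""
    parts = urlsplit(uri)
    if parts.scheme == "s3a":
        return uri  # already using the S3A connector
    if parts.scheme not in LEGACY_S3_SCHEMES:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Keep bucket (netloc) and key (path) untouched; only swap the scheme.
    return urlunsplit(parts._replace(scheme="s3a"))
```

For example, `to_s3a("s3n://bucket/path/file.csv")` gives `s3a://bucket/path/file.csv`, which is the form you would then hand to Spark with the hadoop-aws module on the classpath.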
Amazon Athena for Apache Spark

Christmas has come early for us, and we have the good folks at AWS to thank for it. What is it, I hear you say? A new feature that I believe is going to change the way we use Athena going forward. Well, it’s going to change the way I use it going forward, that is […]
How to use a Jupyter notebook in AWS Cloud9 IDE

First and foremost, what is AWS Cloud9 IDE? According to the good people at AWS, it is a cloud-based integrated development environment (IDE) that lets you write, run, and debug your code with just a browser. In my opinion, AWS Cloud9 IDE offers a great code-editing experience with support for several programming languages and runtime debuggers, […]
