When building data pipelines, I often need to access S3 at some point in the process. Some articles and tutorials use S3N in the S3 connection string, while others use S3A. What is the difference? I look into it here.
Basically
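In a nutshell, S3N and S3A are not storage options inside Amazon Simple Storage Service (S3) itself. They are Hadoop file system connectors, selected by the s3n:// and s3a:// URI schemes, that clients such as Spark use to read and write S3. They differ in the object sizes they support and in performance.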
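from pyspark import SparkContext
sc = SparkContext.getOrCreate()  # reuse the active context or create one
# s3a example (the newer connector)
data = sc.textFile("s3a://bucket-name/key")
# s3n example (the older connector)
data = sc.textFile("s3n://bucket-name/key")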
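Since the two calls above differ only in the URI scheme, moving a path from S3N to S3A can be a simple string rewrite. Here is a minimal sketch of that idea; the helper name to_s3a is hypothetical and not part of Spark or Hadoop, and it assumes the bucket name and key stay the same.

```python
def to_s3a(uri: str) -> str:
    """Rewrite an s3n:// URI to s3a://, leaving any other URI untouched."""
    old_prefix = "s3n://"
    if uri.startswith(old_prefix):
        # Keep the bucket and key exactly as they were; only the scheme changes.
        return "s3a://" + uri[len(old_prefix):]
    return uri


print(to_s3a("s3n://bucket-name/key"))
```

The result can be passed to sc.textFile in place of the original path. The cluster still needs the S3A connector and credentials configured, which this sketch does not cover.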
S3N vs S3A
Amazon Simple Storage Service (S3) is a cloud storage service that provides object-based storage. In Hadoop, the original S3 file system (s3://) is a block-based overlay on top of Amazon S3, which means that it stores data in blocks, similar to how a traditional hard drive works.
S3N (s3n://) and S3A (s3a://) are two other Hadoop connectors for Amazon S3. Both are object-based, meaning they store each file as an individual S3 object rather than as a set of blocks. S3N is a native file system for reading and writing regular files on S3, and it supports objects up to 5 GB in size. S3A is the successor to S3N; it offers higher performance and supports objects up to 5 TB.
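To make the size limits concrete, here is a rough sketch that checks whether a file of a given size fits under the limit quoted above for the scheme in its URI. The function name fits and the MAX_OBJECT_BYTES table are hypothetical, and treating GB and TB as binary units (1024-based) is an assumption.

```python
from urllib.parse import urlsplit

GB = 1024 ** 3  # assumption: binary units
TB = 1024 ** 4

# Per-object limits as quoted in this post, keyed by URI scheme.
MAX_OBJECT_BYTES = {
    "s3n": 5 * GB,
    "s3a": 5 * TB,
}


def fits(uri: str, size_bytes: int) -> bool:
    """Return True if size_bytes is within the limit for the URI's scheme."""
    scheme = urlsplit(uri).scheme
    if scheme not in MAX_OBJECT_BYTES:
        raise ValueError(f"no size limit known for scheme: {scheme!r}")
    return size_bytes <= MAX_OBJECT_BYTES[scheme]


print(fits("s3n://bucket-name/key", 6 * GB))
print(fits("s3a://bucket-name/key", 6 * GB))
```

A 6 GB file is over the S3N limit but well within the S3A limit, which is one practical reason to prefer s3a:// for large outputs.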
The main difference between the original s3:// connector and S3N/S3A is the way they store data. The s3:// connector stores data in blocks, while S3N/S3A store each file as an individual object.
Thanks for reading.