Part 2 : Object PUT - Standard upload

Anirudha | Sun, 01/12/2020 - 20:02

With the PUT Object API, you can upload data in a single stream. A few things to keep in mind about PUT Object, or standard upload:

Standard upload:

  • Data uploaded in a single PUT operation is a standard upload. It can be from 0 bytes to 5 GB in size.
  • You should have write permission on the bucket.
  • Once a file is uploaded to the Objects cluster, your data is always immutable, which means that once data is uploaded it can never be updated partially. Any delta change to the object will overwrite the entire object (see the sketch right after this list).
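
To make the overwrite behavior concrete, here is a minimal sketch; the key name and payloads are made up for illustration, and the client setup mirrors the full example below:

import boto3

s3c = boto3.session.Session().client(
    service_name="s3",
    endpoint_url="http://objects007.scalcia.com",
    aws_access_key_id="<access_key>",
    aws_secret_access_key="<secret_key>",
)

s3c.put_object(Bucket="testbucket", Key="notes.txt", Body=b"version 1")
# There is no partial update: changing even one byte means sending the
# whole object again, and the new PUT replaces the old data entirely.
s3c.put_object(Bucket="testbucket", Key="notes.txt", Body=b"version 2 - full body")
# The object now contains only "version 2 - full body".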

Now let's take a quick look at the full Python example below.

import boto3

session = boto3.session.Session()

endpoint_url = "http://objects007.scalcia.com"
access_key = "jOMpFFvMMw7un0UxBXRP2EhcVjkGza8n"
secret_key = "pzdBtAhr6Ak_o7IkfSavlxfhMSVMUUaL"

s3c = session.client(aws_access_key_id=access_key,
                     aws_secret_access_key=secret_key,
                     endpoint_url=endpoint_url,
                     service_name="s3")

# File to upload to the Objects endpoint.
filename = "/tmp/employee_stats.txt"

# Declaring the bucket name.
bucket = "testbucket"

# "/tmp/employee_stats.txt" will be uploaded as objectname=employee_stats.txt
key = "employee_stats.txt"

# Check if the bucket exists; head_bucket throws an exception if it does not,
# in which case we create it.
try:
    s3c.head_bucket(Bucket=bucket)
    print("Bucket already exists : %s" % bucket)
except Exception:
    print("Creating bucket %s" % bucket)
    s3c.create_bucket(Bucket=bucket)

# Open the file in binary mode and upload it to the Objects endpoint.
print("Uploading file %s, as object %s in bucket %s" % (filename, key, bucket))
with open(filename, "rb") as fh:
    s3c.put_object(Bucket=bucket, Key=key, Body=fh)

# Verify that the object was uploaded.
print("Checking if %s exists." % key)
print("Head Object Response : %s" % s3c.head_object(Bucket=bucket, Key=key))

You can find the above code on GitHub.

create_bucket and put_object accept tons of other options, which I will cover in a separate series. In this blog, let's understand what is happening here.

In the above code snippet, we create testbucket and upload the /tmp/employee_stats.txt file as object name employee_stats.txt using the put_object API. After a successful upload, we execute the head_object API to validate that the object exists.

After executing each API call, boto returns the response as a dictionary on success, or it throws an exception. Each response includes an HTTPStatusCode, which is important for determining whether the call succeeded or what error we hit.
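
For example, here is one way to pull the status code out of a successful response and to inspect a failure; this is a minimal sketch that reuses the s3c client, bucket, and key from the snippet above:

from botocore.exceptions import ClientError

try:
    resp = s3c.head_object(Bucket=bucket, Key=key)
    print("head_object succeeded with HTTP status %d"
          % resp["ResponseMetadata"]["HTTPStatusCode"])
except ClientError as err:
    # boto3 raises ClientError for non-2xx responses; the error code and
    # HTTP status are available in err.response.
    print("Error code  : %s" % err.response["Error"]["Code"])
    print("HTTP status : %s" % err.response["ResponseMetadata"]["HTTPStatusCode"])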


If you execute the above code, you should see:

Bucket already exists : testbucket
Uploading file /tmp/employee_stats.txt, as object employee_stats.txt in bucket testbucket
Checking if employee_stats.txt exists.
Head Object Response : {u'AcceptRanges': 'bytes', u'ContentType': 'binary/octet-stream', 'ResponseMetadata': {'HTTPStatusCode': 200, 'RetryAttempts': 0, 'HostId': '', 'RequestId': '15E92EE1AEFB1C00', 'HTTPHeaders': {'content-length': '102400', 'accept-ranges': 'bytes', 'md5sum': '312d4047c97ba54c74ca21ac38c52ac6', 'vary': 'Origin', 'server': 'NutanixS3', 'last-modified': 'Sun, 12 Jan 2020 15:58:11 GMT', 'etag': '"312d4047c97ba54c74ca21ac38c52ac6"', 'x-amz-request-id': '15E92EE1AEFB1C00', 'date': 'Sun, 12 Jan 2020 15:58:11 GMT', 'content-type': 'binary/octet-stream'}}, u'LastModified': datetime.datetime(2020, 1, 12, 15, 58, 11, tzinfo=tzutc()), u'ContentLength': 102400, u'ETag': '"312d4047c97ba54c74ca21ac38c52ac6"', u'Metadata': {}}


The head_object call returns some interesting information:

  • HTTPStatusCode/ErrorCode - Error codes are very helpful for determining success or the cause of a failure.
  • RetryAttempts - How many times the SDK retried internally.
  • ContentLength - The size of the object in bytes.
  • ETag - This is an important field: it is the md5sum of the data you just uploaded. Objects by default makes sure that the data was uploaded correctly and its integrity is maintained. If it finds that the md5sum of the data differs from the one the client provided, it will fail the operation.
    • $ md5sum /tmp/employee_stats.txt
      • 312d4047c97ba54c74ca21ac38c52ac6  /tmp/employee_stats.txt
    • Notice that the md5sum of the file is exactly the same as the ETag returned in the head_object or put_object response (a small verification sketch follows this list).
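
If you want to double-check integrity on the client side, you can compare the ETag against a locally computed md5sum. Below is a minimal sketch reusing s3c, filename, bucket, and key from the snippet above; note that the ETag matches the md5sum only for single-PUT (non-multipart) uploads:

import hashlib

# Compute the md5sum of the local file.
with open(filename, "rb") as fh:
    local_md5 = hashlib.md5(fh.read()).hexdigest()

# The ETag in the response is wrapped in double quotes; strip them.
etag = s3c.head_object(Bucket=bucket, Key=key)["ETag"].strip('"')

print("local md5sum : %s" % local_md5)
print("ETag         : %s" % etag)
assert local_md5 == etag, "integrity check failed"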


We will go over the other fields in another blog.
The above code is very easy and straightforward: all it does is read the file and upload it. Just a couple of API calls and the work is done.

Cons:

  • If the upload fails midway for any reason (e.g., a network glitch), the client has to re-upload the entire object. For example, if you are uploading a 4 GB file and a network issue occurs after 3.5 GB has been uploaded, you will have to initiate the entire upload again, i.e., all 4 GB of data will have to be sent over the wire again (a simple retry sketch follows this list).
  • We are uploading /tmp/employee_stats.txt in a single PUT operation, which means the client creates just one connection to the S3 endpoint and uploads the data over it. Whatever bandwidth you get on that one connection is the maximum; it cannot be further optimized, because the client cannot open multiple parallel connections to upload this file.
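
To make the first point concrete, here is a minimal whole-object retry sketch reusing s3c, filename, bucket, and key from the snippet above; the retry count of 3 is arbitrary. Every attempt has to start again from byte 0:

from botocore.exceptions import ClientError, EndpointConnectionError

for attempt in range(1, 4):
    try:
        with open(filename, "rb") as fh:
            s3c.put_object(Bucket=bucket, Key=key, Body=fh)
        print("Upload succeeded on attempt %d" % attempt)
        break
    except (ClientError, EndpointConnectionError) as err:
        # A standard upload cannot resume; every retry re-reads and
        # re-sends the file from byte 0.
        print("Attempt %d failed (%s); retrying from byte 0" % (attempt, err))
else:
    raise RuntimeError("upload failed after 3 attempts")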

Next Read : Multipart Object upload APIs