Nutanix Objects Versioning

Anirudha | Wed, 03/18/2020 - 05:35

In the Writing and Reading Data blog, we looked at how to store and access data on a Nutanix Objects cluster.

In this blog we will explore:

  1. Data over-write behaviour.
  2. Bucket versioning:
    1. From Objects UI.
    2. From S3 client.
    3. From S3 API
  3. Reading versioned data
  4. Delete Markers
  5. Deleting versioned data

What you need:

  • An Objects cluster.
  • Valid IAM credentials.
  • Access to the Objects UI.

 

Default data overwrite behaviour:

Any data you upload to the Objects cluster is immutable in nature, which means once data is written, there is no way to edit it in place. You can upload the entire data again, but editing existing data is not possible. Any new upload to the same object name will overwrite the existing object.

 

Let’s take an example of what happens if you try to upload two files under the same object name:

1) movie_list.txt is 700 MB in size.

2) movie_list_latest.txt is 1 MB in size.

If I upload movie_list.txt as objectname=allmovienames to the Objects cluster, the size of allmovienames will be 700 MB.

And if I later get some incremental data and upload the delta (i.e. movie_list_latest.txt) under the same object name (i.e. allmovienames), the size of allmovienames will now be 1 MB.

This upload overwrites the previously written data: you lose all 700 MB, which becomes eligible for garbage collection and will be cleaned up by the Objects ATLAS service as part of a partial/full scan. There is no way to revert this action.

 

Let’s go to the Python prompt and take a quick look at this:

>>> bucket="demobucket009"

>>> s3c.create_bucket(Bucket=bucket)

{u'Location': '/demobucket009', 'ResponseMetadata': {'HTTPStatusCode': 200, 'RetryAttempts': 0, 'HostId': '', 'RequestId': '15FCBA31BCF72E64', 'HTTPHeaders': {'content-length': '0', 'accept-ranges': 'bytes', 'vary': 'Origin', 'server': 'NutanixS3', 'x-amz-request-id': '15FCBA31BCF72E64', 'location': '/demobucket009', 'date': 'Mon, 16 Mar 2020 08:04:51 GMT'}}}

>>> objname="allmovienames"

>>> s3c.put_object(Bucket=bucket, Key=objname, Body="TOP.Movies.2019")

{u'ETag': '"05c8dc01c9dbb5ae3c8747cc48c02674"', 'ResponseMetadata': {'HTTPStatusCode': 200, 'RetryAttempts': 0, 'HostId': '', 'RequestId': '15FCBA40210577AF', 'HTTPHeaders': {'content-length': '0', 'accept-ranges': 'bytes', 'vary': 'Origin', 'server': 'NutanixS3', 'etag': '"05c8dc01c9dbb5ae3c8747cc48c02674"', 'x-amz-request-id': '15FCBA40210577AF', 'date': 'Mon, 16 Mar 2020 08:05:53 GMT'}}}

In the code above, I created a bucket and uploaded the object allmovienames. For the sake of the example I uploaded just a few text characters so the demo stays simple.

Here the content of allmovienames is "TOP.Movies.2019". If you read the object back, you should see the same data:

>>> s3c.get_object(Bucket=bucket,Key=objname)["Body"].read()

'TOP.Movies.2019'

>>>

Now let’s try to upload new text under the same object name.

>>> s3c.put_object(Bucket=bucket, Key=objname, Body="Comedy.Movies.2020")

{u'ETag': '"7349a8f49d103b603e09977826c03df9"', 'ResponseMetadata': {'HTTPStatusCode': 200, 'RetryAttempts': 0, 'HostId': '', 'RequestId': '15FCBA4DDDA64D43', 'HTTPHeaders': {'content-length': '0', 'accept-ranges': 'bytes', 'vary': 'Origin', 'server': 'NutanixS3', 'etag': '"7349a8f49d103b603e09977826c03df9"', 'x-amz-request-id': '15FCBA4DDDA64D43', 'date': 'Mon, 16 Mar 2020 08:06:52 GMT'}}}

>>> s3c.get_object(Bucket=bucket,Key=objname)["Body"].read()

'Comedy.Movies.2020'

>>>

This time the content of allmovienames is "Comedy.Movies.2020", and reading the object back returns the new data.

The previously written data (i.e. "TOP.Movies.2019") is no longer returned and is no longer accessible.

Now, if you wish to preserve both these files under a single object name, that’s where the versioning feature helps.

With versioning, you can upload different data under the same object name; all of it is saved under that name, with each upload represented by a different object version.

Let’s take an example :

  • I upload movie_list.txt as objectname=allmovienames to the Objects cluster; the size of allmovienames will be 700 MB. When I upload this data, the Objects cluster returns an object version-id (let’s say ver1) in the response, which represents this data.
  • I upload another 1 MB of data as objectname=allmovienames. The size of allmovienames is now 1 MB, and the Objects cluster returns another unique object version-id (let’s say ver2) in the response.
  • If I keep overwriting this object again and again, every upload returns a unique object version-id.
  • To access any previous version of the object, the user must also pass the version-id of the object in the get_object API call.

A few things to keep in mind:

  • Versioning is a bucket-level property.
  • You can enable versioning on the bucket.
  • Once it’s enabled, you cannot disable the feature, but you can suspend it whenever needed.
  • Once it’s suspended, any overwrite replaces the latest version of the object, while all older versions (created before suspending versioning) are preserved.
  • During cleanup of versioned objects, you must also specify the version-id with the delete-object API.