The Atlas REST API – working examples

Originally I was writing a blog post about my experiences with Apache Atlas (which is still in the works), in which I would refer to a Hortonworks Community post I wrote with all the working examples of Atlas REST API calls. But since Hortonworks Community was migrated to Cloudera Community, that article seems to have been lost. The original URL brings you to the Cloudera Community, but not to the article. The search engine comes up with nothing. I can't find it via my profile either.

It wasn’t particularly easy to gain all this knowledge. So of course I had a backup of all successful commands and output. And here it is. This was all tested on HDP 2.6.5.

Getting general information

Showing available tags:

curl -i -X GET http://sandbox.hortonworks.com:21000/api/atlas/types?type=TRAIT -u holger_gov:holger_gov

Result:

{"results":["Order management","Standard","Confidential","Internal","PII","Public","Important","Application_Y","Application_X","Critical"],"count":10,"requestId":"pool-2-thread-6 - e30cfa60-f67c-49a6-af39-9fb132fe5820"}

Showing information on one tag, called Confidential:

curl -i -X GET -u holger_gov:holger_gov 'http://sandbox.hortonworks.com:21000/api/atlas/types/Confidential'

Result:

{"typeName":"Confidential","definition":{"enumTypes":[],"structTypes":[],"traitTypes":[{"superTypes":[],"hierarchicalMetaTypeName":"org.apache.atlas.typesystem.types.TraitType","typeName":"Confidential","typeDescription":"Informatie classificatie: Confidential","typeVersion":"1.0","attributeDefinitions":[]}],"classTypes":[]},"requestId":"pool-2-thread-5 - ba0de3ea-f040-4345-8cbf-73f21a9f721b"}

 

Getting info on data sources

Let's say I have a Hive table that needs to be tagged. First I need its guid. You won't find this guid in the Atlas GUI, and I don't think you can do much without it.

Showing all the guids:

curl -iv -u holger_gov:holger_gov -X GET http://sandbox.hortonworks.com:21000/api/atlas/entities?type=hive_table

Result:

{ "requestId": "pool-2-thread-4 - d7ab6dc3-2464-482d-b78b-24f5e14bfd49", "typeName": "hive_table", "results": [ "faedff26-819c-47ee-9cdf-77b4ba8bc547", "dfbe373d-b672-418e-8f45-5e285b64dd7d", "f8861c18-6ba2-455d-9024-96abf01387f1", "9503d0e1-37d7-4456-8b79-30dda3199f67", "c6fc7997-648a-4154-9bca-b265f4882ae5", "155265ec-dbfe-4951-b051-2b09c47a6c7f", "908cba22-cfa3-429e-a116-51dcdcbfb003", "8de87c74-3c5d-4b5c-8218-316f2b785f5d", "c87f4c84-54e7-4c07-ac77-f32811b40fa5", "aaabeeb3-7528-4f76-8f65-443d26bb266e", "59234a6b-7d3f-415f-b430-aae2074a0bb0", "83cd29e8-cca9-4595-9d3b-8479cd2fc3ee", "70f945be-7c3a-4580-86ac-695b345d0d7f" ], "count": 13}

That was actually not that useful, unless you know exactly which guid belongs to which Hive table. I'm going to guess you don't.

 

Getting details on all Hive tables

curl -iv -u holger_gov:holger_gov -X GET http://sandbox.hortonworks.com:21000/api/atlas/v2/search/dsl?typeName=hive_table

Result:

{"queryType":"DSL","queryText":"`hive_table` ","entities":[{"typeName":"hive_table","attributes":{"owner":"spark","qualifiedName":"asteroids.asteroids_raw@Sandbox","name":"asteroids_raw","description":null},"guid":"faedff26-819c-47ee-9cdf-77b4ba8bc547","status":"ACTIVE","displayText":"asteroids_raw","classificationNames":["Applicatie_X"]},{"typeName":"hive_table","attributes":{"owner":"spark","qualifiedName":"asteroids.asteroids@Sandbox","name":"asteroids","description":null},"guid":"dfbe373d-b672-418e-8f45-5e285b64dd7d","status":"ACTIVE","displayText":"asteroids","classificationNames":[]},{"typeName":"hive_table","attributes":{"owner":"hive","qualifiedName":"xademo.call_detail_records@Sandbox","name":"call_detail_records","description":null},"guid":"70f945be-7c3a-4580-86ac-695b345d0d7f","status":"ACTIVE","displayText":"call_detail_records","classificationNames":[]},{"typeName":"hive_table","attributes":{"owner":"hive","qualifiedName":"default.sample* Connection #0 to host sandbox.hortonworks.com left intact* Closing connection #0_08@Sandbox","name":"sample_08","description":null},"guid":"c87f4c84-54e7-4c07-ac77-f32811b40fa5","status":"ACTIVE","displayText":"sample_08","classificationNames":[]},{"typeName":"hive_table","attributes":{"owner":"raj_ops","qualifiedName":"default.employee@Sandbox","name":"employee","description":null},"guid":"8de87c74-3c5d-4b5c-8218-316f2b785f5d","status":"ACTIVE","displayText":"employee","classificationNames":[]}]}

I’ve left out some results to make this more readable.
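
Since a bare list of guids isn't very helpful, here is a rough sketch (untested, in Python with the requests library, which is my own addition) that runs the same v2 DSL search and builds a mapping from qualified name to guid, so you can see which guid belongs to which table:

import requests

ATLAS = "http://sandbox.hortonworks.com:21000/api/atlas"
AUTH = ("holger_gov", "holger_gov")

# Same DSL search as the curl command above: all hive_table entities
resp = requests.get(ATLAS + "/v2/search/dsl",
                    params={"typeName": "hive_table"}, auth=AUTH)
resp.raise_for_status()

# Map qualifiedName -> guid, so the guids from the earlier call make sense
guids = {e["attributes"]["qualifiedName"]: e["guid"]
         for e in resp.json().get("entities", [])}
for qualified_name, guid in sorted(guids.items()):
    print(qualified_name, guid)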

 

Often you want to get details on only one Hive table. I had help with this from user @Aditya Sirna on Hortonworks Community (https://community.cloudera.com/t5/Support-Questions/Atlas-REST-API-search-of-a-table-fails/m-p/228023?childToView=153983#answer-153983).

So let's say I want more info on a Hive table called asteroids (I like space-themed data).

curl -X GET \
'http://sandbox.hortonworks.com:21000/api/atlas/v2/search/dsl?typeName=hive_table&query=where%20name%3D%22asteroids%22' \
-u holger_gov:holger_gov

Result:

{ "queryType": "DSL", "queryText": "`hive_table` where name=\"asteroids\"", "entities": [ { "typeName": "hive_table", "attributes": { "owner": "spark", "qualifiedName": "asteroids.asteroids@Sandbox", "name": "asteroids", "description": null }, "guid": "dfbe373d-b672-418e-8f45-5e285b64dd7d", "status": "ACTIVE", "displayText": "asteroids", "classificationNames": [] } ]}

All right, we got ourselves a guid.
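
The where-query also translates nicely into a small helper function. Again a sketch with Python's requests library, not tested against a live cluster:

import requests

ATLAS = "http://sandbox.hortonworks.com:21000/api/atlas"
AUTH = ("holger_gov", "holger_gov")

def get_table_guid(table_name):
    """Return the guid of a hive_table with this name, or None if it isn't found."""
    # requests takes care of URL-encoding the DSL query
    resp = requests.get(ATLAS + "/v2/search/dsl",
                        params={"typeName": "hive_table",
                                "query": 'where name="%s"' % table_name},
                        auth=AUTH)
    resp.raise_for_status()
    entities = resp.json().get("entities", [])
    return entities[0]["guid"] if entities else None

print(get_table_guid("asteroids"))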

 

Adding an entity to Atlas

Hive tables are searchable from the start in Atlas, but what about files and paths in HDFS? For those we need to create an hdfs_path entity.

Let's say I have this file: /user/dmaster/electionresults/ls2014.tsv. I can add it to Atlas with this curl command:

curl -u holger_gov:holger_gov -ik -H "Content-Type: application/json" -X POST -d '{"entity": {"typeName" : "hdfs_path", "attributes" : {"name" : "electionresults", "qualifiedName" : "electionresults.electionresults@Sandbox", "path" : "/user/dmaster/electionresults/ls2014.tsv", "clusterName":"Sandbox"}}}' http://sandbox.hortonworks.com:21000/api/atlas/v2/entity

Result:

{"mutatedEntities":{"CREATE":[{"typeName":"hdfs_path","attributes":{"qualifiedName":"electionresults.electionresults@Sandbox"},"guid":"c7be38be-213d-400c-997e-3ea944d2109a","status":"ACTIVE"}]},"guidAssignments":{"-59913906192937":"c7be38be-213d-400c-997e-3ea944d2109a"}}

Alternatively, you could write everything from the first curly brace ({) to the last one to a JSON file and use that. I haven't tested this (danger!), but it would probably look like this:

curl -X POST -d @atlas_create_entity_election.json -u holger_gov:holger_gov -H 'Content-Type: application/json; charset=UTF-8' http://sandbox.hortonworks.com:21000/api/atlas/v2/entity
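
For reference, the same create call from Python would look roughly like this. This is an untested sketch using the requests library; the payload is the one from the curl example above:

import requests

ATLAS = "http://sandbox.hortonworks.com:21000/api/atlas"
AUTH = ("holger_gov", "holger_gov")

# Register one hdfs_path entity, same attributes as in the curl example
entity = {
    "entity": {
        "typeName": "hdfs_path",
        "attributes": {
            "name": "electionresults",
            "qualifiedName": "electionresults.electionresults@Sandbox",
            "path": "/user/dmaster/electionresults/ls2014.tsv",
            "clusterName": "Sandbox"
        }
    }
}

resp = requests.post(ATLAS + "/v2/entity", json=entity, auth=AUTH)
resp.raise_for_status()
# The response contains the guid that Atlas assigned to the new entity
print(resp.json()["mutatedEntities"]["CREATE"][0]["guid"])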

 

Deleting an entity

Sometimes you might need to delete an entity.

Okay, before we go any further, a word of warning: don't delete any Hive items in Atlas unless you are really certain you don't want them back. Recreating an entity for a Hive table in Atlas afterwards is doable. But recreating the Hive lineage in Atlas, including columns, Hive processes and everything else, with a couple of Atlas REST API calls? Good luck!

Anyway, I needed to delete one HDFS entity because I was testing different commands. And for this you need its guid.

curl -iv -u holger_gov:holger_gov -X DELETE http://sandbox.hortonworks.com:21000/api/atlas/entities?guid=c7be38be-213d-400c-997e-3ea944d2109a

Result:

{"requestId":"pool-2-thread-9 - 49560d87-0ae0-4659-987f-3bf6fc92961e","entities":{"deleted":["c7be38be-213d-400c-997e-3ea944d2109a"]}}

 

Adding a tag to an entity

Let's add a tag called Confidential to this entity. I've defined the Confidential tag with two attributes: one called retention_required (boolean) and one called max_retention_time_months (integer). This could be handy because of the European GDPR legislation, which says you need to set a retention time on personal data. Atlas doesn't warn you when that time has passed though! You might want to write some code for that.

This time I’ve created this JSON file:

{ "classification":{ "typeName":"Confidential","attributes":{ "retention_required":"true","max_retention_time_months":"12"}},"entityGuids":[ "31987342-36e2-40ec-98dc-2a161c9e3ca4"]}

You can see I’ve added only one guid, but I could have added multiple.

With this command I classify the electionresults HDFS path:

curl -X POST -d @atlas_classify_election.json -u holger_gov:holger_gov -H 'Content-Type: application/json; charset=UTF-8' http://sandbox.hortonworks.com:21000/api/atlas/v2/entity/bulk/classification

One extra tip: some commands can be run multiple times, but you can't classify an entity with the same tag twice.
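
The equivalent classification call from Python would look something like this. Again an untested sketch with the requests library, posting the same JSON to the bulk classification endpoint:

import requests

ATLAS = "http://sandbox.hortonworks.com:21000/api/atlas"
AUTH = ("holger_gov", "holger_gov")

# Same payload as atlas_classify_election.json
payload = {
    "classification": {
        "typeName": "Confidential",
        "attributes": {
            "retention_required": "true",
            "max_retention_time_months": "12"
        }
    },
    "entityGuids": ["31987342-36e2-40ec-98dc-2a161c9e3ca4"]
}

resp = requests.post(ATLAS + "/v2/entity/bulk/classification",
                     json=payload, auth=AUTH)
# Classifying an already-classified entity fails, so check the status
# instead of blindly calling raise_for_status()
print(resp.status_code, resp.text)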

 

Deleting a classification

If you use tag-based security in Ranger, you might want to think about who should be able to run this and who shouldn’t. Anyway, this is how you delete a classification on an entity:

curl -iv -u holger_gov:holger_gov -X DELETE http://sandbox.hortonworks.com:21000/api/atlas/v2/entity/guid/31987342-36e2-40ec-98dc-2a161c9e3ca4/classification/Confidential
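
And the Python equivalent, again as an untested sketch with the requests library:

import requests

ATLAS = "http://sandbox.hortonworks.com:21000/api/atlas"
AUTH = ("holger_gov", "holger_gov")

guid = "31987342-36e2-40ec-98dc-2a161c9e3ca4"
resp = requests.delete(ATLAS + "/v2/entity/guid/" + guid + "/classification/Confidential",
                       auth=AUTH)
print(resp.status_code)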

 

Just ACTIVE entities, thank you

There was one more thing. I wanted to get the guids of just the ACTIVE entities, because otherwise you will also get all deleted entities. And I wanted the output to show which entity each guid belongs to. It took me a lot of effort to get the output exactly right. I've seen many "error":"Invalid expression" and server error messages before I arrived here. But here it is.

curl -u holger_gov:holger_gov -ik -H "Content-Type: application/json" -X GET 'http://sandbox.hortonworks.com:21000/api/atlas/discovery/search/dsl?query=hdfs_path+where+__state=%27ACTIVE%27+select+qualifiedName,name,__guid'

{
"requestId": "pool-2-thread-4 - e20ef5ea-a994-4928-bf7c-b131bede3150",
"query": "hdfs_path where __state='ACTIVE' select qualifiedName,name,__guid",
"queryType": "dsl",
"count": 3,
"results": [
{
"$typeName$": "__tempQueryResultStruct2",
"qualifiedName": "electionresults.electionresults@Sandbox",
"__guid": "31987342-36e2-40ec-98dc-2a161c9e3ca4",
"name": "/user/dmaster/electionresults/ls2014.tsv"
},
{
"$typeName$": "__tempQueryResultStruct2",
"qualifiedName": "hdfs://sandbox.hortonworks.com:8020/user/dmaster/retail_db/orders",
"__guid": "5c818e10-8c1f-4298-bf54-094b37fb9e22",
"name": "/user/dmaster/retail_db/orders"
},
{
"$typeName$": "__tempQueryResultStruct2",
"qualifiedName": "hr.hr@Sandbox",
"__guid": "701f4234-da2b-4efe-900f-2e220c07e61a",
"name": "hr"
}
],
"dataType": {
"typeName": "__tempQueryResultStruct2",
"typeDescription": null,
"typeVersion": "1.0",
"attributeDefinitions": [
{
"name": "qualifiedName",
"dataTypeName": "string",
"multiplicity": {
"lower": 0,
"upper": 1,
"isUnique": false
},
"isComposite": false,
"isUnique": false,
"isIndexable": false,
"reverseAttributeName": null
},
{
"name": "name",
"dataTypeName": "string",
"multiplicity": {
"lower": 0,
"upper": 1,
"isUnique": false
},
"isComposite": false,
"isUnique": false,
"isIndexable": false,
"reverseAttributeName": null
},
{
"name": "__guid",
"dataTypeName": "string",
"multiplicity": {
"lower": 0,
"upper": 1,
"isUnique": false
},
"isComposite": false,
"isUnique": false,
"isIndexable": false,
"reverseAttributeName": null
}
]
}
}

I’m really happy with this result.
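
If you want to reuse this query from a script, here is a rough Python sketch with the requests library (untested); requests handles the URL encoding that makes the curl version so fiddly:

import requests

ATLAS = "http://sandbox.hortonworks.com:21000/api/atlas"
AUTH = ("holger_gov", "holger_gov")

query = "hdfs_path where __state='ACTIVE' select qualifiedName,name,__guid"
resp = requests.get(ATLAS + "/discovery/search/dsl",
                    params={"query": query}, auth=AUTH)
resp.raise_for_status()

# Each row contains the selected fields, so the guid and name are visible together
for row in resp.json().get("results", []):
    print(row["__guid"], row["qualifiedName"], row["name"])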

 

Creating HDFS path entities in bulk

When we were about to start using Atlas, I wanted to be able to quickly add HDFS path entities from a prepared JSON file with all the entity definitions in it. So I got that working.

This is a JSON file with entity definitions for three HDFS paths:

{
"entities": [
{
"typeName": "hdfs_path",
"attributes": {
"path": "/user/dmaster/electionresults",
"qualifiedName": "hdfs://sandbox.hortonworks.com:8020/user/dmaster/electionresults",
"name": "/user/dmaster/electionresults"
},
"classification": [],
"status": "ACTIVE"
},
{
"typeName": "hdfs_path",
"attributes": {
"path": "/user/dmaster/nyse",
"qualifiedName": "hdfs://sandbox.hortonworks.com:8020/user/dmaster/nyse",
"name": "/user/dmaster/nyse"
},
"classification": [],
"status": "ACTIVE"
},
{
"typeName": "hdfs_path",
"attributes": {
"path": "/user/dmaster/lca",
"qualifiedName": "hdfs://sandbox.hortonworks.com:8020/user/dmaster/lca",
"name": "/user/dmaster/lca"
},
"classification": [],
"status": "ACTIVE"
}
]
}

And here is the command to create them:

curl -X POST -d @atlas_create_entities_bulk.json -u holger_gov:holger_gov -H 'Content-Type: application/json; charset=UTF-8' http://sandbox.hortonworks.com:21000/api/atlas/v2/entity/bulk

It does result in an error though, and I can't find any information about it:

{"errorCode":"ATLAS-500-00-007","errorMessage":"Failed to notify for change CREATE"}

 

If you do find this post useful, please leave a comment. It would cheer me up after putting so much hard work into getting this product to work.
