Join [/join]

Join [POST]

Retrieve data from two Cassandra data sets and join them by key fields. It accepts two kind of parameters: simple table description that is an operand; or an operation, two choices also: recursive join description or a recursive union description, which both are composed of two parameters:

left operand: table description, join description or union description.
right operand: table description, join description or union description.

	Possible combinations
table_a	⨝	table_b
table_a	⨝	join_b
table_a	⨝	union_b
join_a	⨝	table_b
join_a	⨝	join_b
join_a	⨝	union_b
union_a	⨝	table_b
union_a	⨝	join_b
union_a	⨝	union_b

Thus, recursivelly we could define whatever table merging. Similar to SQL but only binary operations, same functionality.

These descriptions has a few parameters which identify the data source: keyspace and tablename. Besides that, two descriptions involved in a join operation needs join_key parameter that declare which fields are the key to join each other.

Parameters:

join_type, defines kind of Natural Join; in BNF notation:

<join_type> ::= "join_type": "inner" | "full" | "left" | "right" | "cross"

Optional parameter, "inner" is the default value.

Other optional parameters are aimed to apply DML functionality: groupby, select, calculated, etc.:

Select

Projection operation of relational algebra. It allows rename operation as well.
```
  "select": [<field_a>, <field_b>, {<field_c>: <field_c_renamed>}, ... ],
```
Calculated

Optional parameter that allows Spark SQL expressions which calculate new fields from others.
```
  "calculated": {
      <calculated_field>: <Spark SQL expression>
  },
```
Filter

Filter expression
```
  "filter": <filter_expression>,
```

Group by

Gets data from Cassandra table which is applied a groupby statement.

  "groupby": { 
      "grouped": [<field_a>],
      "agg": {
          "count": [<field_b>]
      }
  }

Sort by

  "sortby": [
      {<field_a>: "desc"},
      {<field_b>: "asc"},  //default
      ...
  ]

Save

It writes back results into Cassandra existing table. The result set must be compatible with Cassandra table structure but as flexible as Cassandra is.
```
  "save": {
      "keyspace": <destination keyspace>,
      "tablename": <destination table>
  }
```

Test Details Ordered list of countries with more than 10K billion Euros of drug annual budget. It summarizes Drug Companies annual budget grouped by country.

Shell snippet

curl --request POST \
    --url http://localhost:5000/join \
    --header 'content-type: application/json' \
    --data '{
        "join_a": {
            "table_a": {
                "keyspace": "examples",
                "tablename": "mock_data",
                "join_key": [{"id": "id_person"}],
                "calculated": {
                    "edad": "round(months_between(current_date(),birth_date)/12,0)"
                },
                "select": ["first_name","last_name","email","gender","drinker","smoker_bool","language","edad"]
            },
            "table_b": {
                "keyspace": "examples_bis",
                "tablename": "mock_drugs",
                "join_key": ["id_patient"],
                "select": ["drug_name","id_company"]
            },
            "join_key": ["id_company"]
        },
        "table_b": {
            "keyspace": "examples_bis",
            "tablename": "mock_companies",
            "join_key": ["id_company"],
            "select": ["annual_budget","city","company_name","country",{"id": "id_company"}]
        },
        "filter": "annual_budget > 10000000000",    
        "groupby": { 
            "grouped": ["country"],
            "agg": {
                "sum": ["annual_budget"]
            }
        },
        "sortby": [
            {"sum(annual_budget)": "desc"}
        ]   
    }'

// Response 200 OK
{
    "calculated": null,
    "data": {
        "country": {
            "0": "China",
            "1": "France",
            "2": "Poland",
            "3": "Portugal",
            "4": "Japan",
            "5": "Mexico",
            "6": "Sweden",
            "7": "United States",
            "8": "Canada",
            "9": "Ukraine",
            "10": "Georgia",
            "11": "Germany",
            "12": "Spain",
            "13": "Norway",
            "14": "Ireland",
            "15": "Netherlands"
        },
        "sum(annual_budget)": {
            "0": 3901666527232.0,
            "1": 924393641984.0,
            "2": 706987144192.0,
            "3": 606915214336.0,
            "4": 511156410368.0,
            "5": 507268958208.0,
            "6": 398938326016.0,
            "7": 339608810496.0,
            "8": 283575956480.0,
            "9": 238798228480.0,
            "10": 218540765184.0,
            "11": 166788163584.0,
            "12": 147850549248.0,
            "13": 123179765760.0,
            "14": 111000954880.0,
            "15": 100029864960.0
        }
    },
    "format": "dict",
    "join_a": {
        "join_key": [
            "id_company"
        ],
        "table_a": {
            "calculated": {
                "edad": "round(months_between(current_date(),birth_date)/12,0)"
            },
            "join_key": [
                {
                    "id": "id_person"
                }
            ],
            "keyspace": "examples",
            "select": [
                "first_name",
                "last_name",
                "email",
                "gender",
                "drinker",
                "smoker_bool",
                "language",
                "edad",
                {
                    "id": "id_person"
                }
            ],
            "tablename": "mock_data"
        },
        "table_b": {
            "join_key": [
                "id_patient"
            ],
            "keyspace": "examples_bis",
            "select": [
                "drug_name",
                "id_company",
                "id_patient"
            ],
            "tablename": "mock_drugs"
        }
    },
    "join_b": null,
    "join_groupby": {
        "agg": {
            "sum": [
                "annual_budget"
            ]
        },
        "grouped": [
            "country"
        ]
    },
    "join_key": [],
    "join_type": "inner",
    "orient_results": "columns",
    "s_filter": "annual_budget > 10000000000",
    "save": null,
    "select": null,
    "sortby": [
        {
            "sum(annual_budget)": "desc"
        }
    ],
    "status": 200,
    "success": true,
    "table_a": null,
    "table_b": {
        "join_key": [
            "id_company"
        ],
        "keyspace": "examples_bis",
        "select": [
            "annual_budget",
            "city",
            "company_name",
            "country",
            {
                "id": "id_company"
            }
        ],
        "tablename": "mock_companies"
    },
    "union_a": null,
    "union_b": null
}