Scala Apps Template
Directory template
This is the file and folder structure used to build Scala applications that will be submitted to a Spark cluster.
Spark Submit
To launch Scala applications on a Spark cluster you can use the spark-submit utility, which is part of the Spark tools. It is the alternative to spark-shell, which is better suited for interactive work.
spark-submit expects an entry point class with a main method, which you specify with the --class option. If the application has additional dependencies, it has to be packaged together with them, normally as a jar in the case of Java and Scala.
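As a rough sketch, such an entry point could look like the following (the package and object names are made up for illustration, and it assumes the spark-sql dependency declared in the build.sbt shown further below):

package example  // hypothetical package name

import org.apache.spark.sql.SparkSession

// Entry point class; it would be submitted with: spark-submit --class example.SimpleApp <app jar>
object SimpleApp {
  def main(args: Array[String]): Unit = {
    // The master and other settings come from spark-submit, so nothing is hard-coded here
    val spark = SparkSession.builder().appName("simple-app").getOrCreate()
    import spark.implicits._

    // Trivial job, just to have something runnable
    val count = Seq(1, 2, 3).toDS().count()
    println(s"count = $count")

    spark.stop()
  }
}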
Compilation
A build tool such as sbt is needed to produce the jars that will be submitted, and the build requires some settings and a particular folder structure for the application.
Folder tree:
|--lib
|  |-- ... (e.g. unmanaged dependency jars)
|--project (automatically generated by sbt)
|--src
|  |--main
|  |  |--scala
|  |  |  |-- ... (the .scala source files)
|--target (compilation results)
|--build.sbt (sbt configuration)
- At the root folder we have build.sbt with the configuration for the sbt build.
- Under src/main/scala we place the Scala sources that are part of the project.
- The lib folder holds the dependencies, normally unmanaged jars (managed dependencies are declared in build.sbt).
- build.sbt: the application's build configuration goes here. For instance:
name := "<app_name>"
version := "<app_version>"
scalaVersion := "<scala_app_version>"

// Spark version the application is built against
val sparkVersion = "<spark_version>"

// Unmanaged dependencies, e.g. the Spark-Cassandra Connector jar placed under lib
unmanagedJars in Compile += file("lib/spark-cassandra-connector.jar")

resolvers ++= Seq(
  "apache-snapshots" at "http://repository.apache.org/snapshots/"
)

// Managed dependencies (resolved from Ivy/Maven repositories)
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % sparkVersion,
  "org.apache.spark" %% "spark-sql"       % sparkVersion,
  "org.apache.spark" %% "spark-mllib"     % sparkVersion,
  "org.apache.spark" %% "spark-streaming" % sparkVersion,
  "org.apache.spark" %% "spark-hive"      % sparkVersion
)
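With this in place, running sbt package from the root folder typically compiles the sources under src/main/scala and leaves the application jar under target/, which can then be passed to spark-submit together with the --class option and the entry point class name. Note that sbt package only bundles the project's own classes; unmanaged jars from lib (such as the Cassandra connector above) usually have to be handed to spark-submit separately (for example via its --jars option) or merged into a fat jar with a plugin such as sbt-assembly.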