Alluxio guide

This tutorial provides a brief introduction to using Alluxio.

  • How to use Alluxio in CarbonData?
    • Running alluxio example in CarbonData project by IDEA
    • CarbonData supports alluxio by spark-shell
    • CarbonData supports alluxio by spark-submit

Running alluxio example in CarbonData project by IDEA

Building CarbonData

  • Please refer to Building CarbonData.
  • Install IntelliJ IDEA with the Scala plugin, then import the CarbonData project.

Installing and starting Alluxio
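
As a rough sketch of this step, the commands below format and start a single-node Alluxio from an unpacked distribution. It assumes `ALLUXIO_PATH` points at that directory and relies on the `alluxio format` and `alluxio-start.sh local` scripts shipped with Alluxio 1.8.x. The `/opt/alluxio` default, the `DRY_RUN` switch and the `run` helper are hypothetical conveniences for illustration, not part of Alluxio or CarbonData.

```shell
# Sketch: start a local Alluxio instance.
ALLUXIO_PATH="${ALLUXIO_PATH:-/opt/alluxio}"   # assumed default, adjust as needed
DRY_RUN="${DRY_RUN:-1}"                        # 1 = only print the commands

# Hypothetical helper: print the command, and execute it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

start_local_alluxio() {
  # Format the Alluxio journal and worker storage (wipes existing Alluxio metadata).
  run "${ALLUXIO_PATH}/bin/alluxio" format
  # Start a master and a worker on this machine.
  run "${ALLUXIO_PATH}/bin/alluxio-start.sh" local
}

start_local_alluxio
```

Setting `DRY_RUN=0` executes the commands instead of printing them. On Linux, Alluxio 1.8.x may additionally need the `SudoMount` argument to `alluxio-start.sh` so that it can mount the RAM disk; check the Alluxio documentation for the version in use.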

Running Example

CarbonData supports alluxio by spark-shell

Building CarbonData

Preparing Spark

Downloading alluxio and uncompressing it

Running spark-shell

  • Run the following command from the Spark installation directory:
./bin/spark-shell --jars ${CARBONDATA_PATH}/assembly/target/scala-2.11/apache-carbondata-2.0.0-SNAPSHOT-bin-spark2.3.4-hadoop2.7.2.jar,${ALLUXIO_PATH}/client/alluxio-1.8.1-client.jar
  • Test Alluxio through a CarbonSession:
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.SparkSession
val carbon = SparkSession.builder().master("local").appName("test").getOrCreateCarbonSession("alluxio://localhost:19998/carbondata");
carbon.sql("CREATE TABLE carbon_alluxio(id String,name String, city String,age Int) STORED as carbondata");
carbon.sql(s"LOAD DATA LOCAL INPATH '${CARBONDATA_PATH}/integration/spark/src/test/resources/sample.csv' into table carbon_alluxio");
carbon.sql("select * from carbon_alluxio").show
  • Result
scala> carbon.sql("select * from carbon_alluxio").show
| id|  name|     city|age|
|  1| david| shenzhen| 31|
|  2| eason| shenzhen| 27|
|  3| jarry|    wuhan| 35|
|  3| jarry|Bangalore| 35|
|  4| kunal|    Delhi| 26|
|  4|vishal|Bangalore| 29|
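
To confirm that the table files were actually written to Alluxio, the store location passed to `getOrCreateCarbonSession` can be listed with the Alluxio shell. The sketch below only derives the Alluxio path from the store URI and prints the `fs ls -R` command to run. The helper name `alluxio_path_of` is hypothetical, and the exact directory layout CarbonData creates under the store path is not described here, so the listing is recursive.

```shell
# Sketch: derive the Alluxio path from the store URI and print a listing command.
STORE_URI="alluxio://localhost:19998/carbondata"   # store location used in the session above

# Hypothetical helper: strip the "alluxio://host:port" prefix,
# leaving the path inside the Alluxio namespace.
alluxio_path_of() {
  rest="${1#alluxio://}"          # e.g. localhost:19998/carbondata
  case "$rest" in
    */*) echo "/${rest#*/}" ;;    # e.g. /carbondata
    *)   echo "/" ;;              # no path component: use the root
  esac
}

# Print the command instead of running it; ALLUXIO_PATH is assumed to be set.
echo "${ALLUXIO_PATH:-.}/bin/alluxio fs ls -R $(alluxio_path_of "$STORE_URI")"
```

Running the printed command from a terminal should list the files that the `LOAD DATA` statement produced, provided the Alluxio master is up.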

CarbonData supports alluxio by spark-submit

Building CarbonData

Preparing Spark

Downloading alluxio and uncompressing it

Running spark-submit

Upload data to alluxio

./bin/alluxio fs copyFromLocal ${CARBONDATA_PATH}/hadoop/src/test/resources/data.csv /
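
Re-running the upload against a file that already exists in Alluxio is likely to fail, so a small guard can make the step repeatable. The following is a minimal sketch, assuming `alluxio fs ls` exits with a non-zero status for a missing path. `ALLUXIO_BIN` and `upload_once` are hypothetical names introduced for illustration.

```shell
# Sketch: upload a local file to Alluxio only if it is not already there.
# ALLUXIO_BIN is a hypothetical override; by default it points into ALLUXIO_PATH.
ALLUXIO_BIN="${ALLUXIO_BIN:-${ALLUXIO_PATH:-.}/bin/alluxio}"

upload_once() {
  src="$1"        # local file, e.g. .../hadoop/src/test/resources/data.csv
  dest_dir="$2"   # target directory in Alluxio, e.g. /
  target="${dest_dir%/}/$(basename "$src")"
  # Assumption: "fs ls" returns non-zero when the path does not exist.
  if "$ALLUXIO_BIN" fs ls "$target" >/dev/null 2>&1; then
    echo "already present: $target"
  else
    "$ALLUXIO_BIN" fs copyFromLocal "$src" "$dest_dir" && echo "uploaded: $target"
  fi
}

# Usage (not executed here):
#   upload_once "${CARBONDATA_PATH}/hadoop/src/test/resources/data.csv" /
```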


./bin/spark-submit \
--master local \
--jars ${ALLUXIO_PATH}/client/alluxio-1.8.1-client.jar,${CARBONDATA_PATH}/examples/spark/target/carbondata-examples-2.0.0-SNAPSHOT.jar \
--class org.apache.carbondata.examples.AlluxioExample \
${CARBONDATA_PATH}/assembly/target/scala-2.11/apache-carbondata-2.0.0-SNAPSHOT-bin-spark2.3.4-hadoop2.7.2.jar

NOTE: Set runShell to false to avoid a dependency on the Alluxio shell module.
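
A missing or misnamed jar is a common reason for the spark-submit step to fail, because the jar names embed version numbers that change between builds. The sketch below checks a comma-separated `--jars` value before submitting. `missing_jars` is a hypothetical helper, and the fallback of `.` for unset `ALLUXIO_PATH` and `CARBONDATA_PATH` is an assumption made only so the snippet runs on its own.

```shell
# Sketch: report every jar in a comma-separated list that does not exist.
missing_jars() {
  old_ifs="$IFS"
  IFS=','                 # split the --jars value on commas
  for jar in $1; do
    [ -f "$jar" ] || echo "$jar"
  done
  IFS="$old_ifs"
}

# Same jar list as in the spark-submit command above.
JARS="${ALLUXIO_PATH:-.}/client/alluxio-1.8.1-client.jar,${CARBONDATA_PATH:-.}/examples/spark/target/carbondata-examples-2.0.0-SNAPSHOT.jar"

missing=$(missing_jars "$JARS")
if [ -n "$missing" ]; then
  echo "missing jars:"
  echo "$missing"
fi
```

If the check prints nothing, both jars exist and the spark-submit command can be run as shown.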


|SegmentSequenceId| Status|     Load Start Time|       Load End Time|Merged To|File Format|Data Size|Index Size|
|                1|Success|2019-01-09 15:10:...|2019-01-09 15:10:...|       NA|COLUMNAR_V3|  23.92KB|    1.07KB|
|                0|Success|2019-01-09 15:10:...|2019-01-09 15:10:...|       NA|COLUMNAR_V3|  23.92KB|    1.07KB|

| france|   202|
|  china|  1698|

|SegmentSequenceId|   Status|     Load Start Time|       Load End Time|Merged To|File Format|Data Size|Index Size|
|                3|Compacted|2019-01-09 15:10:...|2019-01-09 15:10:...|      0.1|COLUMNAR_V3|  23.92KB|    1.03KB|
|                2|Compacted|2019-01-09 15:10:...|2019-01-09 15:10:...|      0.1|COLUMNAR_V3|  23.92KB|    1.07KB|
|                1|Compacted|2019-01-09 15:10:...|2019-01-09 15:10:...|      0.1|COLUMNAR_V3|  23.92KB|    1.07KB|
|              0.1|  Success|2019-01-09 15:10:...|2019-01-09 15:10:...|       NA|COLUMNAR_V3|  37.65KB|    1.08KB|
|                0|Compacted|2019-01-09 15:10:...|2019-01-09 15:10:...|      0.1|COLUMNAR_V3|  23.92KB|    1.07KB|

