Title: | Read in 'SAS' Data ('.sas7bdat' Files) into 'Apache Spark' |
---|---|
Description: | Read in 'SAS' Data ('.sas7bdat' Files) into 'Apache Spark' from R. 'Apache Spark' is an open-source cluster computing framework available at <https://spark.apache.org>. This R package uses the 'spark-sas7bdat' 'Spark' package (<https://spark-packages.org/package/saurfang/spark-sas7bdat>) to import and process 'SAS' data in parallel using 'Spark', thereby allowing 'dplyr' statements to be executed in parallel on top of 'SAS' data. |
Authors: | Jan Wijffels [aut, cre, cph], BNOSAC [cph], Geyer Bisschoff [ctb] |
Maintainer: | Jan Wijffels <[email protected]> |
License: | GPL-3 |
Version: | 1.4 |
Built: | 2024-10-30 02:44:24 UTC |
Source: | https://github.com/bnosac/spark.sas7bdat |
'spark.sas7bdat' uses the spark-sas7bdat Spark package to process SAS datasets in parallel using Spark. This allows dplyr statements to be executed on top of SAS datasets.
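If the spark-sas7bdat Spark package is not attached automatically when connecting, it can be requested explicitly through sparklyr's sparklyr.defaultPackages configuration option. A minimal sketch for a local connection; the package coordinates and version below are an assumption, check <https://spark-packages.org/package/saurfang/spark-sas7bdat> for a build matching your Spark and Scala versions:

library(sparklyr)
## Sketch: explicitly request the spark-sas7bdat Spark package at connect time.
## The coordinates below are an assumption - pick the build that matches your
## Spark/Scala version from the spark-packages.org page.
config <- spark_config()
config$sparklyr.defaultPackages <- c(config$sparklyr.defaultPackages,
                                     "saurfang:spark-sas7bdat:2.1.0-s_2.11")
sc <- spark_connect(master = "local", config = config)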
Read SAS datasets in .sas7bdat format into Spark, using the spark-sas7bdat Spark package.
spark_read_sas(sc, path, table)
sc | Connection to a Spark local instance or remote cluster. See the examples. |
path | Full path to the SAS file, on HDFS (hdfs://), S3 (s3n://), or the local file system (file://). Note that files on the local file system must be specified using the full path; see the sketch after this table. |
table | Character string with the name of the Spark table into which the SAS dataset will be read. |
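A minimal sketch of the different path schemes, assuming an existing sparklyr connection sc; the file locations below are hypothetical:

## Hypothetical locations illustrating the supported URI schemes
x_hdfs  <- spark_read_sas(sc, path = "hdfs:///data/example.sas7bdat", table = "ex_hdfs")
x_s3    <- spark_read_sas(sc, path = "s3n://my-bucket/example.sas7bdat", table = "ex_s3")
## Local files need the full path; normalizePath() expands a relative one
x_local <- spark_read_sas(sc, path = normalizePath("example.sas7bdat"), table = "ex_local")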
an object of class tbl_spark, which is a reference to a Spark DataFrame on which dplyr functions can be executed. See https://github.com/sparklyr/sparklyr
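Since the returned tbl_spark is a lazy reference, dplyr verbs on it are translated to Spark and only evaluated when the result is requested. A short sketch, assuming x was returned by spark_read_sas() on the iris example below:

library(dplyr)
## dplyr verbs build a Spark query; nothing is computed in R until collect()
res <- x %>%
  filter(Sepal_Length > 5) %>%
  collect()   ## returns the result as an R data.frame (tibble)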
https://spark-packages.org/package/saurfang/spark-sas7bdat, https://github.com/saurfang/spark-sas7bdat, https://github.com/sparklyr/sparklyr
## Not run:
## If you haven't got a Spark cluster, you can install Spark locally like this
library(sparklyr)
spark_install(version = "2.0.1")

## Define the SAS .sas7bdat file, connect to the Spark cluster to read + process the data
myfile <- system.file("extdata", "iris.sas7bdat", package = "spark.sas7bdat")
myfile

library(spark.sas7bdat)
sc <- spark_connect(master = "local")
x <- spark_read_sas(sc, path = myfile, table = "sas_example")
x

library(dplyr)
x %>%
  group_by(Species) %>%
  summarise(count = n(),
            length = mean(Sepal_Length),
            width = mean(Sepal_Width))
## End(Not run)