---
title: "Data Structures"
output: html_notebook
---

## Strings

```{r}
myString <- "Hello"
myString <- paste(myString, "World!")  # paste() already separates its arguments with a space (sep = " ")
print(myString)
substring(myString, 2, 4)  # substring(text, first, last): characters 2 to 4
nchar(myString)            # number of characters
length(myString)           # 1: a single string is a character vector of length 1
tolower(myString)
toupper(myString)
```

To control how numbers and strings are displayed, use:

**format(x, digits, nsmall, scientific, width, justify = c("left", "right", "centre", "none"))**

## Vectors

Vectors contain elements of the same type.

```{r}
# generate
v <- 3:7                  # integer sequence 3, 4, 5, 6, 7
print(v)
w <- seq(3, 7, by = 0.5)  # 3.0, 3.5, ..., 7.0
print(w)
x <- c(3, 5, 7)           # combine values with c()
print(x)
y <- c("11", "12", "13")  # character vector
print(y)
vw <- c(v, w)             # concatenate two vectors
print(vw)

# access (indices start at 1)
z <- w[1]
print(z)
z <- w[c(1, 3, 5)]        # select elements 1, 3 and 5
print(z)
z <- w[c(-1, -3, -5)]     # negative indices drop elements 1, 3 and 5
print(z)
```

```{r}
# arithmetic
a <- c(1, 2, 3)
b <- c(4, 5, 6)
a - 1      # a scalar is applied to every element
a + 1
a * 2
a / 2
a + b      # element-wise operations on vectors of equal length
b - a
a * b
b / a
a %*% b    # inner (scalar) product, returned as a 1 x 1 matrix
a %o% b    # outer product, a 3 x 3 matrix
```

## Lists

Lists can contain elements of different types. A list can also contain a matrix or a function as its elements.

## Matrices

Matrices are R objects in which the elements are arranged in a two-dimensional rectangular layout. All elements of a matrix have the same atomic type. Matrices are mainly used for mathematical calculations.

**matrix(data, nrow, ncol, byrow, dimnames)**

* data is the input vector whose elements become the data elements of the matrix.
* nrow is the number of rows to be created.
* ncol is the number of columns to be created.
* byrow is a logical value. If TRUE, the input vector elements are arranged (filled) by row; the default FALSE fills by column.
* dimnames is a list of the names assigned to the rows and columns.

```{r}
M <- matrix(1:6, ncol = 3, byrow = TRUE)  # filled row by row: first row is 1 2 3
print(M)
M <- matrix(1:6, ncol = 3)                # default byrow = FALSE: filled column by column
print(M)
M[2, 2]   # element in row 2, column 2
M[2, ]    # entire row 2
M[, 2]    # entire column 2
```

The operators +, -, * and / operate element-wise, and %*% performs matrix multiplication (as for vectors).

## Arrays

Arrays are R data objects that can store data in more than two dimensions. For example, an array of dimension (2, 3, 4) consists of 4 rectangular matrices, each with 2 rows and 3 columns. Like matrices, arrays can store only one data type.
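A minimal sketch of the (2, 3, 4) example above, filled with the integers 1 to 24 (an arbitrary choice for illustration; the names A and totals are hypothetical). It also previews apply(), described below: MARGIN = 3 applies sum() to each of the 4 matrices.

```{r}
# Hypothetical example: a 2 x 3 x 4 array holding 1..24, filled column by column
A <- array(1:24, dim = c(2, 3, 4))
dim(A)        # 2 3 4
A[, , 1]      # the first 2 x 3 matrix
A[2, 3, 4]    # single element: row 2, column 3 of matrix 4

# Sum each of the 4 matrices (MARGIN = 3 keeps the third dimension)
totals <- apply(A, 3, sum)
print(totals)
```

Because the array is filled in column-major order, each of the 4 matrices holds 6 consecutive integers.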
**array(data, dim, dimnames)**

Indexing works as with matrices and vectors.

Calculations across array elements:

**apply(X, MARGIN, FUN)**

* X is an array (or matrix).
* MARGIN gives the dimension(s) over which FUN is applied, e.g. 1 for rows, 2 for columns, c(1, 2) for both; if X has named dimnames, it can also be a character vector of those names.
* FUN is the function to be applied across the elements of the array.

## Factors

Factors are data objects used to categorize data and store it as levels. They can store both strings and integers. They are useful for columns that have a limited number of unique values, such as "Male" and "Female", or TRUE and FALSE, and in data analysis for statistical modeling.

## Dataframes

A data frame is a table, or two-dimensional array-like structure, in which each column contains the values of one variable and each row contains one set of values, one from each column.

```{r}
label <- c("a", "b", "c")
value1 <- c(1, 2, 3)
value2 <- c(2, 3, 4)
df <- data.frame(label, value1, value2)  # column names are taken from the variable names
print(df)

df$label                      # access a column with $, [ or [[
df$label <- c("d", "e", "f")  # replace a column
print(df)
```

```{r}
sub <- subset(df, value1 > 1)  # keep the rows where value1 > 1
print(sub)
sub <- subset(df, value1 > 1, select = c(label, value2))  # ... and keep only these columns
print(sub)
```

#### Adjust headers

```{r}
bod <- read.table("BOD.txt", header = FALSE)  # no header row: columns get default names V1, V2, ...
print(bod)
colnames(bod) <- c("Time", "demand")
colnames(bod)
bod
```

[R for MATLAB users](http://mathesaurus.sourceforge.net/octave-r.html)
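The Lists section above has no code chunk. A minimal sketch with arbitrary values (the name myList and its element names are hypothetical) of a list that mixes a string, a numeric vector, a matrix and a function:

```{r}
# Hypothetical list mixing types: character, numeric vector, matrix and function
myList <- list(name = "example",
               values = c(1.5, 2.5, 3.5),
               M = matrix(1:4, nrow = 2),
               square = function(x) x^2)
length(myList)     # 4 elements
myList$values      # access an element by name with $
myList[["M"]]      # [[ returns the element itself (here a matrix)
myList[1]          # [ returns a sub-list of length 1
myList$square(3)   # call the stored function
```

The difference between [ and [[ matters: [ always returns a list, while [[ extracts a single element.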
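The Factors section above describes factors without code. A minimal sketch using made-up "Male"/"Female" values (illustrative data only; the name gender is hypothetical):

```{r}
# Hypothetical categorical data stored as a factor
gender <- factor(c("Male", "Female", "Female", "Male", "Female"))
print(gender)
levels(gender)      # unique values, sorted alphabetically by default
nlevels(gender)     # number of levels
table(gender)       # count of each level
as.integer(gender)  # underlying integer codes, one per element
```

Internally each value is stored as an integer code pointing into levels(gender), which is why factors are compact for columns with few unique values.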