GAMsetup                package:mgcv                R Documentation

_S_e_t _u_p _G_A_M _u_s_i_n_g _p_e_n_a_l_i_z_e_d _r_e_g_r_e_s_s_i_o_n _s_p_l_i_n_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     Sets up the design matrix X, the penalty matrices S_i and the linear
     equality constraint matrix C for a GAM defined in terms of penalized
     regression splines. Various other information characterising the
     bases used is also returned. The output is such that the model can
     be fitted and smoothing parameters estimated by the method of
     Wood (2000), as implemented in routine `mgcv()'. This routine is
     usually called by `gam'.

_U_s_a_g_e:

     GAMsetup(G)

_A_r_g_u_m_e_n_t_s:

       G: the single argument to this function: a list containing at
          least the elements listed below.

       m: the number of smooth terms in the model

      df: an array of `G$m' integers specifying the maximum d.f. for
          each spline term.

       n: the number of data points to be modelled.

    nsdf: the number of user-supplied columns of the design matrix for
          any parametric model parts.

     dim: An array of dimensions for the smooths. `dim[i]' is the
          number of covariates that smooth `i' is a function of.

     fix: An array of logicals indicating whether each smooth term has
          fixed degrees of freedom or not.

  s.type: an array giving the type of basis used for each term: 0 for
          a cubic regression spline, 1 for a thin plate regression spline
          (t.p.r.s.).

 p.order: An array giving the order of the penalty for each term. 0 for
          auto selection.

       x: an array of `G$n'-element arrays of data and (optionally)
          design matrix columns. The first `G$nsdf' elements of `G$x'
          should contain the columns of the design matrix corresponding
          to the parametric part of the model. The remaining `G$m'
          elements of `G$x' are the values of the covariates that are
          arguments of the spline terms. Note that the smooths will be
          centred and no intercept term will be added unless an array of
          1's is supplied as part of `G$x'.

  vnames: Array of variable names, including the constant, if present.

       w: prior weights on response data.

      by: a 2-d array of `by' variables (i.e. covariates that multiply
          a smooth term). `by[i,j]' is the jth value for the ith `by'
          variable. There are only as many rows of this array as there
          are `by' variables in the model (often 0). The rownames of
          `by' give the `by' variable names.

by.exists: an array of logicals: `by.exists[i]' is `TRUE' if the ith
          smooth has a `by' variable associated with it, `FALSE'
          otherwise.

   knots: a compact array of user-supplied knot locations for each
          smooth, in the order corresponding to the row order in
          `G$x'. There are `G$dim[i]' arrays of length `G$n.knots[i]'
          for the ith smooth; all these arrays are packed end to end
          in the 1-d array `G$knots'. Supply a single zero (an array of
          length 1) if there are no knots.

 n.knots: an array giving the number of user-supplied basis knots for
          each smooth term; 0 for none supplied.
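     The packing of `G$knots' can be sketched in R as below. The
     helpers `pack_knots' and `unpack_knots' are hypothetical
     illustrations, not mgcv functions, and the sketch assumes that the
     knots for smooth i are held as a `G$n.knots[i]' by `G$dim[i]'
     matrix with one column per covariate, so that its columns are the
     `G$dim[i]' arrays that get laid end to end.

```r
# Hypothetical helpers (not part of mgcv) illustrating G$knots packing.
# Assumption: knots for smooth i form an n.knots[i] x dims[i] matrix,
# one column per covariate; columns are laid end to end.
pack_knots <- function(knot.list) {
  v <- unlist(lapply(knot.list, as.vector))  # column-major flattening
  if (length(v) == 0) v <- 0                 # single zero when no knots
  v
}

unpack_knots <- function(knots, n.knots, dims) {
  out <- vector("list", length(n.knots))
  pos <- 0
  for (i in seq_along(n.knots)) {
    len <- n.knots[i] * dims[i]              # values held by smooth i
    if (len > 0) {
      out[[i]] <- matrix(knots[pos + seq_len(len)], n.knots[i], dims[i])
    } else {
      out[[i]] <- matrix(numeric(0), 0, dims[i])
    }
    pos <- pos + len
  }
  out
}

# Smooth 1: 1-d with 3 knots; smooth 2: no knots; smooth 3: 2-d, 2 knots
kl <- list(matrix(c(0.1, 0.5, 0.9), 3, 1),
           matrix(numeric(0), 0, 1),
           matrix(c(0.2, 0.8, 0.3, 0.7), 2, 2))
n.knots <- c(3, 0, 2)
dims <- c(1, 1, 2)
knots <- pack_knots(kl)
back <- unpack_knots(knots, n.knots, dims)
```

     Here `knots' is c(0.1, 0.5, 0.9, 0.2, 0.8, 0.3, 0.7): smooth 2
     contributes nothing, and the two covariate columns of smooth 3
     follow one after the other.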

_V_a_l_u_e:

     A list `H' containing the elements of `G' (the input list) plus
     the following:

       X: the full design matrix.

       S: a one-dimensional array containing the elements of the
          non-zero blocks of the penalty matrices. Let `start[1]<-0' and
          `start[k+1]<-start[k]+H$df[k]^2'. Then penalty matrix `k' has
          `H$S[start[k]+i+H$df[k]*(j-1)]' in its ith row and jth column.
          To get the kth full penalty matrix, the matrix so obtained
          would be inserted into a full matrix of zeroes with its 1,1
          element at row and column `H$off[k]+1' (the position of the
          first parameter of the kth spline; see `off').

     off: an array of offsets, used to facilitate efficient storage
          of the penalty matrices and to indicate where in the overall
          parameter vector the parameters of the ith spline reside
          (e.g. the first parameter of the ith spline is at `p[off[i]+1]').

       C: a matrix defining the linear equality constraints on the
          parameters used to define the model (i.e. C in Cp=0).

      UZ: an array containing matrices that transform from a t.p.r.s.
          basis to the equivalent t.p.s. basis (for t.p.r.s. terms
          only). The packing method is as follows:
          set `start[1]<-0' and
          `start[k+1]<-start[k]+(M[k]+n)*tp.bs[k]' where `n' is the number
          of data, `M[k]' is the penalty null space dimension and
          `tp.bs[k]' is zero for a cubic regression spline and the
          basis dimension for a t.p.r.s. Then element `i,j' of the UZ
          matrix for model term `k' is:
          `UZ[start[k]+i+(j-1)*(M[k]+n)]'.

      Xu: the set of unique covariate combinations for each term. The
          packing method is as follows:
          set `start[1]<-0' and
          `start[k+1]<-start[k]+xu.length[k]*tp.dim[k]' where
          `xu.length[k]' is the number of unique covariate combinations and
          `tp.dim[k]' is zero for a cubic regression spline and the
          dimension of the smooth (i.e. number of covariates it is a
          function of) for a t.p.r.s. Then element `i,j' of the Xu
          matrix for model term `k' is:
          `Xu[start[k]+i+(j-1)*xu.length[k]]'.

xu.length: Number of unique covariate combinations for each t.p.r.s.
          term.

covariate.shift: All covariates are centred around zero before bases
          are constructed - this is an array of the applied shifts.

      xp: a matrix whose rows contain the covariate values corresponding
          to the parameters of each cubic regression spline. The cubic
          regression splines are parameterized by their y-values at a
          series of x-values, and the rows of `xp' contain those
          x-values. Note that these values are covariate shifted (see
          `covariate.shift').
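     `S', `UZ' and `Xu' all store a sequence of matrices end to end,
     each in R's column-major order, so that element i,j of block k sits
     at start[k]+i+nr[k]*(j-1), where nr[k] is the number of rows of
     block k. A minimal sketch of unpacking such storage follows. The
     helpers `block_starts', `unpack_block' and `full_penalty' are
     hypothetical, not mgcv functions, and the toy data are invented;
     the embedding step assumes, as in the description of `off', that
     the first parameter of term k is `p[off[k]+1]'.

```r
# Hypothetical helpers (not part of mgcv) for packed storage as used by
# H$S, H$UZ and H$Xu: block k is an nr[k] x nc[k] matrix stored
# column-major, with blocks laid end to end.
block_starts <- function(nr, nc) {
  # start[1] = 0, start[k+1] = start[k] + nr[k]*nc[k]
  c(0, cumsum(nr * nc))[seq_along(nr)]
}

unpack_block <- function(v, k, nr, nc) {
  start <- block_starts(nr, nc)
  idx <- start[k] + seq_len(nr[k] * nc[k])
  matrix(v[idx], nr[k], nc[k])   # matrix() fills column by column
}

# Penalty k embedded in a p x p zero matrix; its (1,1) element lands
# at row/column off[k] + 1 (assumes 0-based offsets, as for off).
full_penalty <- function(S, k, df, off, p) {
  Sk <- unpack_block(S, k, df, df)
  P <- matrix(0, p, p)
  rows <- off[k] + seq_len(df[k])
  P[rows, rows] <- Sk
  P
}

# Toy example: two penalties of dimension 2 and 3, one parametric
# coefficient before the smooths (invented values)
df <- c(2, 3)
S1 <- matrix(c(1, -1, -1, 1), 2, 2)
S2 <- diag(3)
S <- c(as.vector(S1), as.vector(S2))
off <- c(1, 3)
p <- 6
P2 <- full_penalty(S, 2, df, off, p)
```

     For the penalties use nr = nc = H$df; for UZ, nr = M+n and
     nc = tp.bs; for Xu, nr = H$xu.length and nc = tp.dim, with M,
     tp.bs and tp.dim as defined above. Blocks for cubic regression
     spline terms then simply have zero size.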

_A_u_t_h_o_r(_s):

     Simon N. Wood snw@st-and.ac.uk

_R_e_f_e_r_e_n_c_e_s:

     Wood, S.N. (2000) "Modelling and smoothing parameter estimation
     with multiple quadratic penalties" JRSSB 62(2):413-428

_S_e_e _A_l_s_o:

     `mgcv', `gam'

_E_x_a_m_p_l_e_s:

     # This example is modified from routine SANtest()
     set.seed(0)
     n<-100 # number of observations to simulate
     x <- runif(5 * n, 0, 1) # simulate covariates
     x <- array(x, dim = c(5, n)) # put into array for passing to GAMsetup
     # begin simulating some data (pi is a built-in R constant)
     y <- 2 * sin(pi * x[2, ])
     y <- y + exp(2 * x[3, ]) - 3.75887
     y <- y + 0.2 * x[4, ]^11 * (10 * (1 - x[4, ]))^6 + 10 * (10 * 
          x[4, ])^3 * (1 - x[4, ])^10 - 1.396
     sig2 <- -1  # noise variance is abs(sig2); negative value signals GCV
     e <- rnorm(n, 0, sqrt(abs(sig2)))
     y <- y + e          # simulated data
     w <- matrix(1, n, 1) # prior weights, all equal to 1
     par(mfrow = c(2, 2)) # scatter plots of simulated data
     plot(x[2, ], y);plot(x[3, ], y);plot(x[4, ], y);plot(x[5, ], y)
     x[1,]<-1
     # create list for passing to GAMsetup....
     # row 1 of x is the constant (intercept) column, hence nsdf = 1
     G <- list(m = 4, n = n, nsdf = 1, df = c(15, 15, 15, 15),
               dim = c(1, 1, 1, 1), s.type = c(0, 0, 0, 0), by = 0,
               by.exists = c(FALSE, FALSE, FALSE, FALSE),
               p.order = c(0, 0, 0, 0), x = x, n.knots = rep(0, 4))
     H <- GAMsetup(G)
     H$y <- y    # add data to H
     H$sig2 <- sig2  # add variance (signalling GCV use in this case) to H
     H$w <- w       # add weights to H
     H$sp <- array(-1, H$m)
     H$fix <- array(FALSE, H$m)  # no term has fixed d.f.
     H$conv.tol <- 1e-6
     H$max.half <- 15
     H$min.edf <- 5
     H <- mgcv(H)  # select smoothing parameters and fit model    

