Commit 04b05367 authored by Jana Ulrich's avatar Jana Ulrich
Browse files

Merge branch 'development' into 'covariates'

Development

See merge request Rpackages/IDF!6
parents e34403ab 1bd06d3b
...@@ -5,6 +5,7 @@ Version: 2.0.0 ...@@ -5,6 +5,7 @@ Version: 2.0.0
Date: 2020-11-22 Date: 2020-11-22
Authors@R: c(person("Jana", "Ulrich", email = "jana.ulrich@fu-berlin.de", role = c("aut", "cre")), Authors@R: c(person("Jana", "Ulrich", email = "jana.ulrich@fu-berlin.de", role = c("aut", "cre")),
person("Laura","Mack", email= "laura.mack@fu-berlin.de",role="ctb"), person("Laura","Mack", email= "laura.mack@fu-berlin.de",role="ctb"),
person("Oscar E.","Jurado", email= "jurado@zedat.fu-berlin.de",role="ctb"),
person("Christoph", "Ritschel", role = "aut"), person("Christoph", "Ritschel", role = "aut"),
person("Carola", "Detring", role = "ctb"), person("Carola", "Detring", role = "ctb"),
person("Sarah", "Joedicke", role = "ctb")) person("Sarah", "Joedicke", role = "ctb"))
...@@ -14,8 +15,8 @@ Description: Intensity-duration-frequency (IDF) curves are a widely used analysi ...@@ -14,8 +15,8 @@ Description: Intensity-duration-frequency (IDF) curves are a widely used analysi
The package 'IDF' provides functions to estimate IDF parameters for given The package 'IDF' provides functions to estimate IDF parameters for given
precipitation time series on the basis of a duration-dependent precipitation time series on the basis of a duration-dependent
generalised extreme value distribution generalised extreme value distribution
[Koutsoyannis et al., 1998, <doi:10.1016/S0022-1694(98)00097-3>]. [Koutsoyiannis et al., 1998, <doi:10.1016/S0022-1694(98)00097-3>].
Author: Jana Ulrich [aut, cre], Laura Mack [ctb], Christoph Ritschel [aut], Carola Detring [ctb], Sarah Joedicke [ctb] Author: Jana Ulrich [aut, cre], Laura Mack [ctb], Oscar E. Jurado [ctb], Christoph Ritschel [aut], Carola Detring [ctb], Sarah Joedicke [ctb]
Maintainer: Jana Ulrich <jana.ulrich@fu-berlin.de> Maintainer: Jana Ulrich <jana.ulrich@fu-berlin.de>
Imports: stats, Imports: stats,
evd, evd,
......
...@@ -27,6 +27,9 @@ importFrom(graphics,plot) ...@@ -27,6 +27,9 @@ importFrom(graphics,plot)
importFrom(graphics,points) importFrom(graphics,points)
importFrom(graphics,title) importFrom(graphics,title)
importFrom(ismev,gev.fit) importFrom(ismev,gev.fit)
importFrom(parallel,makeCluster)
importFrom(parallel,parLapply)
importFrom(parallel,stopCluster)
importFrom(pbapply,pblapply) importFrom(pbapply,pblapply)
importFrom(stats,lm) importFrom(stats,lm)
importFrom(stats,make.link) importFrom(stats,make.link)
......
# IDF 2.0.0
The package was extended to allow generalized linear modeling of the d-GEV parameters. This can be used to model for example spatial variations of the parameters.
The packet was extensively revised and restructured and some functions were removed as they were too specific.
# IDF 1.0.0
R package for maximum likelihood fitting of duration-dependent generalized extreme value distribution (d-GEV).
Additional functions forprocessing data (obtaining annual maxima) plotting IDF curves.
\ No newline at end of file
...@@ -91,7 +91,7 @@ NULL ...@@ -91,7 +91,7 @@ NULL
#' The data.frame must have the columns 'date' and 'RR' unless other names #' The data.frame must have the columns 'date' and 'RR' unless other names
#' are specified in the parameter `names`. The column 'date' must contain strings with #' are specified in the parameter `names`. The column 'date' must contain strings with
#' standard date format. #' standard date format.
#' @param ds numeric vector of aggregation durations. #' @param ds numeric vector of aggregation durations in hours.
#' (Must be multiples of time resolution at all stations.) #' (Must be multiples of time resolution at all stations.)
#' @param na.accept numeric in [0,1) giving maximum percentage of missing values #' @param na.accept numeric in [0,1) giving maximum percentage of missing values
#' for which block max. should still be calculated. #' for which block max. should still be calculated.
...@@ -99,7 +99,7 @@ NULL ...@@ -99,7 +99,7 @@ NULL
#' containing names of elements in data. If not given, all elements in `data` will be used. #' containing names of elements in data. If not given, all elements in `data` will be used.
#' @param which.mon optional, subset of months (as list containing values from 0 to 11) of which to calculate the annual maxima from. #' @param which.mon optional, subset of months (as list containing values from 0 to 11) of which to calculate the annual maxima from.
#' @param names optional, character vector of length 2, containing the names of the columns to be used. #' @param names optional, character vector of length 2, containing the names of the columns to be used.
#' @param cl optional, number of cores to be used from \code{\link[pbapply]{pblapply}} for parallelization. #' @param cl optional, number of cores to be used from \code{\link[parallel]{parLapply}} for parallel computing.
#' #'
#' @details If data contains stations with different time resolutions that need to be aggregated at #' @details If data contains stations with different time resolutions that need to be aggregated at
#' different durations, IDF.agg needs to be run separately for the different groups of stations. #' different durations, IDF.agg needs to be run separately for the different groups of stations.
...@@ -112,6 +112,7 @@ NULL ...@@ -112,6 +112,7 @@ NULL
#' @seealso \code{\link{pgev.d}} #' @seealso \code{\link{pgev.d}}
#' #'
#' @export #' @export
#' @importFrom parallel parLapply makeCluster stopCluster
#' @importFrom pbapply pblapply #' @importFrom pbapply pblapply
#' @importFrom RcppRoll roll_sum #' @importFrom RcppRoll roll_sum
#' @importFrom fastmatch ctapply #' @importFrom fastmatch ctapply
...@@ -137,7 +138,7 @@ NULL ...@@ -137,7 +138,7 @@ NULL
#' IDF.agg(list('Sample'=df),ds=c(24,48),na.accept = 0.01,which.mon = list(5:7)) #' IDF.agg(list('Sample'=df),ds=c(24,48),na.accept = 0.01,which.mon = list(5:7))
#' #'
IDF.agg <- function(data,ds,na.accept = 0, IDF.agg <- function(data,ds,na.accept = 0,
which.stations = NULL,which.mon = list(0:11),names = c('date','RR'),cl = NULL){ which.stations = NULL,which.mon = list(0:11),names = c('date','RR'),cl = 1){
if(!inherits(data, "list"))stop("Argument 'data' must be a list, instead it is a: ", class(data)) if(!inherits(data, "list"))stop("Argument 'data' must be a list, instead it is a: ", class(data))
...@@ -151,13 +152,13 @@ NULL ...@@ -151,13 +152,13 @@ NULL
,' or $', names[2], '.')} ,' or $', names[2], '.')}
dtime<-as.numeric((data.s[,names[1]][2]-data.s[,names[1]][1]),units="hours") dtime<-as.numeric((data.s[,names[1]][2]-data.s[,names[1]][1]),units="hours")
if(any(ds %% dtime > 10e-16)){ if(any((ds/dtime)%%1 > 10e-8)){
stop('At least one of the given aggregation durations is not multiple of the time resolution = ' stop('At least one of the given aggregation durations is not multiple of the time resolution = '
,dtime,' of station ',station,'.')} ,dtime,'hours at station ',station,'.')}
# function 1: aggregate over single durations and find annual maxima: # function 1: aggregate over single durations and find annual maxima:
agg.ts <- function(ds){ agg.ts <- function(ds){
runsum = RcppRoll::roll_sum(data.s[,names[2]],ds/dtime,fill=NA,align='right') runsum = RcppRoll::roll_sum(data.s[,names[2]],round(ds/dtime),fill=NA,align='right')
#runmean <- rollapplyr(as.zoo(data.s[,names[2]]),ds/dtime,FUN=sum,fill =NA,align='right') #runmean <- rollapplyr(as.zoo(data.s[,names[2]]),ds/dtime,FUN=sum,fill =NA,align='right')
runsum <- runsum/ds #intensity per hour runsum <- runsum/ds #intensity per hour
max.subset <- lapply(1:length(which.mon),function(m.i){ max.subset <- lapply(1:length(which.mon),function(m.i){
...@@ -175,14 +176,16 @@ NULL ...@@ -175,14 +176,16 @@ NULL
return(df) # maxima for single durations return(df) # maxima for single durations
} }
# call function 1 in lapply to aggregate over all durations at single station # call function 1 in lapply to aggregate over all durations at single station
data.agg <- pbapply::pblapply(ds,agg.ts,cl=cl) clust <- parallel::makeCluster(cl)
data.agg <- parallel::parLapply(cl = clust,ds,agg.ts)
parallel::stopCluster(clust)
df <- do.call(rbind,data.agg) df <- do.call(rbind,data.agg)
return(df) # maxima for all durations at one station return(df) # maxima for all durations at one station
} }
# which stations should be used? # which stations should be used?
if(is.null(which.stations))which.stations <- if(is.null(names(data))){1:length(data)}else{names(data)} if(is.null(which.stations))which.stations <- if(is.null(names(data))){1:length(data)}else{names(data)}
# call function 2 in lapply to aggregate over all durations at all stations # call function 2 in lapply to aggregate over all durations at all stations
station.list <- lapply(which.stations,agg.station) station.list <- pbapply::pblapply(which.stations,agg.station)
return(do.call('rbind',station.list)) return(do.call('rbind',station.list))
} }
......
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# IDF
<!-- badges: start -->
<!-- badges: end -->
Intensity-duration-frequency (IDF) curves are a widely used analysis-tool
in hydrology to assessthe characteristics of extreme precipitation.
The package 'IDF' functions to estimate IDF relations for given
precipitation time series on the basis of a duration-dependent
generalized extreme value (GEV) distribution.
The central function is \code{\link{gev.d.fit}}, which uses the method
of maximum-likelihood estimation for the d-GEV parameters, whereby it is
possible to include generalized linear modeling
for each parameter.
For more detailed information on the methods and the application of the package for estimating IDF curves with spatial covariates, see Ulrich et. al (2020, https://doi.org/10.3390/w12113119).
## Installation
You can install the released version of IDF from [CRAN](https://CRAN.R-project.org) with:
``` r
install.packages("IDF")
```
or from gitlab using:
``` r
devtools::install_git("https://gitlab.met.fu-berlin.de/Rpackages/IDF")
```
## Example
Here are a few examples to illustrate the order in which the functions are intended to be used.
* Step 0: sample 20 years of example hourly 'precipitation' data
```{r sample data}
dates <- seq(as.POSIXct("2000-01-01 00:00:00"),as.POSIXct("2019-12-31 23:00:00"),by = 'hour')
sample.precip <- rgamma(n = length(dates), shape = 0.05, rate = 0.4)
precip.df <- data.frame(date=dates,RR=sample.precip)
```
* Step 1: get annual maxima
```{r annual max}
library(IDF)
durations <- 2^(0:6) # accumulation durations [h]
ann.max <- IDF.agg(list(precip.df),ds=durations,na.accept = 0.1)
# plotting the annual maxima in log-log representation
plot(ann.max$ds,ann.max$xdat,log='xy',xlab = 'Duration [h]',ylab='Intensity [mm/h]')
```
* Step 2: fit d-GEV to annual maxima
```{r fit}
fit <- gev.d.fit(xdat = ann.max$xdat,ds = ann.max$ds,sigma0link = make.link('log'))
# checking the fit
gev.d.diag(fit,pch=1,)
# parameter estimates
params <- gev.d.params(fit)
print(params)
# plotting the probability density for a single duration
q.min <- floor(min(ann.max$xdat[ann.max$ds%in%1:2]))
q.max <- ceiling(max(ann.max$xdat[ann.max$ds%in%1:2]))
q <- seq(q.min,q.max,0.2)
plot(range(q),c(0,0.55),type = 'n',xlab = 'Intensity [mm/h]',ylab = 'Density')
for(d in 1:2){ # d=1h and d=2h
# sampled data:
hist(ann.max$xdat[ann.max$ds==d],main = paste('d=',d),q.min:q.max
,freq = FALSE,add=TRUE,border = d)
# etimated prob. density:
lines(q,dgev.d(q,params$mut,params$sigma0,params$xi,params$theta,params$eta,d = d),col=d)
}
legend('topright',col=1:2,lwd=1,legend = paste('d=',1:2,'h'),title = 'Duration')
```
* Step 3: adding the IDF-curves to the data
```{r idf}
plot(ann.max$ds,ann.max$xdat,log='xy',xlab = 'Duration [h]',ylab='Intensity [mm/h]')
IDF.plot(durations,params,add=TRUE)
```
<!-- README.md is generated from README.Rmd. Please edit that file -->
# IDF
<!-- badges: start -->
<!-- badges: end -->
Intensity-duration-frequency (IDF) curves are a widely used
analysis-tool in hydrology to assessthe characteristics of extreme
precipitation. The package ‘IDF’ functions to estimate IDF relations for
given precipitation time series on the basis of a duration-dependent
generalized extreme value (GEV) distribution. The central function is ,
which uses the method of maximum-likelihood estimation for the d-GEV
parameters, whereby it is possible to include generalized linear
modeling for each parameter. For more detailed information on the
methods and the application of the package for estimating IDF curves
with spatial covariates, see Ulrich et. al (2020,
<https://doi.org/10.3390/w12113119>).
## Installation
You can install the released version of IDF from
[CRAN](https://CRAN.R-project.org) with:
``` r
install.packages("IDF")
```
or from gitlab using:
``` r
devtools::install_git("https://gitlab.met.fu-berlin.de/Rpackages/IDF")
```
## Example
Here are a few examples to illustrate the order in which the functions
are intended to be used.
- Step 0: sample 20 years of example hourly ‘precipitation’ data
<!-- end list -->
``` r
dates <- seq(as.POSIXct("2000-01-01 00:00:00"),as.POSIXct("2019-12-31 23:00:00"),by = 'hour')
sample.precip <- rgamma(n = length(dates), shape = 0.05, rate = 0.4)
precip.df <- data.frame(date=dates,RR=sample.precip)
```
- Step 1: get annual maxima
<!-- end list -->
``` r
library(IDF)
durations <- 2^(0:6) # accumulation durations [h]
ann.max <- IDF.agg(list(precip.df),ds=durations,na.accept = 0.1)
# plotting the annual maxima in log-log representation
plot(ann.max$ds,ann.max$xdat,log='xy',xlab = 'Duration [h]',ylab='Intensity [mm/h]')
```
<img src="man/figures/README-annual max-1.png" width="100%" />
- Step 2: fit d-GEV to annual maxima
<!-- end list -->
``` r
fit <- gev.d.fit(xdat = ann.max$xdat,ds = ann.max$ds,sigma0link = make.link('log'))
#> $conv
#> [1] 0
#>
#> $nllh
#> [1] 73.81918
#>
#> $mle
#> [1] 5.384207e+00 6.400833e-01 -9.082818e-03 5.752210e-09 8.153861e-01
#>
#> $se
#> [1] 3.531081e-01 7.670363e-02 5.422432e-02 2.000074e-06 1.279425e-02
# checking the fit
gev.d.diag(fit,pch=1,)
```
<img src="man/figures/README-fit-1.png" width="100%" />
``` r
# parameter estimates
params <- gev.d.params(fit)
print(params)
#> mut sigma0 xi theta eta
#> 1 5.384207 1.896639 -0.009082818 5.75221e-09 0.8153861
# plotting the probability density for a single duration
q.min <- floor(min(ann.max$xdat[ann.max$ds%in%1:2]))
q.max <- ceiling(max(ann.max$xdat[ann.max$ds%in%1:2]))
q <- seq(q.min,q.max,0.2)
plot(range(q),c(0,0.55),type = 'n',xlab = 'Intensity [mm/h]',ylab = 'Density')
for(d in 1:2){ # d=1h and d=2h
# sampled data:
hist(ann.max$xdat[ann.max$ds==d],main = paste('d=',d),q.min:q.max
,freq = FALSE,add=TRUE,border = d)
# etimated prob. density:
lines(q,dgev.d(q,params$mut,params$sigma0,params$xi,params$theta,params$eta,d = d),col=d)
}
legend('topright',col=1:2,lwd=1,legend = paste('d=',1:2,'h'),title = 'Duration')
```
<img src="man/figures/README-fit-2.png" width="100%" />
- Step 3: adding the IDF-curves to the data
<!-- end list -->
``` r
plot(ann.max$ds,ann.max$xdat,log='xy',xlab = 'Duration [h]',ylab='Intensity [mm/h]')
IDF.plot(durations,params,add=TRUE)
```
<img src="man/figures/README-idf-1.png" width="100%" />
...@@ -25,6 +25,10 @@ CRAN repository db overrides: ...@@ -25,6 +25,10 @@ CRAN repository db overrides:
X-CRAN-Comment: Archived on 2020-03-13 as check problems were not X-CRAN-Comment: Archived on 2020-03-13 as check problems were not
corrected in time. corrected in time.
The maintainer of this package has changed. The former maintainer did not inform us that check problems had occurred.
This submission is a substantially revised version.
## Reverse dependencies ## Reverse dependencies
This is a new release, so there are no reverse dependencies. There are currently no downstream dependencies for this package.
...@@ -11,7 +11,7 @@ IDF.agg( ...@@ -11,7 +11,7 @@ IDF.agg(
which.stations = NULL, which.stations = NULL,
which.mon = list(0:11), which.mon = list(0:11),
names = c("date", "RR"), names = c("date", "RR"),
cl = NULL cl = 1
) )
} }
\arguments{ \arguments{
...@@ -20,7 +20,7 @@ The data.frame must have the columns 'date' and 'RR' unless other names ...@@ -20,7 +20,7 @@ The data.frame must have the columns 'date' and 'RR' unless other names
are specified in the parameter `names`. The column 'date' must contain strings with are specified in the parameter `names`. The column 'date' must contain strings with
standard date format.} standard date format.}
\item{ds}{numeric vector of aggregation durations. \item{ds}{numeric vector of aggregation durations in hours.
(Must be multiples of time resolution at all stations.)} (Must be multiples of time resolution at all stations.)}
\item{na.accept}{numeric in [0,1) giving maximum percentage of missing values \item{na.accept}{numeric in [0,1) giving maximum percentage of missing values
...@@ -33,7 +33,7 @@ containing names of elements in data. If not given, all elements in `data` will ...@@ -33,7 +33,7 @@ containing names of elements in data. If not given, all elements in `data` will
\item{names}{optional, character vector of length 2, containing the names of the columns to be used.} \item{names}{optional, character vector of length 2, containing the names of the columns to be used.}
\item{cl}{optional, number of cores to be used from \code{\link[pbapply]{pblapply}} for parallelization.} \item{cl}{optional, number of cores to be used from \code{\link[parallel]{parLapply}} for parallel computing.}
} }
\value{ \value{
data.frame containing the annual intensity maxima [mm/h] in `$xdat`, the corresponding duration in `$ds`, data.frame containing the annual intensity maxima [mm/h] in `$xdat`, the corresponding duration in `$ds`,
......
...@@ -72,7 +72,7 @@ the components nllh, mle and se are always printed. ...@@ -72,7 +72,7 @@ the components nllh, mle and se are always printed.
duration offset and duration exponent, resp.} duration offset and duration exponent, resp.}
\item{se}{numeric vector giving the standard errors for the MLE's (in the same order)} \item{se}{numeric vector giving the standard errors for the MLE's (in the same order)}
\item{trans}{A logical indicator for a non-stationary fit.} \item{trans}{A logical indicator for a non-stationary fit.}
\item{model}{A list with components mul, sigl, shl, thetal and etal.} \item{model}{A list with components mutl, sigma0l, xil, thetal and etal.}
\item{link}{A character vector giving inverse link functions.} \item{link}{A character vector giving inverse link functions.}
\item{conv}{The convergence code, taken from the list returned by \code{\link{optim}}. \item{conv}{The convergence code, taken from the list returned by \code{\link{optim}}.
A zero indicates successful convergence.} A zero indicates successful convergence.}
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment