Last updated: 2021-05-27
Checks: 7 passed, 0 failed
Knit directory: `fa_sim_cal/`
This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command `set.seed(20201104)` was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version b85e679. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use `wflow_publish` or `wflow_git_commit`). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: .tresorit/
Ignored: _targets/
Ignored: data/VR_20051125.txt.xz
Ignored: data/VR_Snapshot_20081104.txt.xz
Ignored: renv/library/
Ignored: renv/local/
Ignored: renv/staging/
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (`analysis/workflow.Rmd`) and HTML (`docs/workflow.html`) files. If you’ve configured a remote Git repository (see `?wflow_git_remote`), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | a6fb2e3 | Ross Gayler | 2021-05-27 | WIP |
html | b565810 | Ross Gayler | 2021-04-04 | Build site. |
Rmd | 462213b | Ross Gayler | 2021-03-27 | WIP |
html | a0b6a56 | Ross Gayler | 2021-03-21 | Build site. |
Rmd | d4a106d | Ross Gayler | 2021-03-20 | WIP |
html | ac6b6da | Ross Gayler | 2021-03-04 | Build site. |
Rmd | dbd6fbb | Ross Gayler | 2021-03-04 | Add some useful links to the workflow document |
html | 67e6fdf | Ross Gayler | 2021-03-03 | Build site. |
Rmd | 5b5369f | Ross Gayler | 2021-03-03 | Add workflow management notes |
Rmd | d2d559e | Ross Gayler | 2021-03-02 | end of day |
Rmd | 199d85e | Ross Gayler | 2021-03-02 | end of day |
Rmd | 9ba0dc4 | Ross Gayler | 2021-03-01 | end of day |
Rmd | 55ee0b1 | Ross Gayler | 2021-02-27 | end of day |
This project uses the `targets` and `workflowr` packages for managing the workflow of the project (making sure that the dependencies between computational steps are satisfied). When this work was started there were no easily found examples of using `targets` and `workflowr` together. This notebook contains notes on the proposed workflow for using `targets` and `workflowr`.
These points reflect my (possibly faulty) understanding of `targets` and `workflowr`. If I am wrong here I hope that somebody will see this and let me know, rather than me having to find out the hard way.
`targets` and `workflowr` both work by tracking some set of entities and the computational dependencies between them. When any of the tracked entities changes, the packages calculate the minimal set of downstream dependencies that need to be recomputed to bring all the entities into a consistent state of being up to date.
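To make the idea concrete, here is a minimal base-R sketch (my own illustration, not code from either package) of how the set of entities needing recomputation can be found: treat the tracked entities as a directed graph and collect everything reachable downstream of what changed. The function `downstream_of()` and all node names are hypothetical. As I understand it, `targets` goes a step further by comparing hashes, so a downstream target is skipped if an upstream target is rerun but produces an identical value; this sketch ignores that refinement.

```r
# Hypothetical helper: return every entity downstream of the changed ones.
# `edges` is a data frame with columns `from` and `to`; a row from -> to
# means that `to` depends on `from`. The changed entities themselves are
# not included in the result.
downstream_of <- function(edges, changed) {
  stale <- character(0)
  frontier <- changed
  while (length(frontier) > 0) {
    # Entities that directly depend on anything in the current frontier
    children <- unique(edges$to[edges$from %in% frontier])
    # Only follow entities not already marked as stale
    frontier <- setdiff(children, stale)
    stale <- union(stale, frontier)
  }
  stale
}

# A small hypothetical pipeline
edges <- data.frame(
  from = c("raw_file", "raw_data", "clean_data", "clean_data", "summary"),
  to   = c("raw_data", "clean_data", "summary", "model", "report"),
  stringsAsFactors = FALSE
)

# Changing clean_data invalidates "summary", "model" and "report"
downstream_of(edges, "clean_data")
```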
`targets`

- Supports a computational-pipeline-centric style of analysis.
- Tracks data objects (including files) and functions.
- The focus is on the data transformation by the computational pipeline (rather than human-generated text in reports).
- Knows about high-performance computing and can run computations in parallel.

`workflowr`

- Supports a notebook-centric style of analysis.
- Only tracks Rmd notebook files and the corresponding rendered output files (https://github.com/ropensci/tarchetypes/issues/23#issuecomment-749118599).
- `workflowr::wflow_build()` tracks modification dates of Rmd files and the corresponding rendered output files.
- `workflowr::wflow_publish()` tracks the Git status of Rmd files and the corresponding rendered output files.
- The computational consistency aspect is really only about the consistency between the notebook Rmd files and their rendered counterparts. That is, `workflowr` really only knows about notebook rendering and not about the computations in those notebooks.
- The computational reproducibility aspect is restricted to ensuring that random number seeds are set appropriately, that each notebook is executed in a clean environment, and that the package versions are recorded.
- Automatic building of a website for the rendered notebooks.
- Publication of the website integrated with Git.
- Automatic publication of the website served by GitHub Pages.
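The modification-date tracking attributed to `wflow_build()` above is the classic `make` rule. A minimal base-R sketch, assuming that rule (rebuild when the output is missing or older than its source); `is_outdated()` is a hypothetical helper, not part of `workflowr`, and the files are throwaway temporary files:

```r
# Hypothetical helper: make-style staleness check on modification times.
is_outdated <- function(rmd, html) {
  # A missing output always needs building
  if (!file.exists(html)) return(TRUE)
  # Otherwise rebuild only if the source is newer than the output
  file.mtime(rmd) > file.mtime(html)
}

# Demonstration with temporary files
dir <- tempfile("wflow_demo_")
dir.create(dir)
rmd  <- file.path(dir, "analysis.Rmd")
html <- file.path(dir, "analysis.html")

writeLines("# Notes", rmd)
is_outdated(rmd, html)   # TRUE: no HTML yet

writeLines("<h1>Notes</h1>", html)
# Pretend the HTML was rendered one minute after the Rmd was saved
Sys.setFileTime(rmd,  Sys.time() - 120)
Sys.setFileTime(html, Sys.time() - 60)
is_outdated(rmd, html)   # FALSE: HTML is newer than the Rmd

# Editing the Rmd makes it newer than the HTML again
Sys.setFileTime(rmd, Sys.time())
is_outdated(rmd, html)   # TRUE
```

Note that this rule only compares the two files; it knows nothing about data the notebook reads, which is the limitation described above.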
Comparison of `targets` and `workflowr`

- `targets` provides more general and fine-grained control of the computational pipeline.
- Use `targets` for the computational pipeline.
- If I use `targets` to manage computational dependency tracking, what extra capabilities does `workflowr` provide?
- Use `workflowr` to handle the building and publication of the project website.

Design states and design reasoning
The design state of the computational pipeline (loosely defined) reflects the current best beliefs. That is, any previous design states are believed to be flawed in some way and the current design state is believed to be better.
While the system is being modified from the prior design state to the current design state it is transiently broken and there is no need for that broken state to be preserved and easily accessible later.
Prior design states of the computational pipeline are not very interesting (because they are believed to be worse than the current design state) so there is no need for prior design states to be preserved and easily accessible later.
The reasoning behind the current design state is important and must be preserved and immediately accessible.
The reasoning behind the current design state may involve reference to prior design states and prior design reasoning. Where these references are needed they are included directly in the design reasoning for the current design state and are preserved and immediately accessible.
The proposed workflow needs to support my preferences for how to organise a project. In particular, a computational research project necessarily involves many design choices for the computational details. It is my strong preference that the reasoning behind these design choices (which may involve additional empirical work to support the reasoning) is documented as part of the project.
The total workflow of the project has multiple components:
- The core is implemented as a standard `targets` computational pipeline.
- The publications are implemented as extensions to the core pipeline.
  - The publications are plain Rmarkdown documents (not `workflowr` notebooks), rendered by `targets`.
- The meta components are implemented as short chains hanging off the core pipeline.
  - There may be computational steps to generate data objects needed specifically for meta documents.
  - These steps are implemented in `targets`, so the resulting data objects will be cached by `targets`. However, there is less pressure to minimise computation in the Rmd documents, because individual design decisions are less likely to need re-execution than the calculations in the core pipeline.
  - The meta publications are `workflowr` Rmd notebooks.
  - The meta-specific data objects and the rendering of the Rmd documents are managed by `targets`.
  - The meta documents are rendered to a `workflowr` website.
  - The `workflowr` website also has links to the rendered publications.
- `workflowr` distinguishes between building and publishing documents.
  - Building is invoked by `targets`, whereas publishing is only invoked manually.

An example is summarised in the following diagram.
The arrows represent data flows (dependencies). These dependencies allow `targets` to work out what is out of date and therefore requires re-execution when any of the tracked entities is modified.
The circles represent data objects (R objects and files).
The double circles represent Rmarkdown files. `targets` treats them like any other data object, but I have distinguished them in this diagram.
The triangles represent functions that generate or transform data.
The hexagons represent rendered Rmarkdown files.
The red nodes represent the core pipeline. Data is ingested and repeatedly transformed by functions.
The green nodes represent the publication workflow. There can be multiple publications derived from the core.
A publication may apply functions to core data to generate summaries for inclusion in the publication (e.g. plots, tables).
The text of the publication is in the plain Rmarkdown file.
The publication Rmarkdown file and the data it depends on are knitted to generate the rendered publication.
The gold nodes represent the meta publication workflow. The two dark gold nodes are a special case of the meta publication workflow.
A meta publication typically applies functions to some core data to generate summaries which inform the design reasoning for the next set of functions in the core pipeline. (Other patterns are possible, taking data from multiple core data objects, or even no data at all.)
The gold double circle nodes represent `workflowr` Rmarkdown files.
These contain the text of the reasoning behind the design decisions.
The `workflowr` Rmarkdown file and the data it depends on are knitted to generate the rendered meta publication recording the reasoning behind some design decisions.
There can be many meta publications. They are the documentation of the design of the project.
The two dark gold nodes represent the website part of the meta publication workflow.
The `workflowr` index Rmarkdown file normally contains links to all the rendered documents of interest (meta publications and external publications) and is rendered to become the home page of the project website.
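The diagram itself is not reproduced in this text. As a hypothetical stand-in, the sketch below encodes a small graph with the same kinds of nodes (core data and functions, a publication, and a meta notebook, omitting the website nodes; all names are invented) and orders it with Kahn's topological sort, so that every entity comes after everything it depends on. It only illustrates the kind of ordering a pipeline tool must respect; it is not how `targets` is implemented.

```r
# Kahn's algorithm: repeatedly emit a node whose dependencies have all
# been emitted. `edges` has columns `from` and `to` (to depends on from).
topo_order <- function(edges) {
  nodes <- unique(c(edges$from, edges$to))
  # Count incoming edges (unsatisfied dependencies) for each node
  indeg <- integer(length(nodes))
  names(indeg) <- nodes
  for (t in edges$to) indeg[t] <- indeg[t] + 1L
  ready <- nodes[indeg == 0L]
  out <- character(0)
  while (length(ready) > 0) {
    n <- ready[1]
    ready <- ready[-1]
    out <- c(out, n)
    # Emitting n satisfies one dependency of each node that uses it
    for (child in edges$to[edges$from == n]) {
      indeg[child] <- indeg[child] - 1L
      if (indeg[child] == 0L) ready <- c(ready, child)
    }
  }
  if (length(out) < length(nodes)) stop("dependency graph has a cycle")
  out
}

# Hypothetical example graph
pipeline <- data.frame(
  from = c("raw_file", "ingest_fn", "raw_data", "clean_fn",   # core
           "clean_data", "plot_fn", "plot", "paper_rmd",      # publication
           "clean_data", "meta_rmd"),                         # meta notebook
  to   = c("raw_data", "raw_data", "clean_data", "clean_data",
           "plot", "plot", "paper_html", "paper_html",
           "meta_html", "meta_html"),
  stringsAsFactors = FALSE
)

topo_order(pipeline)
```

A cycle means no valid execution order exists, which is why the function stops with an error in that case.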
- When `workflowr` builds or publishes a notebook, the rendered output file is updated, which invalidates the corresponding target in `targets`.
- Build `workflowr` notebooks in the corresponding target.
- Build the document by calling `tar_make()`, which calls `wflow_build()`.
- Publish the document by manually calling `wflow_publish()`.
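The first point above relies on `targets` noticing that a file target has changed. As I understand it, `targets` does this by storing a hash of each file target and comparing it with the file's current hash. A minimal base-R sketch of that idea, with `tools::md5sum()` standing in for whatever hash `targets` actually uses; `record_hash()` and `is_invalidated()` are hypothetical helpers:

```r
# Hypothetical helpers: hash-based invalidation of a file target.
record_hash <- function(path) unname(tools::md5sum(path))

is_invalidated <- function(path, recorded) {
  # A missing file, or one whose content hash differs, is out of date
  !file.exists(path) || !identical(record_hash(path), recorded)
}

html <- tempfile(fileext = ".html")
writeLines("<p>rendered inside the pipeline</p>", html)
recorded <- record_hash(html)   # what the pipeline remembers
is_invalidated(html, recorded)  # FALSE: file unchanged

# Rewriting the file outside the pipeline (e.g. re-rendering it by hand)
# changes its content, so the recorded hash no longer matches.
writeLines("<p>re-rendered outside the pipeline</p>", html)
is_invalidated(html, recorded)  # TRUE
```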
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.10
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
[5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
[7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] DiagrammeR_1.0.6.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.6 pillar_1.6.1 compiler_4.1.0 bslib_0.2.5
[5] later_1.2.0 RColorBrewer_1.1-2 jquerylib_0.1.4 git2r_0.28.0
[9] workflowr_1.6.2 tools_4.1.0 digest_0.6.27 jsonlite_1.7.2
[13] evaluate_0.14 lifecycle_1.0.0 tibble_3.1.2 pkgconfig_2.0.3
[17] rlang_0.4.11 rstudioapi_0.13 yaml_2.2.1 xfun_0.23
[21] stringr_1.4.0 knitr_1.33 fs_1.5.0 vctrs_0.3.8
[25] sass_0.4.0 htmlwidgets_1.5.3 rprojroot_2.0.2 glue_1.4.2
[29] R6_2.5.0 fansi_0.4.2 rmarkdown_2.8 bookdown_0.22
[33] magrittr_2.0.1 whisker_0.4 promises_1.2.0.1 ellipsis_0.3.2
[37] htmltools_0.5.1.1 renv_0.13.2 httpuv_1.6.1 utf8_1.2.1
[41] stringi_1.6.2 visNetwork_2.0.9 crayon_1.4.1