Last updated: 2021-05-27

Checks: 2 passed, 0 failed

Knit directory: fa_sim_cal/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version b85e679. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    .tresorit/
    Ignored:    _targets/
    Ignored:    data/VR_20051125.txt.xz
    Ignored:    data/VR_Snapshot_20081104.txt.xz
    Ignored:    renv/library/
    Ignored:    renv/local/
    Ignored:    renv/staging/

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/index.Rmd) and HTML (docs/index.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd b85e679 Ross Gayler 2021-05-27 wflow_publish(c(“analysis/proposal.Rmd”, “analysis/notes.Rmd”,
html ab90fe6 Ross Gayler 2021-05-18 WIP
Rmd 20a5de5 Ross Gayler 2021-05-16 WIP
html 41ccc1d Ross Gayler 2021-04-04 Build site.
Rmd 6421aeb Ross Gayler 2021-04-04 wip
Rmd 411de1e Ross Gayler 2021-04-04 WIP
html 411de1e Ross Gayler 2021-04-04 WIP
Rmd 0bd4a5f Ross Gayler 2021-04-03 WIP
html 0bd4a5f Ross Gayler 2021-04-03 WIP
Rmd 9b4272d Ross Gayler 2021-04-02 WIP
html 9b4272d Ross Gayler 2021-04-02 WIP
Rmd ec5d588 Ross Gayler 2021-03-30 WIP
Rmd ebd787e Ross Gayler 2021-03-28 WIP
html ebd787e Ross Gayler 2021-03-28 WIP
Rmd ef1be3d Ross Gayler 2021-03-27 WIP
html ef1be3d Ross Gayler 2021-03-27 WIP
html a0b6a56 Ross Gayler 2021-03-21 Build site.
html 6b6e336 Ross Gayler 2021-03-21 Build site.
html 630b399 Ross Gayler 2021-03-21 Build site.
Rmd 2541c14 Ross Gayler 2021-03-21 wflow_publish(c(“analysis/m_00_status.Rmd”, “analysis/index.Rmd”))
html 16d8789 Ross Gayler 2021-03-21 Build site.
Rmd e59c7a3 Ross Gayler 2021-03-21 wflow_publish(“analysis/index.Rmd”)
html b9b0243 Ross Gayler 2021-03-21 Build site.
Rmd 3fc56d8 Ross Gayler 2021-03-21 wflow_publish(“analysis/index.Rmd”)
Rmd dcdc6a2 Ross Gayler 2021-03-05 WIP
html 0a8fa7e Ross Gayler 2021-03-03 Build site.
Rmd 4260c26 Ross Gayler 2021-03-03 Add section header for overview documents in index
html 67e6fdf Ross Gayler 2021-03-03 Build site.
Rmd 5b5369f Ross Gayler 2021-03-03 Add workflow management notes
Rmd 55ee0b1 Ross Gayler 2021-02-27 end of day
html 0d30c5b Ross Gayler 2021-01-26 Build site.
html 23d740f Ross Gayler 2021-01-24 Build site.
Rmd 6df8db7 Ross Gayler 2021-01-24 End of day
html 0c48ca5 Ross Gayler 2021-01-17 Build site.
Rmd 3052780 Ross Gayler 2021-01-17 Add 02-1 block vars
html 0405e0b Ross Gayler 2021-01-15 Build site.
Rmd 00a9ff4 Ross Gayler 2021-01-15 wflow_publish(c(“analysis/index.Rmd”))
html 5ab5dc4 Ross Gayler 2021-01-15 Build site.
Rmd c674a51 Ross Gayler 2021-01-15 Add 01-6 clean vars
html c674a51 Ross Gayler 2021-01-15 Add 01-6 clean vars
Rmd 874917f Ross Gayler 2021-01-13 Revise conclusion re inserted 5 in name
html c0c5313 Ross Gayler 2021-01-13 Build site.
html 22f0b81 Ross Gayler 2021-01-13 Build site.
Rmd d3deb84 Ross Gayler 2021-01-13 Add 01-5 check name
html 44538d8 Ross Gayler 2021-01-12 Build site.
html 8cd5fa1 Ross Gayler 2021-01-12 Build site.
html abb201f Ross Gayler 2021-01-12 Build site.
Rmd 2ae8660 Ross Gayler 2021-01-12 Add 01-4 check demog
html cb9bf70 Ross Gayler 2021-01-12 Build site.
Rmd 84d53a0 Ross Gayler 2021-01-12 Add 01-3 check resid
html 6edddf5 Ross Gayler 2021-01-12 Build site.
Rmd 6469262 Ross Gayler 2021-01-12 Add 01-2 check admin
html 6469262 Ross Gayler 2021-01-12 Add 01-2 check admin
html 4a8c170 Ross Gayler 2021-01-10 Build site.
Rmd 9c13ca8 Ross Gayler 2021-01-10 wflow_publish(“analysis/index.Rmd”)
html 03ad324 Ross Gayler 2021-01-05 Build site.
Rmd b03a2c1 Ross Gayler 2021-01-05 wflow_publish(c(“analysis/index.Rmd”, “analysis/notes.Rmd”))
html 80b360b Ross Gayler 2021-01-04 Build site.
html 856a513 Ross Gayler 2021-01-04 Build site.
html 838463a Ross Gayler 2020-12-23 Build site.
html a618d9e Ross Gayler 2020-12-23 Build site.
Rmd c6390cc Ross Gayler 2020-12-23 wflow_publish("analysis/*.Rmd")
html 36ccc82 Ross Gayler 2020-12-13 Build site.
Rmd f0a165a Ross Gayler 2020-12-13 End of day
html d5eb60b Ross Gayler 2020-12-10 Build site.
html 01b669c Ross Gayler 2020-12-10 Build site.
html 1993afa Ross Gayler 2020-12-10 Build site.
Rmd 9ea6d8a Ross Gayler 2020-12-10 Fix figure captions
html bc8c1cc Ross Gayler 2020-12-06 Build site.
Rmd c99ceff Ross Gayler 2020-12-06 First draft of proposal
html 2f9886a Ross Gayler 2020-12-05 Build site.
html 5f37c79 Ross Gayler 2020-11-30 Build site.
Rmd c2e37f3 Ross Gayler 2020-11-30 Initial ndex.Rmd
html c2e37f3 Ross Gayler 2020-11-30 Initial ndex.Rmd
Rmd 2a722d0 Ross Gayler 2020-11-29 end of day
html 2a722d0 Ross Gayler 2020-11-29 end of day
html 03b0a02 Ross Gayler 2020-11-04 Build site.
Rmd e163b3b Ross Gayler 2020-11-04 Start workflowr project.

This is the website for the research project “Frequency-Aware Similarity Calibration”.

If you have cloned the project to a local computer, this website is rendered in the docs subdirectory of the project directory.

If you are using workflowr to publish the research website, it is also rendered online on GitHub Pages.

This page acts as a table of contents for the website. There are links to the web pages generated from the analysis notebooks and to the rendered versions of manuscripts/documents/presentations.


Project Workflow Status

This notebook displays the computational status of the project, that is, whether everything is up to date. Ironically, this is currently the only notebook that must be run manually, so it only shows the status as of its last execution and gives no indication of whether the project status has changed since then.


Overview documents

Proposal

This notebook explains the central ideas behind the project.

Notes

This notebook is for keeping notes on any points that may be useful for later project or manuscript development and that are either not covered in the analysis notebooks or at risk of getting lost in them.

Workflow management

This project uses the targets and workflowr packages for managing the workflow of the project (making sure that the dependencies between computational steps are satisfied). When this work was started there were no easily found examples of using targets and workflowr together. This notebook contains notes on the proposed workflow for using targets and workflowr.


Publications

Links to rendered manuscripts and presentations will go here.


META Notebooks

These notebooks capture the analyses that were carried out to develop the code of the core processing pipeline. They are organised as side-chains to the core processing pipeline.

Typically, a meta notebook will analyse the data available at one stage of the core pipeline, to guide the writing of the functions required to get to the next stage of the core pipeline. These meta notebooks generally conclude with the definition of a function that will be used in the core pipeline.

There may be multiple notebooks all relating to different aspects of the same stage of the core pipeline.

Sometimes the analyses are more diffuse - characterising the data in a way that may be helpful for guiding the development of future core stages, but not immediately resulting in the development of functions for the core pipeline.

01 Read, check, and standardise the entity data

Determine the initial data preparation of the imported entity records.

m_01_1_read_raw_entity_data

Read the raw entity data.

m_01_2_exclusions

Apply the row exclusions (exclude test records and all records with status other than ACTIVE & VERIFIED).
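As a rough illustration of this kind of row exclusion, here is a minimal base R sketch. The column names (`status`, `status_reason`, `is_test`) and the toy records are hypothetical stand-ins, not the names or rules used in the notebook; in particular, how test records are actually identified is not stated here.

```r
# Toy records; all column names and values are hypothetical stand-ins.
records <- data.frame(
  id            = 1:5,
  status        = c("ACTIVE", "ACTIVE", "INACTIVE", "REMOVED", "ACTIVE"),
  status_reason = c("VERIFIED", "UNVERIFIED", "VERIFIED", "VERIFIED", "VERIFIED"),
  is_test       = c(FALSE, FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Keep only non-test records that are both ACTIVE and VERIFIED.
# which() drops rows where the condition evaluates to NA.
exclude_rows <- function(d) {
  keep <- !d$is_test & d$status == "ACTIVE" & d$status_reason == "VERIFIED"
  d[which(keep), , drop = FALSE]
}

kept <- exclude_rows(records)
print(kept$id)
```

Wrapping the condition in `which()` is a deliberate choice in this sketch: a record with a missing status is excluded rather than turned into a row of `NA` values.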

m_01_3_drop_novar

Drop variables with no variation.
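One possible way to drop variables with no variation is sketched below in base R. The function name `drop_novar` is borrowed from the notebook title for illustration only, and the toy data frame is made up; the notebook's own definition of "no variation" (for example, how missing values are treated) may differ.

```r
# Drop columns that take only a single distinct value.
# Assumption in this sketch: NA counts as a distinct value.
drop_novar <- function(d) {
  varies <- vapply(d, function(x) length(unique(x)) > 1L, logical(1))
  d[, varies, drop = FALSE]
}

# Toy data with one constant column (hypothetical names and values).
toy <- data.frame(
  county = c("A", "B", "A"),
  state  = c("XX", "XX", "XX"),  # constant, so it is dropped
  age    = c(34L, 51L, 27L),
  stringsAsFactors = FALSE
)

remaining <- names(drop_novar(toy))
print(remaining)
```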

m_01_4_parse_dates

Parse the date columns.
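A minimal sketch of date parsing in base R follows. The column name `registration_date` and the `%Y-%m-%d` format are assumptions made for illustration; the format of the actual raw data is not stated on this page.

```r
# Convert the named character columns to Date.
# Values that do not match the format become NA rather than raising an error.
parse_date_cols <- function(d, cols, format = "%Y-%m-%d") {
  for (col in cols) {
    d[[col]] <- as.Date(d[[col]], format = format)
  }
  d
}

# Toy data (hypothetical column name and values).
toy <- data.frame(
  id = 1:3,
  registration_date = c("2001-02-03", "2004-05-06", "not a date"),
  stringsAsFactors = FALSE
)

parsed <- parse_date_cols(toy, "registration_date")
print(parsed)
```

Counting the `NA` values after parsing is a simple way to check how many raw values failed to match the assumed format.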

m_01_5_check_admin

Check the administrative variables (data relating to the administration of voter registration).

m_01_6_check_resid

Check the residence variables - residential address and phone number.

m_01_7_check_demog

Check the demographic variables - sex, age, and birth place.

m_01_8_check_name

Check the name variables.

m_01_9_clean_vars

Clean all the variables.


02 Blocking variables

Examine the distributions of potential blocking variables.

02-1 Characterise blocking variables

Characterise the potential blocking variables and combinations of variables.

02-2 Make blocking variables

Construct the most promising candidate blocking variables formed from combinations of variables.


03 Name frequency (equality)

Detailed examination of the distributions of name frequencies induced by the string equality relation.
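To make the idea concrete, the following base R sketch computes, for each record, the number of records that carry exactly the same name. The names are toy values, and this is only one plausible reading of "frequency induced by equality", not necessarily the definition used in the notebook.

```r
# Toy names (made up for illustration).
last_names <- c("SMITH", "SMITH", "JONES", "SMITH", "NGUYEN")

# Count of records per distinct name.
name_counts <- table(last_names)

# Frequency attached to each record: how many records share exactly its name.
record_freq <- as.integer(name_counts[last_names])
print(record_freq)

# Distribution of those per-record frequencies.
print(table(record_freq))
```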


04 Name frequency (similarity)

Detailed examination of the distributions of name frequencies induced by a string similarity relation.
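One way to picture a similarity-induced frequency is to count, for each record, the records whose name is at least as similar as some threshold. The sketch below is an assumption-laden illustration: it uses a length-normalised Levenshtein similarity via `utils::adist` and an arbitrary threshold of 0.75, neither of which is confirmed as the project's choice, and the names are toy values.

```r
# Toy names (made up for illustration).
names_vec <- c("SMITH", "SMYTH", "SMITH", "JONES", "JONAS")

# Pairwise Levenshtein distances between all names.
d <- utils::adist(names_vec)

# Normalise by the longer of the two names to get a similarity in [0, 1].
max_len <- outer(nchar(names_vec), nchar(names_vec), pmax)
sim <- 1 - d / max_len

# Assumed threshold: names with similarity >= 0.75 count as "similar".
threshold <- 0.75

# Similarity-induced frequency: number of records (including the record
# itself) whose name is similar to this record's name.
sim_freq <- rowSums(sim >= threshold)
print(sim_freq)
```

With equality as the relation, SMYTH would have a frequency of 1; under this similarity relation it is grouped with the two SMITH records, which is the kind of difference this stage examines.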


05 Similarity calibration

Detailed examination of the calibration from similarity to probability of identity match, both unconditionally and as a function of name frequency.


06 Compatibility models

Estimate multivariate compatibility models.