library(sf)
library(dplyr)
11 Data sets
11.1 Greater Manchester land use data
Availability
The dataset is stored in a gpkg file that can be found, within the structure of this project, under:
st_LSOA <- st_read("./data/geodemographics/manchester_land_cover_2011.gpkg")
Reading layer `manchester_land_cover_2011' from data source
`/Users/carmen/Documents/GitHub/r4ps/data/geodemographics/manchester_land_cover_2011.gpkg'
using driver `GPKG'
Simple feature collection with 1673 features and 44 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 351662.3 ymin: 381166 xmax: 406087.2 ymax: 421037.7
Projected CRS: OSGB36 / British National Grid
Variables
The variables included in this dataset follow the land use classification of the CORINE Land Cover dataset.
Source & Pre-processing
The data were sourced from "What do ‘left behind’ areas look like over time?" and cleaned in Python.
11.2 British administrative boundaries (LSOAs, MSOAs and LAs)
Availability
The dataset for the boundaries of the lower-layer super-output areas (LSOAs) within London is stored as a shapefile that can be found under:
st_LSOA <- st_read("data/geodemographics-old/LSOA_2011_London_gen_MHW/LSOA_2011_London_gen_MHW.shp")
Reading layer `LSOA_2011_London_gen_MHW' from data source
`/Users/carmen/Documents/GitHub/r4ps/data/geodemographics-old/LSOA_2011_London_gen_MHW/LSOA_2011_London_gen_MHW.shp'
using driver `ESRI Shapefile'
Simple feature collection with 4835 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 503574.2 ymin: 155850.8 xmax: 561956.7 ymax: 200933.6
Projected CRS: OSGB36 / British National Grid
Data for the shapes of the MSOAs must be downloaded from the UK’s GeoPortal. Make sure you download the 2021 version and store it in the ./data/machine-learning/ folder as a file with the .gpkg extension. We have not included the file in the GitHub repo due to its large size. You can load it with st_read and, if necessary, reproject it to the coordinate reference system of your choice.
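As a minimal sketch of that loading step, the helper below reads a boundary file and reprojects it in one go. The helper name load_boundaries and the file name MSOA_2021_boundaries.gpkg are assumptions made for illustration (the text only fixes the folder and the extension), and British National Grid (EPSG:27700) is used as the target only because the other layers in this chapter are stored in it.

```r
library(sf)

# Hypothetical helper: read a boundary file and reproject it to a target CRS
# (EPSG:27700, British National Grid, by default).
load_boundaries <- function(path, crs = 27700) {
  shapes <- st_read(path, quiet = TRUE)
  st_transform(shapes, crs)
}

# Hypothetical file name: use whatever name you saved the download under.
msoa_path <- "./data/machine-learning/MSOA_2021_boundaries.gpkg"

# Only attempt the read if the file has actually been downloaded.
if (file.exists(msoa_path)) {
  st_MSOA <- load_boundaries(msoa_path)
  print(st_crs(st_MSOA)$epsg)
}
```

Passing a different EPSG code as the crs argument gives the same layer in another projection.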
The dataset for the boundaries of the local authority districts (LADs) for the UK is stored as a shapefile that can be found under:
LA_UK <- st_read("./data/networks/Local_Authority_Districts_(December_2022)_Boundaries_UK_BFC/LAD_DEC_2022_UK_BFC.shp")
Reading layer `LAD_DEC_2022_UK_BFC' from data source
`/Users/carmen/Documents/GitHub/r4ps/data/networks/Local_Authority_Districts_(December_2022)_Boundaries_UK_BFC/LAD_DEC_2022_UK_BFC.shp'
using driver `ESRI Shapefile'
Simple feature collection with 374 features and 10 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -116.1928 ymin: 5336.966 xmax: 655653.8 ymax: 1220302
Projected CRS: OSGB36 / British National Grid
Variables
For each of the 4,835 LSOAs, the following characteristics are available:
names(st_LSOA)
[1] "LSOA11CD" "LSOA11NM" "MSOA11CD" "MSOA11NM" "LAD11CD" "LAD11NM"
[7] "RGN11CD" "RGN11NM" "USUALRES" "HHOLDRES" "COMESTRES" "POPDEN"
[13] "HHOLDS" "AVHHOLDSZ" "geometry"
where:
LSOA11CD: Lower-Layer Super-Output Area code
LSOA11NM: Lower-Layer Super-Output Area name
MSOA11CD: Middle-Layer Super-Output Area code
MSOA11NM: Middle-Layer Super-Output Area name
LAD11CD: Local Authority District code
LAD11NM: Local Authority District name
RGN11CD: Region code
RGN11NM: Region name
USUALRES: Usual residents
HHOLDRES: Household residents
COMESTRES: Communal Establishment residents
POPDEN: Population density
HHOLDS: Number of households
AVHHOLDSZ: Average household size
geometry: Polygon of LSOA
For each of the 374 LADs, the following characteristics are available:
names(LA_UK)
[1] "OBJECTID" "LAD22CD" "LAD22NM" "BNG_E" "BNG_N"
[6] "LONG" "LAT" "GlobalID" "SHAPE_Leng" "SHAPE_Area"
[11] "geometry"
where:
OBJECTID: Object identifier
LAD22CD: Local Authority District code
LAD22NM: Local Authority District name
BNG_E: Location Easting
BNG_N: Location Northing
LONG: Location Longitude
LAT: Location Latitude
GlobalID: Global Identifier
SHAPE_Leng: Boundary length
SHAPE_Area: Area within boundary
geometry: Polygon of LAD
Projection
The shapes of each LSOA are stored as polygons and expressed in the OSGB36 / British National Grid projection:
st_crs(st_LSOA)
Coordinate Reference System:
User input: OSGB36 / British National Grid
wkt:
PROJCRS["OSGB36 / British National Grid",
BASEGEOGCRS["OSGB36",
DATUM["Ordnance Survey of Great Britain 1936",
ELLIPSOID["Airy 1830",6377563.396,299.3249646,
LENGTHUNIT["metre",1]],
ID["EPSG",6277]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",49,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",-2,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.999601272,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",400000,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",-100000,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
Similarly, the shapes of each LAD are stored as polygons and expressed in the OSGB36 / British National Grid projection:
st_crs(LA_UK)
Coordinate Reference System:
User input: OSGB36 / British National Grid
wkt:
PROJCRS["OSGB36 / British National Grid",
BASEGEOGCRS["OSGB36",
DATUM["Ordnance Survey of Great Britain 1936",
ELLIPSOID["Airy 1830",6377563.396,299.3249646,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4277]],
CONVERSION["British National Grid",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",49,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",-2,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.9996012717,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",400000,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",-100000,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Engineering survey, topographic mapping."],
AREA["United Kingdom (UK) - offshore to boundary of UKCS within 49°45'N to 61°N and 9°W to 2°E; onshore Great Britain (England, Wales and Scotland). Isle of Man onshore."],
BBOX[49.75,-9,61.01,2.01]],
ID["EPSG",27700]]
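When the full WKT is more detail than needed, a shorter check is often enough. The sketch below uses two toy points rather than the project data (the coordinates are illustrative assumptions) to show how one might query the EPSG code, test whether a layer is in longitude/latitude, and bring a layer into the CRS of another before combining them.

```r
library(sf)

# Toy geometries: one point in British National Grid, one in WGS84 lon/lat.
p_bng <- st_sfc(st_point(c(530000, 180000)), crs = 27700)
p_wgs <- st_sfc(st_point(c(-0.1276, 51.5072)), crs = 4326)

# Compact summaries of a CRS instead of the full WKT printout.
st_crs(p_bng)$epsg
st_is_longlat(p_wgs)

# Reproject the lon/lat point into the CRS of the other layer,
# then confirm that both now share the same CRS.
p_wgs_bng <- st_transform(p_wgs, st_crs(p_bng))
st_crs(p_wgs_bng) == st_crs(p_bng)
```

The same calls apply to full sf data frames such as st_LSOA and LA_UK.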
Source & Pre-processing
The boundaries for the LSOAs within London can be downloaded directly from the London Datastore website.
The boundaries for the LADs for the UK can be found on the ONS Open Geography Portal website. To filter for the London LADs, i.e. the London boroughs, we run the following line of code:
LND_boroughs <- LA_UK %>% filter(grepl('E09', LAD22CD))
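The pattern 'E09' matches anywhere in the code, which works here because London borough codes begin with E09. A slightly stricter variant anchors the match to the start of the string. The sketch below illustrates this on a small toy table standing in for LA_UK; the table name toy_lads and its four rows are assumptions for illustration only.

```r
library(dplyr)

# Toy stand-in for LA_UK: only the code prefixes matter for this example.
toy_lads <- data.frame(
  LAD22CD = c("E09000001", "E09000033", "E08000003", "S12000036"),
  LAD22NM = c("City of London", "Westminster", "Manchester", "City of Edinburgh")
)

# "^E09" requires the code to start with E09 rather than merely contain it.
toy_boroughs <- toy_lads %>% filter(grepl("^E09", LAD22CD))
toy_boroughs$LAD22NM
```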
11.3 Twitter migration data for the UK
11.3.1 Availability
The dataset is stored in a csv file that can be found, within the structure of this project, under:
st_LSOA <- st_read("./data/networks/internal_migration_uk.csv")
Reading layer `internal_migration_uk' from data source
`/Users/carmen/Documents/GitHub/r4ps/data/networks/internal_migration_uk.csv'
using driver `CSV'
Warning: no simple feature geometries present: returning a data.frame or tbl_df
11.3.2 Source and preprocessing
The data were created for the paper by Wang et al. (2022), which describes the methodology in detail.
11.4 Worldpop population count data for Ukraine
11.5 Census population count data for UK
11.6 Ukraine’s administrative boundaries
11.7 Twitter data on public opinion originated in the US and in the UK
11.8 Reddit data
11.9 Google mobility data for Italy and the UK
11.10 COVID-19 cases data for London and Rome
11.11 Census MSOA data for England and Wales
Availability
The dataset for the demographic census data of each MSOA in England and Wales can be loaded as a csv file from:
df_MSOA <- read.csv("./data/machine-learning/census2021-msoa.csv")
A dataset for the data on the median rent price for each MSOA can be loaded as a csv as below. This data is from Zoopla and is made available here for non-commercial use, through the Urban Big Data Centre:
df_rent <- read.csv("./data/machine-learning/zoopla_mean_rent_msoa.csv")
Variables
For each of the 7,080 MSOAs recorded in England and Wales, the following fields are available:
names(df_MSOA)
[1] "X" "date" "geography" "geography.code"
[5] "inHH" "inCE" "SING" "MARRIED"
[9] "SEP" "DIV" "WIDOW" "UK"
[13] "EU" "AFR" "AS" "AM"
[17] "OC" "BO" "DENSITY" "Y14orUNDER"
[21] "Y15to19" "Y20to24" "Y25to29" "Y30to34"
[25] "Y35to49" "Y40to44" "Y45to49" "Y50to54"
[29] "Y55to59" "Y60to64" "Y65orOVER" "F"
[33] "M" "HH1" "HH2" "HH3"
[37] "HH4" "HH5" "HH6" "ADD1YagoSAME"
[41] "ADD1YagoSTUDENT" "ADD1YagoUK" "ADD1YagoNONUK" "NHH"
[45] "OWN" "MORTGAGE" "SHAREDOWN" "RENTfromCOUNCIL"
[49] "RENTotherSOCIAL" "RENTprivate" "RENTprivateOTHER" "RENTfree"
For a description of the variables in the columns of df_MSOA, we can load a dictionary for these variables:
df_dictionary <- read.csv("./data/machine-learning/Dictionary.csv")
head(df_dictionary)
                                      Dictionary       X
1
2                                           Name     Key
3                 Lives in household (% persons)    inHH
4    Lives in communal establishment (% persons)    inCE
5 Never married or civil partnership (% persons)    SING
6    Married or in civil partnership (% persons) MARRIED
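Because the dictionary file carries a blank row and its real header ("Name", "Key") as data rows, it can help to tidy it before use. The sketch below rebuilds the six rows printed above as a toy data frame (toy_dictionary is a stand-in, and treating the blank row as empty strings is an assumption) and turns it into a lookup from column key to description.

```r
# Toy copy of the rows shown by head(df_dictionary); the real file has more.
toy_dictionary <- data.frame(
  Dictionary = c("", "Name",
                 "Lives in household (% persons)",
                 "Lives in communal establishment (% persons)",
                 "Never married or civil partnership (% persons)",
                 "Married or in civil partnership (% persons)"),
  X = c("", "Key", "inHH", "inCE", "SING", "MARRIED")
)

# Drop the blank row and the embedded header row, then name the columns.
dict <- toy_dictionary[-(1:2), ]
names(dict) <- c("Name", "Key")

# Named character vector: look up a description by its column key.
lookup <- setNames(dict$Name, dict$Key)
lookup[["SING"]]
```

With the full dictionary loaded, the same lookup could be used to label columns of df_MSOA in tables and plots.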
Source & pre-processing
Data on the census characteristics for different MSOAs can be downloaded from the Nomis website. Data on the average net household income can be obtained from the ONS website.
Data on the median house price for different MSOAs can be downloaded from the ONS website.
All the data have been pre-processed in Microsoft Excel.