Example use of the datazoom.amazonia package

This guide shows one way to use the datazoom.amazonia package. As an example, we’ll use data from PPM (obtained from IBGE) and from MapBiomas (Observatório do Clima) to analyze the number of head of cattle per hectare of pasture. To do this, we can use the functions load_ppm() and load_mapbiomas(), both included in the package, which import the data directly from the source into R.

To start, install the datazoom.amazonia package (from GitHub) if you have not done so before, and load it. We’ll also use the tidyverse package for data manipulation.

# install.packages("devtools")
# devtools::install_github("datazoompuc/datazoom.amazonia")
library(datazoom.amazonia)

# install.packages("tidyverse")
library(tidyverse)

To list the package’s functions, run help(package = "datazoom.amazonia"). Generally speaking, their names follow the pattern load_*, where * is the database’s name.

LOADING DATA

We’ll start by loading the following dataset from MapBiomas with the function load_mapbiomas(), and then keep only the rows for 2019.

data_frame_mapbiomas <- datazoom.amazonia::load_mapbiomas(dataset = "mapbiomas_cover",
                                                         cover_level = "4",
                                                         geo_level = "municipality",
                                                         raw_data = FALSE)

data_frame_mapbiomas <- data_frame_mapbiomas %>%
  filter(year == 2019)

In general, every function in this package follows this pattern.

In addition, we can load the ppm_livestock_inventory dataset from PPM.

data_frame_ppm <- load_ppm(dataset = "ppm_livestock_inventory",
                           time_period = 2019,
                           geo_level = "municipality",
                           language = "pt",
                           raw_data = FALSE)

## 1 in 27 states...
## 2 in 27 states...
## 3 in 27 states...
## 4 in 27 states...
## 5 in 27 states...
## 6 in 27 states...
## 7 in 27 states...
## 8 in 27 states...
## 9 in 27 states...
## 10 in 27 states...
## 11 in 27 states...
## 12 in 27 states...
## 13 in 27 states...
## 14 in 27 states...
## 15 in 27 states...
## 16 in 27 states...
## 17 in 27 states...
## 18 in 27 states...
## 19 in 27 states...
## 20 in 27 states...
## 21 in 27 states...
## 22 in 27 states...
## 23 in 27 states...
## 24 in 27 states...
## 25 in 27 states...
## 26 in 27 states...
## 27 in 27 states...
## Download Succesfully Completed!


Below are the first five columns of the resulting MapBiomas table, followed by those of the PPM table:

year	state	municipality	municipality_code	forest_formation
2019	RO	Alta Floresta D’Oeste	1100015	362622.39
2019	RO	Ariquemes	1100023	135556.17
2019	RO	Cabixi	1100031	35705.03
2019	RO	Cacoal	1100049	136779.98
2019	RO	Cerejeiras	1100056	76447.99
2019	RO	Colorado do Oeste	1100064	28187.34

geo_id	ano	num_v2670	num_v2675	num_v2672
1100015	2019	428976	172	5456
1100023	2019	477665	539	7140
1100031	2019	121798	15	1690
1100049	2019	432640	111	5907
1100056	2019	89884	7	1009
1100064	2019	255696	123	4023

DATA WRANGLING

Next, we’ll merge the two datasets and then do some simple data manipulation to generate the target variable: (head of cattle) / (hectares of pasture).

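# Make sure the join keys have the same (numeric) type in both tables
data_frame_mapbiomas$municipality_code <- as.numeric(data_frame_mapbiomas$municipality_code)
data_frame_mapbiomas$year <- as.numeric(data_frame_mapbiomas$year)

data_frame_ppm$geo_id <- as.numeric(data_frame_ppm$geo_id)
data_frame_ppm$ano <- as.numeric(data_frame_ppm$ano)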

merge <- data_frame_mapbiomas %>%
  full_join(data_frame_ppm, 
            by = c('municipality_code' = 'geo_id', 'year' = 'ano'))
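The join only matches rows when the key columns have the same type and values on both sides. The following is a minimal sketch with invented values, assuming for illustration that the municipality code arrives as character in one table. It shows what full_join() keeps and how anti_join() can list the rows that found no partner. The toy_* names are hypothetical and not part of the package.

```r
library(dplyr)

# Toy stand-ins for the two tables; every value here is invented.
toy_mapbiomas <- data.frame(
  municipality_code = c("1100015", "1100023", "1100031"),
  year = c(2019, 2019, 2019),
  pasture = c(1000, 2000, 500)
)
toy_ppm <- data.frame(
  geo_id = c(1100015, 1100023, 1100049),
  ano = c(2019, 2019, 2019),
  num_v2670 = c(3000, 1000, 800)
)

# Convert the character key so both sides of the join are numeric.
toy_mapbiomas <- toy_mapbiomas %>%
  mutate(municipality_code = as.numeric(municipality_code))

# full_join() keeps rows from both tables; unmatched rows get NA
# in the columns that come from the other table.
toy_merge <- toy_mapbiomas %>%
  full_join(toy_ppm, by = c("municipality_code" = "geo_id", "year" = "ano"))

# anti_join() returns the rows of the first table with no match in the second.
unmatched <- toy_mapbiomas %>%
  anti_join(toy_ppm, by = c("municipality_code" = "geo_id", "year" = "ano"))
```

Inspecting a table like unmatched before computing ratios is one way to see how many municipalities would end up with missing values.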

Next, the command that creates the variable of interest:

data <- merge %>%
  mutate(n_bovino_por_area_pastagem = num_v2670/pasture)
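One thing to keep in mind with a ratio like this: in R, dividing by zero returns Inf rather than NA, and mean(..., na.rm = TRUE) does not drop Inf. Whether any municipality in the real data has zero pasture is not checked here. The sketch below uses invented values and hypothetical column names (ratio_raw, ratio_safe) to show one possible guard.

```r
library(dplyr)

# Invented values: one municipality with zero pasture, one with missing pasture.
toy <- data.frame(
  municipality_code = c(1, 2, 3),
  num_v2670 = c(300, 50, 120),
  pasture = c(100, 0, NA)
)

toy_ratio <- toy %>%
  mutate(
    # Plain division: 50 / 0 evaluates to Inf, 120 / NA to NA.
    ratio_raw = num_v2670 / pasture,
    # Guarded version: zero or missing pasture yields NA instead of Inf.
    ratio_safe = if_else(!is.na(pasture) & pasture > 0,
                         num_v2670 / pasture,
                         NA_real_)
  )

# na.rm = TRUE removes NA but keeps Inf, so only the guarded mean is finite.
mean_raw <- mean(toy_ratio$ratio_raw, na.rm = TRUE)
mean_safe <- mean(toy_ratio$ratio_safe, na.rm = TRUE)
```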

After that, we select the variables we want in the final dataset.

data <- data %>% 
  select(municipality_code, municipality, state, year, n_bovino_por_area_pastagem) %>%
  arrange(municipality_code, year) %>%
  relocate(municipality_code, year)

APPLICATIONS

We’ll now select the states of the Midwest Region (Centro-Oeste, in Portuguese), including the Federal District (DF), for 2019. Then we aggregate by state to calculate the average head of cattle per hectare of pasture for each one.

data <- data %>%
  filter(state %in% c("GO", "MT", "MS", "DF"))

data <- data %>%
  group_by(state) %>%
  summarise(average = mean(n_bovino_por_area_pastagem, na.rm = TRUE))
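Note that this average gives every municipality the same weight, regardless of its size. An alternative that this guide does not use is the state-level stocking rate, that is, total cattle divided by total pasture. The sketch below compares the two on invented numbers; the toy data frame and the column names mean_of_ratios and ratio_of_totals are hypothetical.

```r
library(dplyr)

# Invented municipal values for two states.
toy <- data.frame(
  state = c("GO", "GO", "MT", "MT"),
  num_v2670 = c(100, 900, 400, 600),
  pasture = c(100, 300, 200, 200)
)

toy_summary <- toy %>%
  group_by(state) %>%
  summarise(
    # Average of the municipal ratios (the approach used in this guide).
    mean_of_ratios = mean(num_v2670 / pasture, na.rm = TRUE),
    # Total cattle over total pasture (weights municipalities by pasture area).
    ratio_of_totals = sum(num_v2670, na.rm = TRUE) / sum(pasture, na.rm = TRUE)
  )
```

The two measures coincide when municipalities in a state have equal pasture areas and diverge otherwise, so which one to report depends on the question being asked.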

We can use ggplot2, which is part of the tidyverse, to visualize the data in a bar plot.

ggplot(data, aes(x = state, y = average)) +
  geom_col(fill = "#4fdf94", colour = "#00596d") +
  xlab("State") +
  ylab("Average head of cattle per hectare of pasture")

On the Data Zoom Amazônia website, you can find other visualizations of Brazilian research data, as well as the datasets covered by our package.

If you need help or find any problems with the package, please contact us through GitHub.