Web scraping; example of extracting data from a long, messy & unstructured PDF file

In this post I present an example of how I used web scraping and data carpentry techniques to extract data from a messy, unstructured PDF published by the Spanish Government. The code is available in a GitHub repository. In early 2021, the Government of Spain made public the list of all the properties that the Catholic Church registered between 1998 and 2015. The lists were published as PDF documents of more than 3,000 pages.

2020: a recap

Although unusual, 2020 was still a very intense year in which I kept learning and contributing to the use of data innovations to inform and improve the implementation of development programmes and initiatives around the globe. This post summarizes my work in 2020: the countries, organizations, and projects I worked for, my experience working remotely, the blog posts I wrote, and the top 3 lessons I learned.

Data visualization; ggplot I

Throughout 2020, while working on visualizations, I collected useful tricks and ideas for making good-looking ggplot charts. Until today, this library of useful sources lived in my WhatsApp notes, but I thought it would be easier to have it all in a single repository. So here I am, organizing a year of links that I have been copying into my notes. Content: Density plots; Style the legend’s colour bar; Reordering & faceting; Cowplot & title styling; Waffle charts; Fonts. Density plots: I tend to use scatter plots when looking at the relationship between two numeric variables.

Keeping my CV up to date with Rmarkdown

Keeping my CV up to date became a tedious routine. Not only did I have to repeat the same steps over and over again, but I also had to deal with the clunky formatting of Microsoft Word to keep the style of my resume nice and neat. In my daily work I have to come up with efficient ways to automate data processes: collect, clean, analyze, and visualize data in a way that the cycle can be repeated without the workflow crashing.

Línea mujeres

Calls categorized as “domestic gender-based violence” have tripled during the Jornada Nacional de Sana Distancia (National Healthy Distance Campaign) in Mexico City. The campaign, which has been in effect in Mexico since March 23, 2020, recommends that people stay at home to avoid coronavirus infections. As a result of the campaign, a good part of Mexican society has responded on social media with the now-famous #QuedateEnCasa or #YoMeQuedoEnCasa.

The invisible ones

As of April 21, 2020, 51,634 people had been tested for coronavirus in Mexico. Of these, only 1% speak an indigenous language. For every 100,000 people living in majority-indigenous municipalities, only 4.9 have been tested, compared with almost 45 in majority non-indigenous municipalities. The Mexican government publishes coronavirus monitoring data daily. The data include the number of tests performed, confirmed cases, and deaths.
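The per-100,000 comparison above is a simple rate calculation. Here is a minimal sketch in Python of how such a rate can be computed; the function name and the input figures are hypothetical and chosen only for illustration, they are not taken from the official data or from the original analysis.

```python
def rate_per_100k(count: int, population: int) -> float:
    """Return the number of events per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


# Hypothetical inputs for illustration only (not official figures):
# 49 tests carried out among 1,000,000 residents.
example_tests = 49
example_population = 1_000_000
print(round(rate_per_100k(example_tests, example_population), 1))
```

Expressing both groups as a rate per 100,000 residents makes them comparable even when the underlying populations differ greatly in size.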

Tracker of confirmed cases for covid19 in Mexico

This interactive map is a personal and voluntary project that aims to track confirmed cases of #covid19 in each Mexican municipality. The map updates automatically in real time using official data from the Mexican government. Feel free to enter the map, click on a municipality to see how many of its residents have tested positive for covid19, or look at the charts that show the daily evolution of covid19 in Mexico.

Set up a website with R using blogdown, GitHub, and DigitalOcean

I am writing this because it took me some long nights and days to figure out how to get started with blogdown. Hopefully this post helps someone, just like these sources have helped me: Yihui Xie’s fantastic blogdown book, Alison Hill’s amazing step by step, and Collin Quirk’s super useful guide to upload blogdown into DigitalOcean. In GitHub: create a repository, name it as you want, and enable a README. Click on the green button “Clone or download”. In your terminal: open Git Bash, select the local directory where you want to clone the GitHub repo, git clone https://github.

About me

I am passionate about data visualization, interactive maps, and the use of innovative data approaches to facilitate ongoing learning and adaptation of development programmes. I have lived in Mexico, Canada, Spain, the US, Sweden, and the UK. My professional experience spans more than 26 projects across 21 countries worldwide: Austria, Brazil, China, Colombia, Ecuador, Ghana, India, Kenya, Madagascar, Malawi, Mexico, Mozambique, Nigeria, Peru, Sierra Leone, South Africa, Uganda, United Kingdom, United States, Venezuela, and Vietnam.