Data processing

The beginning of the process is based entirely on the database (.xls file) they made available. However, to achieve a satisfactory result and to be able to insert it into a relational database, the following steps were taken:

I downloaded the original spreadsheet and uploaded it to Google Sheets;
I converted it from .xls (proprietary Microsoft format) to .xlsx (Office Open XML format) for better accessibility;
I cleaned up all the rows, columns, and font formatting. I removed empty rows, merged duplicate rows or rows that were added for better visualization in the study;
Replaced the non-numeric values, where:
- NA (not applicable) was converted to empty (null);
- * (values sent for re-analysis) has been converted to empty (null);
- Tr (values between a specific range) has been converted to zero (0);
I defined English names for the columns;
I created another worksheet (still in the same document) that contains all the possible categories and linked theirs `id’s to the food worksheet;
I created another sheet (still in the same document) that contains nutritional information and linked it to the id of the food;
I exported each spreadsheet in the document to .csv and downloaded it into the project;
I modeled the database using a tool called Prisma;
I created a script to populate the database in the correct order and make the relationship between the information and the food.

Official data

To keep the original research data used for this project, I have saved all the files from the original site and you can consult them in the /references/* folder