The problem I’ve been having lately is with matrix arithmetic, specifically multiplying rows of arrays. It seems to start with R treating my data frame as a list instead of a DF, which doesn’t make a lot of sense to me. For example, I might want to multiply a row of population data by a column of income distributions to calculate the share of population in each category.
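That row-times-column product is an outer product, and the usual culprit is that pulling out a single row drops it down to a plain vector/list, so the shapes no longer line up. The commenter works in R, but the same pattern can be sketched in Python/pandas (all numbers, area names, and bracket names here are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical data: total population per area, and the share of
# households falling into each income bracket.
pop = pd.Series({"areaA": 10_000, "areaB": 20_000}, name="population")
shares = pd.Series({"under_25k": 0.3, "25k_50k": 0.5, "over_50k": 0.2})

# Row-times-column is an outer product; np.outer returns a plain 2-D
# array, so wrap it back into a labeled DataFrame.
counts = pd.DataFrame(np.outer(pop, shares),
                      index=pop.index, columns=shares.index)
print(counts)  # population per income bracket, per area
```

The same idea in R would be `outer(pop, shares)` on named vectors; the key in either language is to do the multiply on plain numeric arrays and re-attach the labels afterwards.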
I’m a beginner with Census data; my biggest problem is always just finding the data I need. Hopefully Census Bureau workers won’t have that problem!
When first getting started it was hard to get to the roll-up columns. They’re there most of the time, but often they’re parsed out into many different sub-columns (e.g., by age, gender, etc.), and I had to do some digging to find the right ones. Some sort of list of “most commonly used data elements” would’ve been super helpful.
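When a roll-up column is missing, rebuilding it by summing the sub-columns is usually a one-liner once you can select them by prefix. A minimal pandas sketch, with invented column names standing in for the real variable codes:

```python
import pandas as pd

# Hypothetical ACS-style table: the total is split across many
# age/sex sub-columns (names here are made up for illustration).
df = pd.DataFrame({
    "male_under_18": [120], "male_18_64": [400], "male_65_up": [80],
    "female_under_18": [110], "female_18_64": [420], "female_65_up": [95],
})

# Rebuild the roll-ups by selecting sub-columns with a regex prefix.
# (A plain substring match would be wrong here: "female_" contains "male_".)
df["male_total"] = df.filter(regex="^male_").sum(axis=1)
df["female_total"] = df.filter(regex="^female_").sum(axis=1)
df["total"] = df["male_total"] + df["female_total"]
print(df[["male_total", "female_total", "total"]])
```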
Also, I typically wind up mapping the census data back to my internal data by city/state, and there’s often a lot of cleanup needed to get the joins working, but that’s just part of working with data… not sure there’s anything to be done about that.
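One pattern that helps with those joins is to build a normalized key column on both sides before merging, and to use the merge indicator to see what failed to match. A hedged sketch in pandas (the records, the abbreviation rule, and the column names are all hypothetical):

```python
import pandas as pd

# Hypothetical internal records and census records keyed by city/state;
# casing, whitespace, and abbreviations rarely match out of the box.
internal = pd.DataFrame({"city": [" Saint Louis ", "chicago"],
                         "state": ["MO", "IL"],
                         "sales": [10, 20]})
census = pd.DataFrame({"city": ["St. Louis", "Chicago"],
                       "state": ["MO", "IL"],
                       "population": [300_000, 2_700_000]})

def clean_city(s: pd.Series) -> pd.Series:
    # Normalize whitespace/case and one common abbreviation.
    s = s.str.strip().str.lower()
    return s.str.replace(r"^saint\b", "st.", regex=True)

internal["city_key"] = clean_city(internal["city"])
census["city_key"] = clean_city(census["city"])

# indicator=True adds a _merge column flagging unmatched rows,
# which makes the remaining cleanup work visible.
merged = internal.merge(census, on=["city_key", "state"],
                        how="left", indicator=True)
print(merged[["city_key", "state", "sales", "population", "_merge"]])
```

For anything beyond one-off cleanup, joining on a stable geographic code (e.g., a FIPS code) instead of free-text place names avoids most of this.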
I use the Census data quite a bit, and API-type fetches have helped simplify a lot of my challenges. Anything in this direction is super helpful.
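For anyone who hasn’t tried the API route: a Census API call is just a URL with a `get` parameter (comma-separated variable codes) and a `for` parameter (geography). A minimal sketch — the year, dataset, and variable code below are examples, not a recommendation:

```python
# Constructing a Census API request URL (no key needed for light use).
base = "https://api.census.gov/data/2021/acs/acs5"
variables = ["NAME", "B01003_001E"]  # B01003_001E: ACS total population
geography = "state:*"                # every state
url = f"{base}?get={','.join(variables)}&for={geography}"
print(url)

# Fetching and loading into pandas would then look like:
#   import requests, pandas as pd
#   rows = requests.get(url).json()           # first row is the header
#   df = pd.DataFrame(rows[1:], columns=rows[0])
```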
My largest challenge with census data was navigating the huge number of variable names in surveys such as the ACS. I learned that carefully reading the survey questionnaire is a much better approach than attempting to scroll through the endless list of finely segmented variables. Other things that took time to understand: how the more accurate CPS population data is joined with other surveys in place of the ACS population estimates; how to include residual or error fields in reports; and the tradeoff between using larger administrative levels to get bigger sample sizes for recent data versus pooling a longer span of time to overcome small sample sizes at smaller administrative levels (smaller than counties, e.g., PUMAs, congressional districts).