### Are Cox Regression Models a Valuable Tool for Social Stratification Research on Health?

Alessandro Procopio (1), Robin Samuel (2)

(1) University of Luxembourg, (2) University of Luxembourg

alessandro.procopio@uni.lu, robin.samuel@uni.lu

### Introduction

### The rise of biological and social data

Recent social science studies include biomarker measurements to understand how social stratification processes affect health outcomes (Harris and Schorpp, 2018). At the empirical level, social researchers can rely on an increasing number of biosocial surveys (National Research Council, 2008).

### Research Question

How to analyze these different types of data?

How to exploit the information provided by these types of surveys?

### Aim of the Study

### Research Strategy

1. Theory-based Monte Carlo simulation of the Cox regression model with panel data.

2. Analyze how the model behaves in the presence of unobserved heterogeneity, a common issue in the social sciences.

3. Analyze the misspecification of the time modelling of the biomarker

### Time-Varying Cox Regression Approach

### The classical approach

• The traditional approach to analyzing a time-to-event response together with a covariate measured over time (such as the biomarker trajectory) is to include the covariate as a time-dependent explanatory factor in the model.

• The Cox regression with panel data assumes, however, that the time-varying covariate (the biomarker) does not change until a new measurement is taken. This is a strong assumption.

• Chen et al. (2004) demonstrated that the Cox regression with
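The "no change until the next measurement" assumption can be made concrete with the counting-process data layout that time-varying Cox software typically expects. The sketch below is illustrative only: the function name `to_counting_process` and the example values are hypothetical, and it simply carries each biomarker value forward until the next visit or until the subject exits.

```python
def to_counting_process(visits, exit_time, event):
    """Turn one subject's panel measurements into (start, stop, value, event) rows.

    visits:    list of (time, biomarker_value) pairs, sorted by time
    exit_time: time of the event or of censoring
    event:     1 if the event occurred at exit_time, 0 if censored
    """
    rows = []
    for k, (start, value) in enumerate(visits):
        if start >= exit_time:
            break  # measurements at or after exit contribute no risk time
        # The interval ends at the next visit or at exit, whichever comes first.
        next_time = visits[k + 1][0] if k + 1 < len(visits) else exit_time
        stop = min(next_time, exit_time)
        # Only the interval that ends at the exit time can carry the event.
        rows.append((start, stop, value, int(event and stop == exit_time)))
    return rows


# Hypothetical subject: measured at years 0, 2 and 4, event at year 5.
visits = [(0, 1.2), (2, 1.9), (4, 3.1)]
rows = to_counting_process(visits, exit_time=5, event=1)
for row in rows:
    print(row)
```

Within each row the biomarker is held constant, which is exactly the step-function assumption criticized above.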

### Proposed solutions

• In a first phase, the Two-Stage Model (Wulfsohn and Tsiatis, 1997) was implemented. It consists of:

a. fitting a mixed-effects model

b. predicting the trajectory of the biomarker

c. including the prediction in a survival model

• The model we currently propose for analyzing social and biological data is the joint modelling approach.

• The main difference between the two is that in joint modelling the biomarker trajectory is not included as a prediction from the mixed-effects model.

• But the longitudinal and the survival models are estimated
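Steps a and b of the two-stage idea can be sketched as follows. This is a deliberately simplified illustration: per-subject ordinary least squares stands in for the mixed-effects model, and the names `fit_line`, `predict_trajectory`, and the toy `panel` are hypothetical. The final step (entering the predictions into a survival model) is omitted.

```python
def fit_line(times, values):
    """Ordinary least squares for values = intercept + slope * time (one subject)."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(values) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope


def predict_trajectory(panel, at_times):
    """Fit each subject's trajectory (stage a) and predict it at at_times (stage b)."""
    predictions = {}
    for subject, (times, values) in panel.items():
        intercept, slope = fit_line(times, values)
        predictions[subject] = [intercept + slope * t for t in at_times]
    return predictions


# Hypothetical panel: two subjects measured at years 0, 2 and 4.
panel = {
    "s1": ([0, 2, 4], [1.0, 2.0, 3.0]),
    "s2": ([0, 2, 4], [2.0, 2.5, 3.0]),
}
predicted = predict_trajectory(panel, at_times=[1, 3, 5])
print(predicted)
```

Because the predictions are treated as known covariates in the second stage, their estimation uncertainty is ignored, which is one motivation for estimating both parts together.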

### The model of interest

### Joint modeling

More recently, the statistical literature has improved on the Two-Stage Model so that the mixed and the survival submodels are estimated simultaneously.

Let’s take a look at the two submodels:

### Random Intercept-Slope Submodel
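For orientation, a common textbook formulation of the two submodels is given below. The notation is an assumption introduced here for illustration and is not necessarily the authors' own: $y_i(t)$ is the observed biomarker, $m_i(t)$ its error-free trajectory, $(b_{0i}, b_{1i})$ the subject-specific random intercept and slope, $w_i$ the baseline covariates, and $\alpha$ the association parameter linking the trajectory to the hazard.

$$y_i(t) = m_i(t) + \varepsilon_i(t), \qquad m_i(t) = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\,t, \qquad \varepsilon_i(t) \sim N(0, \sigma^2_\varepsilon)$$

$$(b_{0i}, b_{1i})^\top \sim N(0, \Sigma)$$

$$h_i(t) = h_0(t)\,\exp\left\{\gamma^\top w_i + \alpha\, m_i(t)\right\}$$

In this formulation the random effects appear in both submodels, which is what ties the longitudinal and survival parts together.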

### Monte Carlo Simulation of the Joint Modelling

• Assume that a researcher follows a sample of 250 respondents over ten years. Imagine that biological data have been collected through a biosocial survey for a given biomarker m.

• Imagine that the biomarker, say allostatic load, increases with age (young people manage stress levels better than older people) and that this relationship is non-linear, following a quadratic pattern.

• Assume that the socioeconomic position influences the level of

### Monte Carlo sets

### The time scale

• The statistical literature shows that the Cox regression is sensitive to the specification of the time scale (Thiébaut and Bénichou, 2004; empirical suggestion taken from Crowther et al., 2016).

• What kind of bias would we find in the estimates if we assumed that the longitudinal trajectory of the biomarker is a linear function of follow-up time when in reality it has a quadratic shape?
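A small numerical sketch shows what this misspecification does to the trajectory itself. It uses the time coefficients of the longitudinal model stated later in the deck (0.2, 0.5, 0.02), drops noise and covariates for clarity, and fits a straight line by least squares over years 0 to 10. The helper name `fit_line` is hypothetical, and the sketch says nothing about the resulting bias in the hazard parameters.

```python
def fit_line(times, values):
    """Ordinary least squares for values = intercept + slope * time."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(values) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope


times = list(range(11))  # follow-up years 0..10
true_curve = [0.2 + 0.5 * t + 0.02 * t ** 2 for t in times]

# Misspecified model: a straight line fitted to a quadratic trajectory.
intercept, slope = fit_line(times, true_curve)
residuals = [y - (intercept + slope * t) for t, y in zip(times, true_curve)]

print(f"fitted intercept: {intercept:.3f}, fitted slope: {slope:.3f}")
print(f"largest absolute residual: {max(abs(r) for r in residuals):.3f}")
```

The fitted slope absorbs part of the curvature, so it no longer equals the true linear coefficient of 0.5, and the residuals are systematically positive at both ends of follow-up.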

### Frailty/Heterogeneity

• In the epidemiological and social science literature, between-group frailties are increasingly taken into account in data analysis (for an empirical example, see Zarulli et al., 2013).

• What kind of bias would we find in the estimates if we do not take
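Unobserved heterogeneity is often represented as a multiplicative frailty on the hazard. The sketch below assumes a gamma frailty with mean 1 and variance `theta` acting on a constant baseline hazard, and draws survival times by inverting the cumulative hazard. The function name, the baseline rate, and the choice of gamma frailty are assumptions made here for illustration; the deck does not state how heterogeneity was generated.

```python
import math
import random


def simulate_frailty_times(n, base_rate, theta, seed=1):
    """Draw n survival times with hazard base_rate * z, where z is a gamma frailty.

    The frailty z has mean 1 and variance theta (shape 1/theta, scale theta).
    """
    rng = random.Random(seed)
    times = []
    for _ in range(n):
        z = rng.gammavariate(1 / theta, theta)
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
        # Inverse-transform sampling for an exponential time with rate base_rate * z.
        times.append(-math.log(u) / (base_rate * z))
    return times


# Hypothetical low- and high-heterogeneity scenarios.
low = simulate_frailty_times(n=250, base_rate=0.1, theta=0.1)
high = simulate_frailty_times(n=250, base_rate=0.1, theta=3.0)
print(f"median time, low heterogeneity:  {sorted(low)[len(low) // 2]:.2f}")
print(f"median time, high heterogeneity: {sorted(high)[len(high) // 2]:.2f}")
```

A model that omits `z` treats all subjects with the same covariates as sharing one hazard, which is the omission the question above is concerned with.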

### Data Generation Mechanism

### Longitudinal Model

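$$m_i = 0.2 + 0.5\,t + 0.02\,t^2 + 0.085 \cdot \text{age} + 0.1 \cdot \text{ses} + e_{ij}$$

$$e_{ij} \sim N(0, \Sigma), \qquad \Sigma = \begin{pmatrix} \sigma^2_{00} & \sigma^2_{01} \\ \sigma^2_{01} & \sigma^2_{11} \end{pmatrix}$$

$$\sigma^2_{00} = 2.1, \qquad \sigma^2_{11} = 1.07, \qquad \sigma^2_{01} = 0.3$$

### Gompertz-Cox parametric model

$$h(t \mid \beta_i) = \exp(-16) + \exp(1.5)\,t + \exp\left[0.40\,(\beta_{0i} + \beta_{1i} t) + 0.02\,t^2 + 0.085 \cdot \text{age} + 0.1 \cdot \text{ses}\right]$$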
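The longitudinal part of this data-generating mechanism can be sketched in a few lines. Several choices below are assumptions made for illustration and are not stated in the deck: Σ is read as the covariance of a subject-specific random intercept and slope, baseline age is drawn uniformly between 30 and 70, and SES is a three-level score. Survival times are not generated here.

```python
import math
import random

# Variance components as given above.
SIGMA_00, SIGMA_11, SIGMA_01 = 2.1, 1.07, 0.3


def draw_random_effects(rng):
    """Draw (b0, b1) from a bivariate normal using a hand-written Cholesky factor."""
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    l00 = math.sqrt(SIGMA_00)
    l10 = SIGMA_01 / l00
    l11 = math.sqrt(SIGMA_11 - l10 ** 2)
    return l00 * z1, l10 * z1 + l11 * z2


def simulate_panel(n_subjects=250, n_years=10, seed=1):
    """Return rows (subject, year, age, ses, biomarker) for a yearly panel."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_subjects):
        age = rng.uniform(30, 70)    # hypothetical baseline age distribution
        ses = rng.choice([0, 1, 2])  # hypothetical three-level SES score
        b0, b1 = draw_random_effects(rng)  # assumed random intercept and slope
        for t in range(n_years + 1):
            m = (0.2 + 0.5 * t + 0.02 * t ** 2
                 + 0.085 * age + 0.1 * ses + b0 + b1 * t)
            rows.append((i, t, age, ses, m))
    return rows


panel = simulate_panel()
print(len(panel), "rows; first row:", panel[0])
```

Fixing the seed keeps each Monte Carlo replication reproducible; in a full study the seed would vary across replications.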

### Graphical visualization of the simulated data

[Figure: simulated longitudinal response plotted against time before censoring (censored subjects) and against time before event.]

### Polynomial Trajectory: Correlation coefficient

[Figure: estimates of ρ and their empirical standard errors, with and without heterogeneity. Panels: U.H. = 0.1 and U.H. = 3.]

### And the association parameter

[Figure: estimates of α and their empirical standard errors, with and without heterogeneity. Panels: U.H. = 0.1 and U.H. = 3.]

### Linear trajectory: correlation coefficient

[Figure: estimates of ρ and their empirical standard errors, with and without heterogeneity. Panels: U.H. = 0.1 and U.H. = 3.]

### And the association parameter

[Figure: estimates of α and their empirical standard errors, with and without heterogeneity. Panels: U.H. = 0.1 and U.H. = 3.]

### Conclusions

• The parameter ρ, which captures the correlation between the fixed and random effects, is on average close to its true value.

• However, across replications the estimates are less stable around the true parameter, with higher variance and larger empirical standard errors.

• The α parameter, which captures the association between the biomarker trajectory and survival chances, shows a smoother linear pattern than the longitudinal ρ.

• That is, the empirical standard errors lie much closer to the estimate.

• Moreover, it is rather "robust" to unobserved heterogeneity.

• The only problematic set, coherently with previous studies arises
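The quantities discussed in these conclusions (average estimate, bias, empirical standard error) are standard Monte Carlo performance measures. A minimal sketch of how they are computed over replications follows. The estimates in the example are hypothetical numbers chosen for illustration and are not results from this study. The true value of 0.40 is the association coefficient from the data-generating hazard.

```python
import math


def performance(estimates, true_value):
    """Return (bias, empirical standard error) over Monte Carlo replications."""
    n = len(estimates)
    mean = sum(estimates) / n
    bias = mean - true_value
    # The empirical SE is the standard deviation of the estimates across replications.
    emp_se = math.sqrt(sum((e - mean) ** 2 for e in estimates) / (n - 1))
    return bias, emp_se


# Hypothetical estimates from five replications (illustrative values only).
estimates = [0.38, 0.41, 0.40, 0.43, 0.38]
bias, emp_se = performance(estimates, true_value=0.40)
print(f"bias: {bias:.4f}, empirical SE: {emp_se:.4f}")
```

An estimator can be unbiased on average and still unstable, which is the pattern described for ρ above: near-zero bias alongside a large empirical standard error.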

### References I

Bender, R., Augustin, T., and Blettner, M. (2005). Generating survival times to simulate Cox proportional hazards models. Statistics in Medicine, 24(11):1713–1723.

Chen, F., Paxton, P., Bollen, K. A., Curran, P. J., and Kirby, J. (2004). Monte Carlo Experiments: Design and Implementation. Structural Equation Modeling: A Multidisciplinary Journal, 8(2):287–312.

Crowther, M. J., Andersson, T. M., Lambert, P. C., Abrams, K. R., and Humphreys, K. (2016). Joint modelling of longitudinal and survival data: Incorporating delayed entry and an assessment of model misspecification. Statistics in Medicine, 35(7):1193–1209.

Harris, K. M. and Schorpp, K. M. (2018). Integrating Biomarkers in Social Stratification and Health Research. Annual Review of Sociology, 44(1):361–386.

National Research Council (2008). Biosocial Surveys. The National Academies Press, Washington, DC.

### References II

Rizopoulos, D., Verbeke, G., and Molenberghs, G. (2008). Shared parameter models under random effects misspecification. Biometrika, 95(1):63–74.

Thiébaut, A. C. and Bénichou, J. (2004). Choice of time-scale in Cox’s model analysis of epidemiologic cohort data: A simulation study. Statistics in Medicine, 23(24):3803–3820.

Van den Hout, A. and Muniz-Terrera, G. (2016). Joint models for discrete longitudinal outcomes in aging research. Journal of the Royal Statistical Society: Series C (Applied Statistics), 65(1):167–186.

Wulfsohn, M. S. and Tsiatis, A. A. (1997). A Joint Model for Survival and Longitudinal Data Measured with Error. Biometrics, 53(1):330–339.

Zarulli, V., Marinacci, C., Costa, G., and Caselli, G. (2013). Mortality by

### Appendix:

### K-M Survivor Functions

[Figure: Kaplan-Meier survivor functions by social class (higher, medium, lower) over analysis time.]
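For readers unfamiliar with the estimator behind these curves, a minimal sketch of the Kaplan-Meier product-limit calculation is given below. The function name and the six example observations are hypothetical. In practice the estimator would be applied separately within each social class.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimator).

    times:  observed follow-up times
    events: 1 if the event occurred at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects who leave the risk set at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical data: six subjects, three events and three censored observations.
times = [4, 5, 5, 7, 8, 10]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t}: S(t) = {s:.4f}")
```

Censored subjects lower the number at risk without producing a step in the curve, which is why the estimate changes only at event times.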