Abstract: This article develops a threshold regression model, where the threshold is determined by an unknown relation between two variables. The threshold function is estimated fully nonparametrically. Since the observations are allowed to be cross-sectionally dependent, our model can be applied to determine an unknown spatial border for sample splitting over a random field. The uniform rate of convergence and the nonstandard limiting distribution of the nonparametric threshold estimator are derived. The root-n consistency and the asymptotic normality of the regression coefficients are also derived. Empirical relevance is illustrated by estimating an economic border induced by the housing price difference between Queens and Brooklyn in New York City, where the economic border deviates substantially from the administrative one.
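To make the setup concrete, one way such a model could be written is sketched below. The notation is an illustrative assumption rather than something stated in the abstract: $y_i$ is the outcome, $x_i$ the regressors, $(q_i, s_i)$ the two variables whose unknown relation defines the threshold, and $\gamma(\cdot)$ the nonparametric threshold function.

```latex
% Hypothetical notation: sample splitting occurs where q_i crosses the unknown function gamma(s_i).
\[
  y_i = x_i^{\top}\beta + x_i^{\top}\delta \,\mathbf{1}\{ q_i \le \gamma(s_i) \} + u_i ,
  \qquad i = 1, \dots, n .
\]
```

Under this reading, $\gamma(\cdot)$ is the object estimated nonparametrically, while $(\beta, \delta)$ are the regression coefficients for which root-n consistency and asymptotic normality are derived. In the spatial application, $(q_i, s_i)$ could be thought of as location coordinates, so that $q = \gamma(s)$ traces out the border.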
Abstract: This paper develops textual sentiment measures for China’s stock market by extracting the textual tone of 60 million messages posted on a major online investor forum in China from 2008 to 2018. We extract sentiment using both conventional dictionary methods based on customized word lists and supervised machine-learning methods (support vector machine and convolutional neural network). The market-level textual sentiment index is constructed as the average of message-level sentiment scores, and the textual disagreement index is constructed as their dispersion. These textual measures allow us to test a range of predictions of classical behavioral asset-pricing models within a unified empirical setting. We find that textual sentiment significantly predicts market returns, exhibiting a salient underreaction-overreaction pattern on a time scale of several months. This effect is more pronounced for small and growth stocks, and is stronger under higher investor attention and during more volatile periods. We also find that textual sentiment exerts a significant and asymmetric impact on future volatility. Finally, we show that trading volume is higher when textual sentiment is unusually high or low and when there are more differences of opinion, as measured by our textual disagreement index. Based on a massive textual dataset, our analysis provides support for the noise-trading theory and the limits-to-arbitrage argument, as well as predictions from limited-attention and disagreement models.
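The index construction described above (average of message-level scores for sentiment, their dispersion for disagreement) can be illustrated with a minimal sketch. Several details here are assumptions, not taken from the abstract: aggregation is done per calendar day, scores lie in [-1, 1], dispersion is measured as the population standard deviation, and the function name `daily_indices` and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def daily_indices(messages):
    """Aggregate (date, score) pairs into per-day sentiment and disagreement.

    Assumptions for illustration only: daily frequency, scores in [-1, 1],
    and dispersion measured by the population standard deviation.
    """
    by_day = defaultdict(list)
    for day, score in messages:
        by_day[day].append(score)
    return {
        day: {
            "sentiment": mean(scores),       # average message-level score
            "disagreement": pstdev(scores),  # dispersion of message-level scores
        }
        for day, scores in sorted(by_day.items())
    }


# Made-up message scores, e.g. as produced by a dictionary or classifier step.
sample = [
    ("2018-01-02", 1.0), ("2018-01-02", -1.0),
    ("2018-01-02", 1.0), ("2018-01-02", 1.0),
    ("2018-01-03", -1.0), ("2018-01-03", -1.0),
]
indices = daily_indices(sample)
for day, values in indices.items():
    print(day, values)
```

In this toy data, the first day is bullish on average but with substantial disagreement, while the second day is uniformly bearish with zero dispersion, which shows why the two indices carry distinct information.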