Can big data increase our knowledge of local rental markets? A dataset on the rental sector in France

Guillaume Chapelle; Jean Benoît Eyméoud

doi:10.1371/journal.pone.0260405

PLoS One. 2022 Jan 27;17(1):e0260405. doi: 10.1371/journal.pone.0260405

Editor: Nils Kok

PMCID: PMC8794157 PMID: 35085260

Abstract

Social scientists and policy makers need precise data on market rents. Yet, while housing prices are systematically recorded, few accurate datasets on rents are available. In this paper, we present a new dataset describing local rental markets in France based on online ads collected through webscraping. Comparison with alternate sources reveals that online ads provide an unbiased picture of rental markets and allow coverage of the whole territory. We then estimate hedonic models for prices and rents and document the spatial variations in rent-price ratios. We show that rents do not increase as much as prices in the tightest housing markets. We use our dataset to estimate the market rent of each transaction and of social dwellings. In the latter case, this allows us to estimate the in-kind benefit received by social tenants, which is mainly driven by the level of private rent in their municipality.

Introduction

Precise knowledge of local rental markets is of growing interest to policy makers and researchers. For example, while housing bubbles are a recurrent concern of macro-prudential authorities [1], these phenomena can appear locally, as in Paris in 1991, and their identification requires data on both local housing prices (selling price) and returns (rental price). Moreover, precise knowledge of local rents might be important for taxation purposes. In France, local taxes such as housing or property taxes should be based on the rental value of dwellings. However, the current tax base was estimated in the 1970s and the revisions implemented since then have been homogeneous. As the divergence between local markets or even neighborhoods has not been accounted for, this has led to a strong discrepancy between the tax base and the true market value, with significant redistributive consequences [2, 3]. The 2020 Finance Law provides that rental values have to be reviewed before 2026, which requires the collection of additional data. More broadly, rental data might also be needed to assess and compare the cost of living in different cities or neighborhoods [4].

Between December 2015 and January 2018, we periodically collected, cleaned and analyzed housing rental ads from the two largest French real estate websites. These two websites were the leaders in the market, with a monthly stock of rental ads oscillating between 500,000 and 750,000 [5–7]. Each ad provides the location of the dwelling as well as its hedonic characteristics, offering the possibility to describe local housing markets and to estimate local rent indices. In this paper, we describe the method used to collect these online data and describe the dataset. We then compare these data with those from a more conventional data collection method. We do not find any significant differences between the rent measured from online ads and the average rent measured with surveys. We estimate hedonic indices for prices and rents and systematically document the spatial variation in the rent-price ratio. We show that rents do not increase as much as prices in the tightest housing markets. Lastly, we use our dataset to estimate the market rent of each transaction and of social dwellings. In the latter case, market rent can be used to measure the in-kind benefit received by social tenants, which is mainly driven by the level of private rent in the municipality.

Background

Lack of data on the French rental market

In France, housing prices are recorded by the fiscal administration because transactions are taxed. The rental market, however, is currently mainly studied by the National Statistical Institute, which produces the French Housing Survey [8] and the Survey on Rents and Housing Expenditures [9]. Both provide good quality data on the rental sector but they have two drawbacks. First, they are only representative at the national level as they have a limited number of observations. The French Housing Survey covers 36 000 households but only 2 947 tenants in the private sector, while the Survey on Rents and Housing Expenditures covers 4 300 households. Thus, they cannot be used to monitor the rental dynamics of a city or an urban area. Second, they do not allow monitoring of the market for new leases, but are representative of the whole rental sector, where rent revision is regulated. Another dataset exists at the family branch of Social Security, which collects information provided by recipients of housing allowances. However, this dataset is of limited use, as it does not provide characteristics of the dwelling besides the municipality and the rent.

This lack of information at the local level led to three initiatives: the Observatoire des Loyers de l’Agglomération Parisienne (OLAP) [10], followed by local observatories, the Observatoires Locaux des Loyers (OLL) [11], supported by the French Ministry of Housing, and, in parallel, Connaître Les Loyers et Analyser les Marchés sur les Espaces Urbains et Ruraux (CLAMEUR) [12]. The OLAP is publicly supported and was first in charge of observing rents in the urban area of Paris before progressively extending its survey to the main French urban areas. It produces two micro-level datasets: a panel dataset and a time series of yearly cross-sectional observations from 1990 to the present. Even though these datasets are of good quality, they have two main limitations: they cover only a limited share of the French territory, and access for researchers appears relatively difficult. To our knowledge, only a single published study is based on this dataset [13]. More recently, new local rent observatories have been developed in several urban areas. However, the number of urban areas covered remains limited, as there are only 35 local observatories, including the OLAP in Paris. The third initiative, CLAMEUR, collects rental data from real estate agents and insurance companies. It provides a yearly average of the rent per square meter for about 887 French municipalities and groups of municipalities (Etablissements Publics de Coopération Intercommunale). Although this source provides useful information on local market conditions and their dynamics, few details are provided about the variables available in its databases. To our knowledge, no academic paper has ever used its micro-level data. Moreover, its data have limited geographical coverage (887 out of 36 000 municipalities). Lastly, many websites provide estimates of local rents and prices, but their estimates are heterogeneous and suffer from methodological opacity [14].

The aforementioned sources might offer some leads for dealing with the limited knowledge of local rental markets. However, these surveys and administrative databases require substantial and potentially costly processing to increase the number of observations (for the survey-based data) or the number of variables (for the administrative data), and they might not be available to researchers. On the other hand, online ads offer a promising way to document the variations in local rents, as illustrated in the US with Craigslist [15].

The growing coverage of real estate websites on the housing market

Housing surveys show that an increasing share of private tenants find their accommodations on real estate websites. Nowadays a vast majority of private landlords or real estate agencies use the internet to find tenants, as illustrated in Table 1. These channels do not constitute the whole market, as 22% of the tenants found their flat by alternate channels, namely 19% by word of mouth, 1% from the employer and 2% from social services. Even so, we may be able to observe the vast majority of the market. From our perspective, exploiting housing advertisements posted online can provide an interesting and complementary way to survey local housing markets at a moderate cost.

Table 1. Method used to find a flat in the rental sector (%).

	Not Furnished	Furnished	Total
Privately (ads on internet or Newspapers)	37	42	37
Real Estate Agency	41	22	39
by word of mouth	19	20	19
From the employer	1	3	2
Social Services	2	10	3
Others	0	3	1
Total	100	100	100

Source: Authors’ computation from the French Housing Survey 2013 (INSEE). Households in the private rental sector that have occupied their dwelling for less than 4 years.

The growing role of user-generated content in research

Online data and ads have been increasingly used in research, particularly to study the real estate market. For example, a series of papers have exploited online ads to investigate the impact of energy efficiency labels on housing prices [16, 17]. Other papers used these data to study the impact of rent control [18], describe the impact of Airbnb [19–21] or evaluate the incidence of taxation [22]. Ads can also be used to follow real estate dynamics [23] or to compare housing prices across countries [24]. However, while online data are increasingly popular for studying real estate market dynamics, extremely few papers have studied their reliability [14, 15].

The key advantages of online data are their reduced cost, the high number of observations available, and their high coverage and granularity when compared with standard survey data. In 2012, the total budget dedicated to the French Local Rent Observatories to follow 19 urban areas was 2.5 million euros [25, 26]. Although webscraping makes it possible to gather a large number of observations with a homogeneous method and for a limited cost, it is not always possible to access these datasets easily, as public APIs are not always available and data might also be protected by copyright. Nevertheless, this paper documents the reliability of online ads that can be accessed through specific agreements with online platforms.

Materials and methods

Main dataset

Scraping process

There are several real estate websites in France. To gain access to the biggest source of data we decided to focus on the two largest. We used their public API, which displayed no usage restrictions. The first had about half of its posts from landlords and half from real estate agents. The second had mostly ads from real estate agents. The information we wanted to extract consisted of a set of posts available on the rental websites. Each ad has a unique identifier, pictures, a short text describing the offer and a standardized table presenting the most important characteristics, such as the surface, the number of rooms, the monthly rent, or the type of contract (furnished or not). It is also localized thanks to the name of the municipality, a zip code and a map indicating the geographic coordinates, which can be more or less precise (city level, neighborhood or address). The unstructured part of the ad (the description) allows us to identify key words in order to find additional information, such as the presence of an elevator, the floor, and the amount of extra expenditures.

To obtain the data from the two websites, we created programs that sent requests for every municipality in France to their public API and stored the responses in a dataset. We then cleaned the dataset and converted each post into a structured format. Lastly, we saved the database in comma-separated values (CSV) format or in an SQL database. Overall, the operation took between 10 hours and 2 days, depending on the website and the period of time. We repeated the scraping process every month for each website from December 2015 until January 2018 and ended up with a database of 4.3 million posts from the rental sector.

Cleaning the data

The cleaning procedure starts by identifying repeated posts that had the same identifier between waves. We also identify similar posts between both sites using the post’s description. We keep only one observation per ad and record the number of occurrences of the post. Other papers [23] use a machine learning algorithm to identify similar ads with different identifiers. In our approach, we generate a full set of variables describing each housing unit and consider that units with the same price and the same characteristics (surface, number of rooms, amenities and geocoding) posted in the same month are duplicates. Once the duplicates are dropped, for the sake of comparability with the local observatories' data, we follow the recommendation of the National Rent Observatory and keep observations with a rent between 80 and 15 000 euros and a surface between 5 and 500 square meters. As the main point of our study is to have geolocalized data, we keep only observations that provide a city name or a precise geographic location. This procedure creates the final database, which we describe in the next section. In the second part of the paper, hedonic models are estimated after also removing outliers based on the rent per square meter (see S7 Table in S1 Appendix).
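To make these rules concrete, the following is a minimal Python sketch of the duplicate and range filters, not the code used for the paper. The field names (`month`, `rent`, `surface`, `rooms`, `municipality`) are hypothetical, the bounds are treated as inclusive by assumption, and only a subset of the characteristics mentioned above is used to detect duplicates.

```python
from collections import Counter

# Hypothetical field names. The bounds (80 to 15,000 euros, 5 to 500 m2) come from the text.
KEY_FIELDS = ("month", "rent", "surface", "rooms", "municipality")


def clean_ads(ads):
    """Drop duplicates and out-of-range ads, keeping an occurrence count per ad."""
    counts = Counter(tuple(ad[f] for f in KEY_FIELDS) for ad in ads)
    seen = set()
    cleaned = []
    for ad in ads:
        key = tuple(ad[f] for f in KEY_FIELDS)
        if key in seen:
            continue  # same month, price and characteristics: treated as a duplicate
        seen.add(key)
        # Ads without a rent, a surface or a location cannot be used.
        if ad["rent"] is None or ad["surface"] is None or not ad["municipality"]:
            continue
        if not (80 <= ad["rent"] <= 15000 and 5 <= ad["surface"] <= 500):
            continue
        cleaned.append({**ad, "occurrences": counts[key]})
    return cleaned
```

Keeping the occurrence count alongside the retained ad mirrors the choice described above of keeping one observation per ad together with the number of times it was seen.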

Overall, our cleaning procedure decreases the number of observations by 12.87%. The largest part of this decrease is explained by observations that do not report a surface. We believe that the price per square meter is the relevant parameter to characterize the housing market, as it provides a rental value used in other countries and is easily comparable.

Creation of the variables

Overall, our cleaned database has about 4.3 million observations collected between December 2015 and January 2018. The following section describes the different variables and their origins.

The average rent per square meter. Online ads directly provide the location of the dwelling. A total of 60% of the ads are located at the broadest level (French municipalities) while the remaining 40% are precisely geocoded at the address or neighborhood level, using the information provided by the user or the location of the device used when creating the ad. This database thus provides fine-grained data, as even municipalities remain quite small [4]. This allows us to compute the average gross rent per square meter for the majority of the municipalities in France, as illustrated in Fig 1, which makes it easy to identify the main urban areas and the places close to the borders, where rents are usually higher. The gross rent is directly coded and easy to recover. In Table 2, one can observe that the average gross rent is about 650 euros while the rent per square meter is around 13 euros. Both websites also provide additional information specifying whether the rent displayed includes extra expenditures (e.g., waste collection, water, heating). About 70% of the rents displayed include some kind of extra expenditures. Unfortunately, the share of the rent attributed to these is not directly coded and is recovered from the text using regular expressions. The algorithm identifies whether the word “charges” is in the text and recovers the amount in euros around this word that is lower than the rent. About 30% of the ads indicate the amount of extra expenditures. The average estimated amount of extra expenditures in this subsample is around 61.8 euros, which represents 9% of the average rent. For observations where the extra expenditures could not be recovered, we impute them using the average amount for dwellings with similar characteristics (based on the type of dwelling and the number of rooms) in the same department, which lets us estimate the average and median amount of extra expenditures for all flats. This also allows us to estimate an average net rent per square meter for all municipalities and strata.
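The text extraction step can be illustrated with a short Python sketch. It is only one possible reading of the rule described above: the 40-character window around the word "charges", the accepted currency spellings and the function name `extract_charges` are all assumptions, and amounts written with a space as thousands separator are not handled.

```python
import re

# Assumed pattern: a number followed by a euro marker ("€", "euro", "euros" or "eur").
AMOUNT = re.compile(r"(\d+(?:[.,]\d{1,2})?)\s*(?:€|euros?|eur)", re.IGNORECASE)


def extract_charges(description, rent, window=40):
    """Return the first euro amount near the word "charges" that is lower than the rent."""
    for match in re.finditer(r"charges", description, re.IGNORECASE):
        start = max(0, match.start() - window)
        snippet = description[start:match.end() + window]
        for amount in AMOUNT.findall(snippet):
            value = float(amount.replace(",", "."))
            if value < rent:  # an amount at or above the rent is likely the rent itself
                return value
    return None  # the word is absent or no plausible amount was found


print(extract_charges("Loyer 650 € dont 50 € de charges", rent=650))
```

The comparison with the rent is what prevents the rent itself, which often appears in the same sentence, from being read as the amount of extra expenditures.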

Fig 1 — *Note:* Author’s computations and ADMIN EXPRESS COMMUNE which is under an Open Licence Etalab https://www.data.gouv.fr/fr/datasets/admin-express/.

Table 2. Price, surface, expenditures and type of lease.

	Count	Mean	Std	Min	25%	50%	75%	Max
Gross Rent	4370365	652.3	405.2	81.0	449.0	567.0	740.0	24000.0
Surface	4370365	55.7	31.1	6.0	33.8	50.0	70.0	497.0
Gross Rent per square meter	4370365	13.6	7.3	0.2	8.9	11.8	16.2	1846.2
Time elapsed since publication (days)	4370365	29.4	39.0	0.0	8.0	20.0	38.0	770.0
Expenditures (%): Included	4370365	72.5	44.7	0.0	0.0	100.0	100.0	100.0
Expenditures (%): Not Included	4370365	5.8	23.3	0.0	0.0	0.0	0.0	100.0
Expenditures (%): Unknown	4370365	21.7	41.2	0.0	0.0	0.0	0.0	100.0
Amount Expenditures	623373	61.8	60.1	0.0	30.0	48.0	80.0	3705.0
Collective heating (%)	4370365	3.6	18.5	0.0	0.0	0.0	0.0	100.0
Hot water (%)	4370365	0.2	4.2	0.0	0.0	0.0	0.0	100.0
Trash collection (%)	4370365	4.6	20.9	0.0	0.0	0.0	0.0	100.0
Furnished (%): No	4370365	75.8	42.8	0.0	100.0	100.0	100.0	100.0
Furnished (%): Yes	4370365	24.2	42.8	0.0	0.0	0.0	0.0	100.0

Based on the text, it is also possible to infer which types of expenditures are included, such as collective heating or trash collection. Lastly, a second important piece of information is the type of lease, indicating whether furniture is included in the lease or not. This variable is of particular importance as the minimal length of the lease is 1 year when the flat is furnished or 3 years if not furnished. Again, although this information appears in the code of the web page for the most recent period, it was not systematically filled in during the first waves. Consequently, it is also coded from regular expressions identified in the description. We extract both variables, the one created from the text and the one based on the variables provided by the websites. About 24% of the flats are offered as furnished. The publication date collected can also be used to compute the time elapsed since publication when the ad was last observed. About 50% of the ads have disappeared after 20 days.

The type of units and the number of rooms. Each website has a specific part of the webpage dedicated to the type of unit and the number of rooms. No treatment is thus required and these variables are taken directly from the collected variables.

As shown in Table 3, most of the units are flats, as houses only represent about 16% of the sample. Units are relatively small: most have one or two rooms, and the average surface is about 56 square meters. These characteristics are typical of the French rental market, which mostly serves younger people with few children.

Table 3. Type of units, number of rooms and surface.

	Count	Mean	Std
House (%)	4370365	15.4	36.1
Rooms (%): 01	4370365	21.0	40.8
Rooms (%): 02	4370365	32.6	46.9
Rooms (%): 03	4370365	26.1	43.9
Rooms (%): 04	4370365	12.4	33.0
Rooms (%): 05	4370365	5.4	22.6
Rooms (%): 6+	4370365	2.5	15.6

Floors and other amenities. From the description, it is also possible to identify the floor and the amenities in the building. As shown in Table 4, the floor is recovered for 40% of the ads while 14% of the ads announce the presence of an elevator. A total of 36% have a balcony or a kitchen with some equipment. Lastly, 46% offer the possibility of parking a car.

Table 4. Floors and other amenities.

	Count	Mean	Std
Floor
Floor (%): 0.0	4370365	8.8	28.3
Floor (%): 1.0	4370365	10.8	31.0
Floor (%): 2.0	4370365	7.9	27.0
Floor (%): 3.0	4370365	4.3	20.3
Floor (%): 4.0	4370365	2.1	14.4
Floor (%): 5.0	4370365	1.1	10.2
Floor (%): 6+	4370365	1.2	10.8
Floor (%): Unknown floor	4370365	63.9	48.0
Amenities
Elevator (%)	4370365	14.5	35.2
Double glazing (%)	4370365	9.6	29.5
Kitchen with equipment (%)	4370365	35.4	47.8
Garage (%)	4370365	45.4	49.8
Garden (%)	4370365	17.3	37.8
Balcony (%)	4370365	35.4	47.8

External validation of the dataset

The coverage of the database

To assess the coverage of the dataset coming from our collection process, we consider that housing units observed are a subsample of the exhaustive rental market which is documented in the French Census [27]. We develop a simple method inspired by the adjustment on margin method [28].

From the 2016 census, we create strata combining the location of the occupied private rental units (municipality) and their number of rooms. For example, the first arrondissement of Paris is composed of five strata defined by the number of rooms (1, 2, 3, 4, 5 or more). Each stratum contains N^c observations, the number of rental units in the stratum observed in the census of a given year. Robustness checks that also include vacant units are reported in S2 Fig in S1 Appendix.
In the second step, we assign each scraped ad to a stratum. The number of scraped ads in each stratum is noted n^s.
The coverage (i.e. the number of ads per unit) is simply defined as $\frac{n^{s}}{N^{c}}$.

Thus we can measure the coverage of each type of good in two dimensions: its location and its number of rooms. For example, we can determine how many flats an ad for a two-bedroom flat in the 1st arrondissement of Paris represents.
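The coverage ratio defined above reduces to a count per stratum divided by the corresponding census count. A minimal Python sketch follows, assuming hypothetical inputs: a list of ads with `municipality` and `rooms` fields and a dictionary `census_counts` mapping each (municipality, rooms) stratum to N^c.

```python
from collections import Counter


def coverage_by_stratum(ads, census_counts):
    """Compute n^s / N^c for each (municipality, rooms) stratum present in the census."""
    # Rooms are capped at 5, matching the "5 or more" category described in the text.
    n_s = Counter((ad["municipality"], min(ad["rooms"], 5)) for ad in ads)
    return {
        stratum: n_s.get(stratum, 0) / n_c
        for stratum, n_c in census_counts.items()
        if n_c > 0  # strata without any rental unit in the census are skipped
    }
```

Strata that appear in the census but have no ad get a coverage of zero, which corresponds to the uncovered places discussed in the text.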

We use two different subsamples of the census to create two alternate measures. First, N^c is defined using all the rental units. Second, N^c is defined using the rental units occupied for less than 5 years, which proxy the flow of rental units on the market over our period of study. Fig 2 represents the distribution (weighted by the number of units) of the coverage of our strata while Fig 3 maps it across the French territory. On average, each stratum had one ad per unit rented for less than five years and 0.66 ads per rental unit. In rural areas, where the number of tenants is very low, the coverage is usually very high. In these places, the ratio can even exceed one because very few rental units are observed, so that one can observe more online ads than tenants in the census. Moreover, some units can be vacant and for rent (see S2 Fig in S1 Appendix). Nevertheless, other tiny rural places are sometimes not covered by online ads. The coverage appears to be lower in city centers, where the number of tenants in the private sector is usually higher. For example, within Paris the average coverage among strata is around 0.5 ads per rental unit occupied for less than five years. In these cities, coverage is expected to grow over time as we have only been scraping for 2 years. In some specific cities, the ratio can also be above one when turnover is high and the same dwelling can be posted several times per year. This can be the case in areas where a large share of tenants are students or temporary workers in the tourism sector.

Fig 2 — *Note:* Distribution of the ratio between the number of ads scraped (n^s) and the number of rental units in the Census (N^c). Each observation is a stratum weighted by its number of rental units (N^c). In panel a) N^c is computed from rental dwellings occupied for less than 5 years. In panel b) N^c is computed with all rental dwellings in the census.

Fig 3 — *Note:* Average number of ads per unit in all strata of a municipality. Authors’ computations and Admin Express Commune which is under an Open Licence Etalab https://www.data.gouv.fr/fr/datasets/admin-express/.

Comparison with local rent observatories and CLAMEUR

Even if the types of housing units observed and the channels used to find flats appear fairly representative, the posted rent might differ from the rent actually paid. Nevertheless, several important observations lead us to believe that this bias remains limited. From a theoretical standpoint, if we model the housing market as a frictional market where a landlord and a tenant meet [29], the bargained rent is a weighted sum of the landlord’s and the tenant’s surpluses. The rent crucially depends on the relative bargaining power of the landlord and the tenant. However, when the bargaining power of the tenant is close to zero, the rent converges toward the posted rent when we assume price competition among landlords [30]. Moreover, the bargaining power in the Nash bargaining process can be seen as a factor of relative impatience, where the impatient party has a lower bargaining power [31]. The lack of housing supply in France, particularly in large cities, suggests that prospective tenants have relatively little bargaining power, at least in the major urban areas. In other markets, we expect the transparency of online platforms, where landlords can observe at a reduced cost the prices and movements of competitors offering a similar unit in the same area, to also drive the posted rent close to the market rent.

For these reasons posted rents are not likely to differ too much from the actual ones signed on the contract. Therefore, we compare our data with the statistics on rents published by Local Observatories (OLL) based on surveys.

OLL publishes yearly statistics on local rents based on regular surveys for 35 urban areas [32]. These datasets provide statistics on the first quartile, the median, the third quartile and the average net rents per square meter for several sampling areas within each urban area. These statistics are computed for groups of units with homogeneous characteristics (houses vs. apartments; number of rooms; period of completion of the building). Interestingly, while the survey conducted by OLL focuses on the whole rental sector, it also publishes statistics on new leases signed for less than one year. We thus recovered these statistics on new leases to compare them with the ads published in the preceding year. This allows us to test whether there exist discrepancies between the net rents of online ads and the net rents actually paid by tenants and measured through surveys. The correlation between the median rents computed from both datasets is 0.95, as is the correlation coefficient between average rents from both sources. Fig 4 represents this correlation between medians and the distribution of rents computed from both datasets with the first and the third quartiles. We observe that the distribution is concentrated around the 45 degree line. Unfortunately, neither OLL nor CLAMEUR provides confidence intervals for their estimates. Panels A) and B) of Table 5 test whether there is a significant difference between the average and median rents computed from ads and the statistics published by the OLL by estimating the following equation with ordinary least squares:

$$\mathrm{Rent}_{d,a,r} = \alpha + \beta \times 1_{d=\mathrm{Ads}} + X_{a,r}\lambda \qquad (1)$$

where $\mathrm{Rent}_{d,a,r}$ is the average or median rent estimated using dataset d for dwellings located in area a with r rooms. α is a constant while $1_{d=\mathrm{Ads}}$ is a dummy indicating that the observation belongs to the dataset created from online ads. β, reported in Table 5, measures the difference between the statistics computed from online ads and those published by OLL. $X_{a,r}$ are fixed effects. In columns 1 and 4, no fixed effects are included. This corresponds to a simple comparison of averages. In columns 2 and 5, separate fixed effects indicate the number of rooms and the area; in this case, the comparison is performed between goods controlling for the influence of the number of rooms and the location of the good. Lastly, in columns 3 and 6, there is one fixed effect per stratum (number of rooms x area), which ensures that the comparison is made within each stratum. Correlation of the residuals within each dataset and spatial correlation are accounted for by clustering the standard errors at the urban unit level and at the dataset level. On average, the median rent is 0.0134 (95% CI -0.30, 0.33) euros per square meter higher when computed with online ads, while the average rent is 0.324 (95% CI -0.12, 0.78) euros higher. One should note that this difference remains small (about 2% of the average rent in the sample) and is not statistically significant. The fact that the average computed from online ads is slightly higher might reflect that some limited negotiation can happen on the market, but also that part of the online activity covers dwellings with higher rents when compared with dwellings found by word of mouth.
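As an illustration of Eq (1), the following Python sketch computes the OLS point estimate of β by demeaning the rent and the ads dummy within each stratum, which is the standard way to absorb fixed effects (Frisch-Waugh-Lovell). The function name `ads_gap` and the row layout are hypothetical, and the sketch does not compute the clustered standard errors reported in Table 5.

```python
from collections import defaultdict


def ads_gap(rows, with_stratum_fe=True):
    """OLS estimate of beta in Rent = alpha + beta * 1[d = Ads] + fixed effects.

    rows: iterable of (stratum, is_ads, rent), with is_ads equal to 0 or 1.
    """
    groups = defaultdict(list)
    for stratum, is_ads, rent in rows:
        key = stratum if with_stratum_fe else "all"
        groups[key].append((is_ads, rent))
    num = den = 0.0
    for obs in groups.values():
        mean_d = sum(d for d, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        # Demeaning within each group absorbs the constant or the fixed effect.
        num += sum((d - mean_d) * (y - mean_y) for d, y in obs)
        den += sum((d - mean_d) ** 2 for d, _ in obs)
    return num / den
```

When every stratum contributes exactly one observation from each dataset, the estimate is the same with or without fixed effects, since the dummy is then uncorrelated with the stratum indicators. This would be consistent with the identical point estimates across columns in Table 5, although the balance of the actual sample is an assumption here.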

Fig 4 — *Note:* The correlation coefficient is 0.95. When estimating $\mathrm{Rent}_{ads} = \beta \, \mathrm{Rent}_{OLL}$ with OLS, the estimate of β is 1.029 (95% CI 0.92, 1.13) with a clustered standard error of 0.038.

Table 5. Comparison between the statistics provided by online ads and the OLL or Clameur.

	(1)	(2)	(3)	(4)	(5)	(6)
	Median Rent			Average Rent
Panel A) OLL—Apartments
Ads	0.0134	0.0134	0.0134	0.324	0.324	0.324
	(0.0967)	(0.0998)	(0.0967)	(0.142)	(0.151)	(0.142)
N	640	640	640	640
R2	0.000	0.857	0.967	0.003	0.863	0.975
Panel B) OLL—All types of dwellings
Ads	0.0430	0.0430	0.0430	0.376	0.376	0.376
	(0.0754)	(0.0840)	(0.0754)	(0.153)	(0.162)	(0.153)
N	738	738	738	738	738	738
R2	0.000	0.843	0.973	0.003	0.849	0.980
Panel C) Clameur—All types of dwellings
Ads	-	-	-	0.471	0.471	0.471
	-	-	-	(1.125)	(0.179)	(0.194)
N	-	-	-	7370	7370	7370
R2	-	-	-	0.002	0.931	0.966
Controls
N. rooms	N	Y	N	N	Y	N
Area FE	N	Y	N	N	Y	N
Area FE x N. rooms	N	N	Y	N	N	Y

Standard errors in parentheses, clustered at the agglomeration level and at the dataset level. * p < 0.05, ** p < 0.01, *** p < 0.001

Note: Estimates of $\mathrm{Rent}_{d,a,r} = \alpha + \beta \times 1_{d=\mathrm{Ads}} + X_{a,r}\lambda$. N. rooms corresponds to the inclusion of room fixed effects (1, 2, 3, 4+ in panels A and B or 1, 2, 3, 4, 5 in panel C). Area FE corresponds to the inclusion of area fixed effects. Sampling areas are defined by OLL and mostly cover groups of municipalities in panels A and B, while the areas are 887 municipalities in panel C. Area FE x N. rooms corresponds to the inclusion of interaction terms between Area FE and N. rooms.

CLAMEUR also publishes similar statistics on average rents by number of rooms for 887 municipalities. We thus also compare their statistics with those computed from online ads for similar dwellings. The correlation between the observations in CLAMEUR and those produced by online ads is 0.94. Panel C) in Table 5 performs the same kind of exercise as the one performed for OLL. The results are comparable with those reported in panels A) and B). There is a small positive bias, which is not statistically significant.

Notably, the absence of bias holds in tight housing markets with an inelastic housing supply but also in other markets, where competition also pushes landlords to reveal their reservation price. We illustrate this point in S1 Table in S1 Appendix, where the magnitude of the bias is similar in segments of the housing market above and below the median rent level. Our dataset thus appears consistent with alternate methods while providing broader coverage of the territory at a moderate collection cost.

Estimating hedonic models for local French housing markets

This new dataset on rents allows us to build hedonic models of the French local rental markets. These models can then be used to build a spatial constant-quality hedonic rental price index for French municipalities to compare the cost of housing between territories. They can also be used to predict the rental value of dwellings from alternate data sources. We pursue two complementary applications with these hedonic models. First, we combine rental indices with equivalent indices based on prices to compute a hedonic rent-price ratio and document the discrepancies between rents and prices across the French territory. Second, we use these models to predict the rental value of dwellings sold in 2016 and 2017 and the market value of subsidized dwellings belonging to the social sector, where rents are administratively set and vary very little between municipalities. The latter allows us to estimate the in-kind benefit received by tenants living in the social housing sector and to compute the average subsidy received by social tenants for each French municipality.

Hedonic models for rents and prices

We estimated a hedonic index of the rent and price for each French municipality where we had more than 10 observations:

\begin{matrix} \ln(c_{i,s}) = \ln(c_{m(i)}^{ref}) + X_{i} \beta_{s} + u_{i} \end{matrix}

(2)

where ln(c_{i,s}) is the logarithm of the rent or price per square meter of unit i in stratum s, $\ln(c_{m(i)}^{ref})$ is the hedonic index of the municipality m where the unit is located, X_i is a vector of hedonic characteristics of the unit common to both datasets (surface, number of rooms, and the presence of other amenities such as furnishing or the inclusion of extra expenditures), and β_s is the vector of corrective coefficients, which are allowed to vary between strata. We estimated our hedonic models separately for each year and each department for houses, apartments and a pooling of both. The specification and the use of the logarithm as a dependent variable are standard in the literature on hedonic models and hedonic indices [4, 33, 34].
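To make the structure of Eq 2 concrete, the following minimal sketch estimates a hedonic model with municipality fixed effects using the within (demeaning) estimator. It is a simplified illustration under stated assumptions, not the estimation code of the paper: it uses a single hypothetical characteristic (log surface) instead of the full vector X_i, ignores strata, and the function name `fit_hedonic` and the data are invented for the example.

```python
import math
from collections import defaultdict

def fit_hedonic(observations):
    """Fit ln(c) = fe[m] + beta * x + u by the within estimator.

    observations: list of (municipality, x, ln_c) tuples, where x is a
    single hedonic characteristic (an assumption made for brevity).
    Returns (beta, fixed_effects), the fixed effects playing the role
    of the municipal reference index in Eq 2.
    """
    groups = defaultdict(list)
    for m, x, y in observations:
        groups[m].append((x, y))
    # Municipality means of x and ln(c).
    means = {
        m: (sum(x for x, _ in v) / len(v), sum(y for _, y in v) / len(v))
        for m, v in groups.items()
    }
    # Slope from demeaned data.
    num = den = 0.0
    for m, x, y in observations:
        mx, my = means[m]
        num += (x - mx) * (y - my)
        den += (x - mx) ** 2
    beta = num / den
    # Recover each municipality's fixed effect.
    fixed_effects = {m: my - beta * mx for m, (mx, my) in means.items()}
    return beta, fixed_effects

# Hypothetical noise-free data: two municipalities, true beta = -0.3.
obs = [("A", math.log(s), 3.0 - 0.3 * math.log(s)) for s in (20, 40, 80)]
obs += [("B", math.log(s), 2.5 - 0.3 * math.log(s)) for s in (30, 60, 120)]

beta, fe = fit_hedonic(obs)
print(beta, fe)
```

With several characteristics, the same logic applies, but the slope would be obtained from a multivariate least-squares fit on the demeaned data rather than from the closed-form ratio used here.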

We estimate the rent index with our dataset, also removing outliers below the 5th and above the 95th percentiles of rent per square meter, following the standards in the literature [34]. Prices come from the administrative dataset of property transactions (Demande de Valeurs Foncières, DVF), which contains all real estate transactions in France (except in the Alsace-Moselle region) and is available from 2014 onward. This dataset has some advantages and drawbacks when compared with the datasets of French notaries [4]. Its main advantage is its exhaustiveness: it covers all transactions that took place over the period, which are more representative of all French dwellings than what is observed on the rental market [3]. However, this comes at some cost: the number of hedonic characteristics is rather limited, as it mainly contains the type of unit (apartment vs. house), the number of rooms, the surface, and the precise geocoding of the parcel. When estimating the rental value of social dwellings, the hedonic model has more variables, as the social housing database (Répertoire des logements locatifs des bailleurs sociaux, RPLS) contains more characteristics in common with our dataset, such as the energy efficiency of dwellings and the floor. We also include neighborhood fixed effects instead of municipality fixed effects when we have enough observations at the neighborhood level. This allows us to avoid relying on contextual variables.
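The outlier rule mentioned above (dropping observations below the 5th and above the 95th percentile of rent per square meter) could be sketched as follows. The function name `trim_outliers` is hypothetical, and the choice of the "inclusive" percentile definition and of a single pooled sample are assumptions; the text does not say at which geographic level or with which percentile convention the trimming is applied.

```python
import statistics

def trim_outliers(rents_per_m2, lower_pct=5, upper_pct=95):
    """Keep observations between the lower and upper percentiles."""
    # quantiles(n=100) returns the 99 percentile cut points.
    cuts = statistics.quantiles(rents_per_m2, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [r for r in rents_per_m2 if low <= r <= high]

# Hypothetical sample: rents per square meter from 1 to 100.
sample = list(range(1, 101))
trimmed = trim_outliers(sample)
print(len(trimmed), min(trimmed), max(trimmed))
```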

The rent-price ratio in French municipalities

We compute the Rent Price Ratio for French municipalities using the following equation:

\begin{matrix} \frac{R}{P} = \frac{e^{\ln(r_{m(i)}^{ref}) + \frac{1}{2} \sigma_{r,s}^{2}}}{e^{\ln(p_{m(i)}^{ref}) + \frac{1}{2} \sigma_{p,s}^{2}}} \end{matrix}

(3)

where $\ln(r_{m(i)}^{ref})$ and $\ln(p_{m(i)}^{ref})$ are the municipality-level hedonic indices for rents and prices, respectively, while σ is the root-mean-square error of the corresponding regression, used to remove the bias arising from the reverse transformation from logarithms [35]. In Fig 5, we plot the constant-quality hedonic rent-price ratio index against the municipal hedonic price index. We observe that, even when controlling for the characteristics of the dwellings, there remains large spatial heterogeneity in the rent-price ratio, which is strongly correlated with the price levels of these municipalities. From this chart, we might infer that rents do not increase as much as housing prices in tight markets. Panel A in Table 6 provides additional descriptive statistics on local rent, price, and rent-price hedonic indices for French municipalities. One also observes that the rent-price ratio is lower in municipalities belonging to large urban units or to the urban unit of Paris and for large municipalities with a population above 100,000 inhabitants. Similar discrepancies in the rent-price ratio have been put forward in other countries, such as the US, where superstar cities have a low rent-price ratio [36], or the UK [37]. The literature suggests that this might be explained by a higher expected growth rate of rents and the lower housing supply elasticity in the most expensive cities. Documenting the main drivers of these spatial discrepancies is beyond the scope of this paper.
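As an illustration of Eq 3, the sketch below computes a rent-price ratio from two log indices and the root-mean-square errors of the two regressions. The function names and input values are hypothetical. The conversion to a yearly percentage (multiplying a monthly rent by 12) is an assumption based on the column labels of Table 6, since Eq 3 itself does not show this step.

```python
import math

def rent_price_ratio(ln_rent_index, ln_price_index, sigma_rent, sigma_price):
    """Ratio of retransformed rent and price indices, as in Eq 3.

    The 0.5 * sigma**2 terms correct the bias that arises when moving
    from predicted logarithms back to levels.
    """
    rent = math.exp(ln_rent_index + 0.5 * sigma_rent ** 2)
    price = math.exp(ln_price_index + 0.5 * sigma_price ** 2)
    return rent / price

def yearly_percent(monthly_ratio):
    """Assumed annualization: monthly rent over price, times 12, in %."""
    return 12 * 100 * monthly_ratio

# Hypothetical municipality: 10 euros/m2 monthly rent, 2,000 euros/m2 price.
same_sigma = rent_price_ratio(math.log(10), math.log(2000), 0.2, 0.2)
larger_rent_sigma = rent_price_ratio(math.log(10), math.log(2000), 0.3, 0.2)
print(yearly_percent(same_sigma), yearly_percent(larger_rent_sigma))
```

When both regressions have the same error variance, the correction terms cancel and the ratio equals the ratio of the exponentiated indices; otherwise the correction shifts the ratio slightly.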

Fig 5. *Note:* Price data are taken from the fiscal administrative database *Demande de Valeurs Foncières*; rent data are taken from the database constructed by the authors. The rent-price ratio is calculated as the ratio of the municipal hedonic rental index to the hedonic price index. Only municipalities with more than 10 observations are displayed.

Table 6. Descriptive statistics of the municipal hedonic indices.

	Monthly Rent per square meter	Yearly Price per square meter	Rent-Price Ratio yearly in %	Observations municipalities
Country level	8.41	1748.00	6.14	22920
Outside urban unit	7.88	1597.00	6.29	16413
Urban unit with 2k to 4k inhabitants	8.67	1846.76	5.90	1642
Urban unit with 5k to 10k inhabitants	8.80	1892.00	5.83	1007
Urban unit with 10k to 20k inhabitants	9.05	1947.97	5.81	705
Urban unit with 20k to 50k inhabitants	9.00	1944.43	5.76	725
Urban unit with 50k to 100k inhabitants	9.24	1948.03	5.88	498
Urban unit with 100k to 200k inhabitants	10.37	2220.27	5.89	364
Urban unit with 200k to 2,000k inhabitants	10.56	2356.41	5.55	1137
Urban unit of Paris	16.41	3903.05	5.24	429


Note: The first panel provides estimates of rent and prices for individual dwellings of the Demande de Valeurs Foncières administrative database. The second and third panels provide rent estimates and hedonic prices at the municipal level. Rents in column 3 and individual prices in column 4 are the hedonic prices and rents estimated with the dwellings’ individual characteristics. Rent-price ratios are the ratio of the estimated prices and rents of columns 3 and 4. Municipal rents and prices were estimated from the fixed effects of hedonic models.

This strong spatial variation in the rent-price ratio demonstrates that the choice to measure the cost of housing with prices or with rents might affect the estimated elasticities between amenities and housing costs. For example, in a companion paper [38], we show that the cost of agglomeration (that is, the elasticity between density and housing cost in the city center [4]) is lower when measured with rents than when measured with prices.

Predicting the rent of real estate transactions and social dwellings

After the estimation of Eq 2, it is also possible to predict a rental value, a price, and thus a rent-price ratio for every transaction that we observe in DVF. The rent per square meter is simply obtained by substituting the dwellings’ characteristics into the estimated hedonic equation:

\begin{matrix} \hat{R} = e^{\widehat{\ln(r_{m(i)}^{ref})} + X_{i} \hat{\beta}_{s} + \frac{1}{2} \hat{\sigma}_{r,s}^{2}} \end{matrix}

(4)
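A minimal sketch of the prediction step in Eq 4 is given below. The function name, the characteristic names and the coefficient values are hypothetical; only the structure (municipal log index plus characteristics times coefficients plus half the squared regression error, then exponentiated) follows the equation.

```python
import math

def predict_rent_per_m2(ln_ref_index, characteristics, coefficients, sigma):
    """Predicted rent per square meter in levels, following Eq 4."""
    # X_i * beta_s: sum over the characteristics with a coefficient.
    xb = sum(characteristics[name] * coefficients[name] for name in coefficients)
    return math.exp(ln_ref_index + xb + 0.5 * sigma ** 2)

# Hypothetical coefficients and dwelling (illustrative values only).
coefficients = {"log_surface": -0.30, "furnished": 0.10}
dwelling = {"log_surface": math.log(50), "furnished": 1}

baseline = predict_rent_per_m2(math.log(12), {"log_surface": 0.0, "furnished": 0},
                               coefficients, sigma=0.0)
predicted = predict_rent_per_m2(math.log(12), dwelling, coefficients, sigma=0.2)
print(baseline, predicted)
```

Applying the same function with coefficients from the price regression would give a predicted price, and the ratio of the two predictions would give a transaction-level rent-price ratio.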

The descriptive statistics of this output are reported in Table 7. On average, the estimated rents of dwellings for sale in 2017 were 11.66 euros per square meter and were lower for houses than for apartments. The rent-price ratio also varies clearly with the size of the dwelling: the average predicted rent-price ratio is 5.69% and ranges from 6.93% for small dwellings with a single room to 5.26% for large units with five rooms or more. These discrepancies remain even when investigating the patterns within municipalities, as illustrated in S4 Table in S1 Appendix.

Table 7. Descriptive statistics on the transaction-level rent-price ratio.

	Monthly Rent per square meter	Yearly Price per square meter	Rent-Price Ratio yearly in %	Observations dwellings
All	11.66	2641.79	5.69	714437
Rooms: 1	18.85	3715.98	6.93	57434
Rooms: 2	14.90	3288.53	6.11	113932
Rooms: 3	11.73	2685.55	5.74	172100
Rooms: 4	10.01	2349.93	5.43	190975
Rooms: 5+	9.01	2157.48	5.26	179996
Type: apartment	15.14	3379.90	5.95	312935
Type: house	8.95	2066.50	5.49	401502


Eq 2 can also be used to predict the rental values of social dwellings. Social housing units, often called “Habitations à Loyers Modérés” (HLM), represent about half of the rental market and thus between 14% and 20% of the total housing stock in France. They are mostly owned and managed by non-profit and public landlords. Contrary to the private sector, their rent is controlled and set administratively, as a rent ceiling per square meter is imposed on social landlords. Moreover, households under an income ceiling have to be registered on a waiting list to access these units, which are assigned by a commission. One interesting feature of the sector is that all rents have been systematically registered in a centralized file, the RPLS, since 2011. To account for the spatial concentration of social dwellings, we substitute municipal fixed effects with neighborhood (sections cadastrales) fixed effects. It is worth noting that these neighborhood fixed effects allow us to avoid relying on contextual variables (average income, socioeconomic composition of the neighborhood, etc.).

The results are reported in Table 8. The implicit subsidy is obtained by subtracting the rent paid by social tenants from the estimated market rent. In relative terms, the implicit subsidy represents about 46.6% of the rental value of the unit. In absolute terms, the average subsidy is between 370 and 390 euros, while the median is around 300 euros. The average subsidy is 110 euros larger than in a previous study [39]; three reasons might explain this result. First, rents have been increasing since 2006, and thus the subsidy should be at least around 320 euros by the sole effect of the inflation of both types of rent, a level close to our median estimates. Second, previous estimates were based on former leases (the stock), while we estimate on new leases. Third, and more importantly, previous estimates were based on a survey that was probably not representative of the whole distribution of rents in the social housing sector. This is a concern for Paris, which represents about 5% of social dwellings. The implicit subsidy for the French capital appears extremely high for a large number of units and dramatically increases the average. Indeed, the right tail of the distribution of the subsidy is exclusively composed of housing units located in Paris, where the subsidy can be well above 1,000 euros. A typical case is a social housing unit in the center of Paris with a surface above 80 square meters and a controlled rent below 600 euros, while its market rent is estimated to be above 2,500 euros. The absence of these units in the French housing survey might have biased the average subsidy in previous contributions. This is easily perceptible in the large discrepancy (80 euros) between the average and the median subsidies in our results. When Paris is excluded from our dataset, the average subsidy declines to 335 euros. Lastly, when the Paris urban area is excluded, the average subsidy drops to 255 euros.
However, it is worth noting that the RPLS does not contain any information about the complementary rents (supplément de loyer solidarité) that the wealthiest households should pay when occupying some social housing units in desirable areas. This might decrease rent savings for some desirable units, even if many administrative reports estimate that these additional rents are only mildly applied. In a nutshell, our estimates are in line with previous studies, while gaining access to the whole distribution of social housing units shows that the implicit subsidy might be more unevenly distributed.
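The definition of the implicit subsidy, and the way a few very large values pull the average above the median, can be illustrated with a short sketch. All figures below are hypothetical; the last unit simply uses the boundary values of the Paris case discussed above (controlled rent of 600 euros, market rent of 2,500 euros).

```python
import math
import statistics

def implicit_subsidy(market_rent, social_rent):
    """Absolute subsidy and its share of the estimated market rent."""
    subsidy = market_rent - social_rent
    return subsidy, subsidy / market_rent

# Hypothetical (estimated market rent, observed social rent) pairs in euros.
units = [(700, 400), (650, 380), (800, 450), (2500, 600)]

subsidies = [implicit_subsidy(market, social)[0] for market, social in units]
mean_subsidy = statistics.mean(subsidies)
median_subsidy = statistics.median(subsidies)
paris_subsidy, paris_share = implicit_subsidy(2500, 600)

print(subsidies, mean_subsidy, median_subsidy, paris_share)
```

In this toy sample, the single high-subsidy unit is enough to push the mean far above the median, which is the mechanism invoked in the text to explain the gap between the average and median subsidies.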

Table 8. Descriptive statistics on the implicit subsidy of social housing.

	Total			Per square meter			N
	Market Rent	Subsidized Rent	Subsidy	Market Rent	Subsidized Rent	Subsidy	N
Panel A) Characteristics of the dwellings
All dwellings	771.52	386.93	390.04	12.10	5.96	6.19	4786527
Rooms: 1	536.43	247.75	297.68	17.76	8.03	9.72	254433
Rooms: 2	646.60	316.63	331.75	13.57	6.56	7.03	942511
Rooms: 3	752.51	378.08	378.41	11.79	5.85	5.99	1793861
Rooms: 4	854.29	435.61	426.78	10.91	5.51	5.49	1397673
Rooms: 5+	1012.71	511.26	511.82	10.58	5.31	5.39	398049
Panel B) Type of subsidy received
PLAI	730.90	360.97	373.98	11.99	5.69	6.25	257518
PLI	985.37	575.49	440.64	15.60	8.82	7.22	145522
PLS	911.03	544.31	376.96	14.80	8.61	6.13	251870
PLUS	758.02	372.99	390.21	11.82	5.72	6.16	4131617
Panel C) Area
01	1028.85	420.11	611.69	16.41	6.66	9.72	840792
01 bis	1407.07	453.83	959.54	24.32	7.82	16.46	409041
02	691.22	382.28	312.07	10.60	5.80	4.81	2063051
03	560.71	355.06	208.40	8.35	5.23	3.15	1473643


Note: The table presents the observed and estimated rents for the RPLS. Column (1) shows the estimated average rent for the private market, column (2) the observed average rent in the social stock, column (3) the average social benefit. Columns (4) to (6) present the same variables at the square meter level.

Although there exist different categories of social dwellings, which adjust the maximum rent according to the household’s income and the location of the apartment, these parameters play a very limited role in explaining the magnitude of the implicit subsidy, which is almost exclusively driven by the location of the dwelling, as illustrated in Fig 6. For example, while the average private rent per square meter in the center of Paris is 35 euros, the rent cap remains around 6.09 euros for the most common type of social dwellings. In contrast, private rents can be less than half that level in the Paris suburbs, while the rent ceiling falls to 4.7 euros in the social housing sector. In other words, the rent gradient of the urban area is almost flat for social dwellings while it declines steeply in the private sector, resulting in a very large relative implicit subsidy in the center of Paris [40]. The importance of location in the implicit subsidy is also illustrated in Table 8 by the fact that PLI units, which are social dwellings designed for the upper-middle class, offer a higher implicit subsidy than standard social dwellings because they tend to be concentrated in more attractive municipalities.

Fig 6. *Note:* This figure shows the relationship between the estimated average rent in the private sector per municipality (x-axis) and the implicit housing subsidy per municipality (y-axis). Characteristics of the dwellings are those of the RPLS; private rents are estimated from our data.

S5 Table in S1 Appendix confirms this point in a more systematic way: regressing the estimated subsidy on dwellings’ characteristics explains only a limited share of the variance, while including geographical fixed effects, such as social housing areas or municipality fixed effects, dramatically increases the R2. S6 Table in S1 Appendix also shows that, net of dwellings’ characteristics, the city level of implicit subsidy is mostly explained by the level of private rent.

This uneven spatial distribution of the implicit subsidy has a direct consequence on the redistributive profile of the policy. Indeed, as richer households tend to remain for a longer time in good-quality social dwellings thanks to the right of security of tenure [41], they tend to be concentrated in social dwellings located in wealthier municipalities, thus benefiting from the largest in-kind subsidy, as illustrated in Fig 7. Columns 2 and 3 in S6 Table in S1 Appendix show that this correlation holds when controlling for the share of vacant dwellings and the size of the municipality. It is noteworthy that the inclusion of urban area fixed effects in column 4 reduces this correlation, indicating that richer social tenants tend to be located in larger urban units such as Paris or Lyon. As a consequence, the correlation between income and subsidy is smaller within agglomerations. Overall, our results are in line with previous findings in Finland [42], where it has been shown that social housing is less redistributive than housing allowances, as it is harder to target at the poorest households.

Fig 7. *Note:* This figure displays the relationship between the average per capita income of social housing tenants per municipality in 2013 (x-axis) and the average level of implicit subsidy of social housing per municipality (y-axis). Municipalities have been grouped by quantiles of per capita income of social housing tenants. Characteristics of the dwellings are those of the RPLS; private rents are estimated from our data.

Conclusion

In this paper, we describe a new data collection technique that uses online data to provide accurate information on local housing markets for researchers and statisticians. As we attempt to demonstrate, this can provide a relatively cheap and precise way to collect a large amount of micro data to answer research questions related to market dynamics. Although these online data correspond to posted rents and not to signed contracts, the relative transparency of online platforms may force landlords to reveal the market price. The comparison between our dataset and standard surveys supports this intuition. Indeed, no significant difference is observed between the average rent computed from online ads and from local rental observatories, which strengthens our confidence in the reliability of research papers based on these types of data.

We also present systematic evidence of the divergence between measures of housing costs based on prices or rents documenting the spatial variation of the rent-price ratio in France. This stresses the fact that the capitalization of amenities might be different according to the type of data used.

Finally, we also use these data to calibrate hedonic models in order to predict the rental value of transactions and social dwellings, which we provide as an additional output of our paper. The estimated rent for social dwellings allows us to document the spatial disparities in the in-kind subsidy received by social tenants in France, which tends to be positively correlated with the income of these households, confirming more systematically previous findings based on local [39, 41] or international [42] studies.

To conclude, we posit that these data offer many potential applications and should allow researchers to tackle new research questions. For example, these data can be used to assess the rental value of the housing stock for research or taxation purposes [3], for public policy evaluation [18], or to assess the impact of the development of Airbnb on rental markets [19]. They can also be used to test spatial equilibrium models [43]. Finally, such a technique offers possibilities in countries where the collection of data is difficult and costly, as in developing economies [44].

Supporting information

S1 Appendix

(PDF)


S1 File

(ZIP)


Acknowledgments

The authors thank participants in the Large Open/Online Raw Dataset (LOORD) and Numimmo seminars for their comments and questions. They also thank the editor Nils Kok and two anonymous referees for their helpful comments. They are particularly grateful to Antonin Bergeaud, Jean-Charles Bricongne, Julia Cagé, Gabrielle Fack, Gilles Duranton, Laurent Gobillon, Morgane Laouennan, Philippe Martin, Joan Monras, Florian Oswald, Bruno Palier, Quentin Ramond, Marco Schmid, Claude Taffin, Corentin Trévien, Grégory Verdugo, Paul Vertier, Benjamin Vignolles and Etienne Wasmer for their helpful comments and discussions.

Data Availability

All the reproduction programs are available in the Open Science Framework Repository (DOI: 10.17605/OSF.IO/CW37U). The repository also contains the aggregate data on local rental markets necessary to reproduce the tests for the bias (Table 5). Underlying micro data can be accessed by submitting a request to the LIEPP, Sciences Po (liepp@sciencespo.fr). Enquête Logement data underlying Table 1 can be accessed by submitting a request on http://quetelet.progedo.fr/. DVF+ files are available from the OSF repository and can also be accessed from https://datafoncier.cerema.fr/donnees/autres-donnees-foncieres/dvfplus-open-data. Access to RPLS data including information on rents is restricted. Researchers from public institutions can access the version used by submitting a request to rpls.cgdd@developpement-durable.gouv.fr.

Funding Statement

The authors acknowledge the support from ANR-11-LABX-0091 (LIEPP) and ANR-11-IDEX-0005-02 awarded to GC and JBE. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Kelly J, Le Blanc J, Lydon R. Pockets of risk in European housing markets: then and now. Frankfurt a. M.: European Central Bank; 2019. 2277.
2. Vignolles B. Three empirical essays on spatialized housing policies. EHESS. Paris; 2019.
3. Chapelle G, Fabre B, Lallemand C. Révision des valeurs locatives sur les locaux d’habitation: une évaluation sur grandes agglomérations. Paris: Institut des Politiques Publiques; 2020. 28.
4. Combes PP, Duranton G, Gobillon L. The costs of agglomeration: House and land prices in French cities. The Review of Economic Studies. 2019;86(4):1556–1589. doi: 10.1093/restud/rdy063
5. Le Bon Coin. Available from: https://www.leboncoin.fr/.
6. Se Loger. Available from: https://www.seloger.com/.
7. Yanport. Classement des portails immobiliers; 2020. Available from: https://www.yanport.com/blog/posts/classement-des-portails-immobiliers.
8. INSEE. Enquête Logement. Available from: http://www.insee.fr/fr/methodes/default.asp?page=definitions/enquetelogement.htm.
9. INSEE. Enquête Loyers et charges. Available from: http://www.insee.fr/fr/methodes/default.asp?page=sources/opeenq-loyers-et-charges.htm.
10. Observatoire des Loyers de l’Agglomération Parisienne. L’observatoire des loyers de l’agglomération parisienne. Available from: http://www.observatoire-des-loyers.fr/.
11. Observatoires des Loyers. Les observatoires locaux des loyers. Available from: https://www.observatoires-des-loyers.org/2/accueil.htm.
12. CLAMEUR. Connaitre Les Loyers et Analyser les Marches sur les Espaces Urbains et Ruraux. Available from: http://www.observatoire-des-loyers.fr.
13. Gregoir S, Hutin M, Maury TP, Prandi G. Measuring local individual housing returns from a large transaction database. Annals of Economics and Statistics. 2012;107/108:93–131. doi: 10.2307/23646573
14. Boulay G, Blanke D, Casanova Enault L, Granié A. Moving from Market Opacity to Methodological Opacity: Are Web Data Good Enough for French Property Market Monitoring? The Professional Geographer. 2020;73(1):115–130. doi: 10.1080/00330124.2020.1824678
15. Boeing G, Waddell P. New insights into rental housing markets across the United States: Web scraping and analyzing craigslist rental listings. Journal of Planning Education and Research. 2017;37(4):457–476. doi: 10.1177/0739456X16664789
16. Kholodilin KA, Mense A, Michelsen C. The market value of energy efficiency in buildings and the mode of tenure. Urban Studies. 2017;54(14):3218–3238. doi: 10.1177/0042098016669464
17. Hyland M, Lyons RC, Lyons S. The value of domestic building energy efficiency - evidence from Ireland. Energy Economics. 2013;40:943–952. doi: 10.1016/j.eneco.2013.07.020 14506873
18. Mense A, Michelsen C, Kholodilin KA. The effects of second-generation rent control on land values. In: AEA Papers and Proceedings. vol. 109; 2019. p. 385–88. doi: 10.1257/pandp.20191023
19. Garcia-López MÀ, Jofre-Monseny J, Martínez-Mazza R, Segú M. Do short-term rental platforms affect housing markets? Evidence from Airbnb in Barcelona. Journal of Urban Economics. 2020;forthcoming. doi: 10.1016/j.jue.2020.103278
20. Laouénan M, Rathelot R. Can information reduce ethnic discrimination? evidence from airbnb. American Economic Journal: Applied Economics. 2020;forthcoming.
21. Edelman BG, Luca M. Digital discrimination: The case of Airbnb.com. Harvard Business School; 2014. 14-054.
22. Brülhart M, Danton J, Parchet R, Schläpfer J. Who Bears the Burden of Local Taxes. Lausanne: HEC Lausanne; 2019.
23. Loberto M, Luciani A, Pangallo M. What do online listings tell us about the housing market? Bank of Italy; 2020. 1171.
24. Bricongne JC, Turrini A, Pontuch P. Assessing house prices: insights from Houselev, a dataset of price level estimates. Frankfurt: European Commission; 2019. 101.
25. Baietto-Beysson S, Vorms B. Les observatoires des loyers. Ministère de l’écologie, du développement durable, des transports et du logement; 2012.
26. Chappert A, Kaba-Langlois I, Friggit J, Laporte P. Rapport sur l’organisation du service statistique dans le domaine du logement. Conseil Général de l’environnement et du développement durable; 2014. 009075-02.
27. INSEE. Recensement de la population. Available from: https://www.insee.fr/fr/statistiques/4229099?sommaire=4171558.
28. Sautory O. Calmar 2: A new version of the calmar calibration adjustment program. In: Proceedings of Statistics Canada Symposium; 2003.
29. Wheaton WC. Vacancy, search, and prices in a housing market matching model. Journal of political Economy. 1990;98(6):1270–1292. doi: 10.1086/261734
30. Desgranges G, Wasmer E. Appariements sur le Marché du Logement. Annales d’Economie et de Statistique. 2000;58:253–287. doi: 10.2307/20076236
31. Binmore K, Rubinstein A, Wolinsky A. The Nash bargaining solution in economic modelling. The RAND Journal of Economics. 1986;17(2):176–188.
32. Observatoires des Loyers. Données des observatoires locaux. Available from: https://www.data.gouv.fr/fr/datasets/resultats-nationaux-des-observatoires-locaux-des-loyers/.
33. Musiedlak Y, Vignolles B. Les mouvements des prix immobiliers dans l’ancien au cours des années 2000: des marchés locaux différenciés. Document de travail du CGEDD. 2016;(24):1–40.
34. Gouriéroux C, Laferrère A. Managing hedonic housing price indexes: The French experience. Journal of Housing Economics. 2009;18(3):206–213. doi: 10.1016/j.jhe.2009.07.012
35. Wooldridge JM. Econometric analysis of cross section and panel data. MIT press; 2010.
36. Gyourko J, Mayer C, Sinai T. Superstar cities. American Economic Journal: Economic Policy. 2013;5(4):167–99.
37. Hilber CA, Mense A. Why have house prices risen so much more than rents in superstar cities? Centre for Economic Performance, LSE; 2021.
38. Chapelle G, Eyméoud JB. Is density bad for tenants? Sciences Po, Mimeo; 2021.
39. Trevien C. Habiter en HLM: quel avantage monétaire et quel impact sur les conditions de logement? Economie et statistique. 2014;471(1):33–64. doi: 10.3406/estat.2014.10480
40. Chapelle G, Wasmer E, Bono PH. Spatial misallocation and rent controls. In: AEA Papers and Proceedings. vol. 109; 2019. p. 389–92. doi: 10.1257/pandp.20191024
41. Laferrère A. Pauperization and Polarization of French Social Housing. Revue économique. 2013;64(5):805–832. doi: 10.3917/reco.645.0805
42. Eerola E, Saarimaa T. Delivering affordable housing and neighborhood quality: A comparison of place-and tenant-based programs. Journal of Housing Economics. 2018;42:44–54. doi: 10.1016/j.jhe.2017.12.001
43. Ortega J, Verdugo G. Moving Up or Down? Immigration and the Selection of Natives across Occupations and Locations. Bonn: Institute for the Study of Labor (IZA); 2016. 10303.
44. Belaid N, Boujamaa A, Chapelle G, Taffin C, Sayah Z. Rapport sur le logement locatif en Tunisie. Washington D.C.: The World Bank; 2017.

PLoS One. doi: 10.1371/journal.pone.0260405.r001

Decision Letter 0

Nils Kok

18 Mar 2021

PONE-D-21-03045

Can big data increase our knowledge of local rental markets? A dataset on the rental sector in France

PLOS ONE

Dear Dr. Chapelle,

Thank you for submitting your manuscript to PLOS ONE.

I have now received the reports from two expert referees, and I have read your manuscript myself. One referee is of the opinion that the manuscript should be rejected, while the other referee suggests revisions that are quite doable. My own opinion is in the middle -- the current version of the paper is not interesting enough for PLOS One, but, with hard work I can see a version of this paper that is more suitable. Most importantly, the paper shouldn't just focus on the novelty of web scraping and your ability to build a database. You should actually do something with the data, e.g. build an index, provide an interesting analysis that would otherwise be impossible in France, etc. I see this as a challenging revision and there is no guarantee that, should you resubmit a revised version of the paper, the revision will be accepted. I would fully understand if you don't pursue a publication in PLOS One and send the paper to a different journal.

If you decide to move forward, please submit your revised manuscript by Apr 24 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Nils Kok

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

3. Thank you for stating the following in the Acknowledgments Section of your manuscript:

"The authors acknowledge the support from ANR-11-LABX-0091 (LIEPP) and ANR-11-IDEX-0005-02. They also thank participants in the Large Open/Online Raw Dataset (LOORD) and Numimmo seminars for their comments and questions. They are particularly grateful to Jean-Charles Bricongne, Julia Cage, Gilles Duranton, Laurent Gobillon, Morgane Laouennan, Philippe Martin, Joan Monras, Florian Oswald, Bruno Palier, Quentin Ramond, Marco Schmid, Claude Taffin, Corentin Trevien, Gregory Verdugo, Paul Vertier, Benjamin Vignolles and Etienne Wasmer for their helpful comments and discussions."

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

"The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

4. Thank you for stating the following in the Competing Interests section:

"The authors have declared that no competing interests exist."

We note that one or more of the authors are employed by a commercial company: Banque de France.

(1) Please provide an amended Funding Statement declaring this commercial affiliation, as well as a statement regarding the Role of Funders in your study. If the funding organization did not play a role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript and only provided financial support in the form of authors' salaries and/or research materials, please review your statements relating to the author contributions, and ensure you have specifically and accurately indicated the role(s) that these authors had in your study. You can update author roles in the Author Contributions section of the online submission form.

Please also include the following statement within your amended Funding Statement.

“The funder provided support in the form of salaries for authors [insert relevant initials], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.”

If your commercial affiliation did play a role in your study, please state and explain this role within your updated Funding Statement.

(2) Please also provide an updated Competing Interests Statement declaring this commercial affiliation along with any other relevant declarations relating to employment, consultancy, patents, products in development, or marketed products, etc.

Within your Competing Interests Statement, please confirm that this commercial affiliation does not alter your adherence to all PLOS ONE policies on sharing data and materials by including the following statement: "This does not alter our adherence to PLOS ONE policies on sharing data and materials." (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests). If this adherence statement is not accurate and there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.

Please include both an updated Funding Statement and Competing Interests Statement in your cover letter. We will change the online submission form on your behalf.

Please know it is PLOS ONE policy for corresponding authors to declare, on behalf of all authors, all potential competing interests for the purposes of transparency. PLOS defines a competing interest as anything that interferes with, or could reasonably be perceived as interfering with, the full and objective presentation, peer review, editorial decision-making, or publication of research or non-research articles submitted to one of the journals. Competing interests can be financial or non-financial, professional, or personal. Competing interests can arise in relationship to an organization or another person. Please follow this link to our website for more details on competing interests: http://journals.plos.org/plosone/s/competing-interests

5. Please ensure that you refer to Figures 1 and 4 in your text as, if accepted, production will need this reference to link the reader to the figure.

6. We note you have included a table to which you do not refer in the text of your manuscript. Please ensure that you refer to Table 2 in your text; if accepted, production will need this reference to link the reader to the Table.

7. We note that Figures 2 and 4 in your submission contain map images which may be copyrighted. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For these reasons, we cannot publish previously copyrighted maps or satellite images created using proprietary data, such as Google software (Google Maps, Street View, and Earth). For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission:

(1) You may seek permission from the original copyright holder of Figures 2 and 4 to publish the content specifically under the CC BY 4.0 license.

We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission.

In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

(2) If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

The following resources for replacing copyrighted map figures may be helpful:

USGS National Map Viewer (public domain): http://viewer.nationalmap.gov/viewer/

The Gateway to Astronaut Photography of Earth (public domain): http://eol.jsc.nasa.gov/sseop/clickmap/

Maps at the CIA (public domain): https://www.cia.gov/library/publications/the-world-factbook/index.html and https://www.cia.gov/library/publications/cia-maps-publications/index.html

NASA Earth Observatory (public domain): http://earthobservatory.nasa.gov/

Landsat: http://landsat.visibleearth.nasa.gov/

USGS EROS (Earth Resources Observatory and Science (EROS) Center) (public domain): http://eros.usgs.gov/#

Natural Earth (public domain): http://www.naturalearthdata.com/

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: For more details, please see the attachment

Overall, the authors did a great job creating this extensive dataset. However, even though the presented methodology is advertised as low-cost, potential obstacles and the required effort should be better discussed. The authors had to scrape the websites on a monthly basis for around 2 years (l. 105) and used cooperative websites providing public APIs (l. 100). This effort might not be suitable for some purposes, such as when timely or historical data are needed. Furthermore, many similar websites in other countries are less cooperative, employing software to actively prevent scraping. However, my main problem with the study is the lack of originality in its current form. As explained below, there are commercial software solutions for web-scraping technology. Furthermore, other studies have already explored scraped real estate data or actively use them to answer other research questions. The authors discuss potential applications using “better”, scraped data. Maybe some of these could be further explored to set the study apart from similar studies.

Reviewer #2: The purpose of this short paper is to present a novel database of housing rents in France. As is the case in many other countries, there exist few easily accessible databases of rental prices. The authors scrape data from two major real estate websites to obtain a large database of 4.3 million housing rents in France, covering the period from December 2015 until January 2018. The authors provide descriptive statistics and examine the representativeness of their database by comparing it to other databases.

From my reading of the paper, the data collection seems to have been executed properly and the resulting database could help us to acquire some new knowledge on the functioning of rental markets, in particular in France.

I provide my main comments and suggestions to this study below.

Major comments:

1. From the manuscript it does not become clear where the data will be published. The manuscript replicates the sample text “ALL XXX files are available in the Open Science Framework Repository (accession number(s) XXX, XXX.)”, but did not replace the XXX with the right information. At the OSF Repository, I was unable to find the database using the paper title. It is also not explicitly indicated that data will only be published at a later stage.

2. There are quite a few typos and grammatical errors in the manuscript, and I would suggest having the manuscript read and verified by a native copy editor. I am not a native speaker myself, but at the bottom of this review, I have provided a few examples from the first page. I have ignored them later in the manuscript because this would make my review lengthy.

Minor comments:

1. In the opening lines of the paper, the authors argue that historically the French authorities have recorded housing transactions and expressed limited interest in recording rental prices. I do not fully agree with this statement. For extended periods of time (at least until the mid-20th century) the French fiscal administration has been keeping enormous registrations of rental contracts, for example in the Enregistrement, since many taxes were based on rental prices rather than sales prices.

2. In the tables, the Min / 25% / 50% / 75% / Max values generally do not seem to add much, since most of the variables that are presented are dummy variables rather than continuous variables. I would suggest removing these statistics.

3. It would be useful to briefly mention the actual websites and companies that were used to collect the data and to provide some statistics on their market share / user base.

4. I would suggest adding some references to comparable work in other countries. Most notably, Boeing & Waddell (2016, Journal of Planning Education and Research) have scraped data from Craigslist in a very similar fashion.

5. For the representativeness of the database, it seems important to also consider the presence of social housing (HLM and other types), which are likely reported in the census but, based on what I assume, will not be published on these online websites.

6. The section about the comparison to the French census is at times confusing. For example, in line 190: “In a second step we assign our posted scraped to each strata”. It is not clear to me what “posted” is (it might be a typo). In line 193 it is unclear what “goods” refers to.

Some example typos from the first page:

Line 26: as the transaction is taxed instead of “when”

Line 36: “collecting information paid by …”: should “information” be “rental prices”?

Line 47: “this data sets … limits”: should be “these” and “limitations”

Line 52: “from insurance. Its provides ..”: should be “insurance companies” and “It provides”

Line 55: “on local market” should be “on local market conditions” or “on local markets”

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: ReviewPlosOne.pdf

Additional data file (105.7KB, pdf)

PLoS One. 2022 Jan 27;17(1):e0260405. doi: 10.1371/journal.pone.0260405.r002

Author response to Decision Letter 0

5 May 2021

Responses in the pdf attached

Attachment

Submitted filename: Response_referee_2.pdf

Additional data file (53.4KB, pdf)

PLoS One. doi: 10.1371/journal.pone.0260405.r003

Decision Letter 1

Nils Kok

22 Jul 2021

PONE-D-21-03045R1

Can big data increase our knowledge of local rental markets? A dataset on the rental sector in France

PLOS ONE

Dear Dr. Chapelle,

Thank you for submitting your manuscript to PLOS ONE. I have now received two referee reports on your paper submission. One referee recommends accepting the paper, while the second referee has gone from advising to "reject" to advising a "major revision". Looking at the comments of the referee, I'm somewhat more optimistic and would recommend a "minor revision." Much, if not all, of the feedback can be incorporated quite easily. To speed up the process, I'll likely not go back to the referee, but please provide a detailed response to each of the comments.

Please submit your revised manuscript by Sep 05 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Nils Kok

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: No

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: No

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Reviewer #1: Summary

The authors present a new rent dataset for France, which was generated using web scraping of online ads. Motivated by a lack of good French rental data, the growing coverage of real estate websites provides an opportunity for researchers to use data from online ads (l.31 – l.103). The authors describe how they used web scraping over a period of time to generate the underlying data and present some descriptive statistics (l.105 – l.201). Compared to “traditional” datasets of smaller size or inferior quality (less coverage or less information), the new dataset proves to be a reliable source (l.202 – l.300). To demonstrate the potential of the new data, the authors estimate local hedonic indexes (l. 301 – l.340) and use these to estimate rent-to-price ratios (l.341 – l.364) and estimate free market rents for social housing properties to calculate the implied housing subsidies (l.365 – l.455).

Feedback

Overall, the authors did a great job creating this extensive dataset for France, showing its validity and using it for two potential applications. However, as explained in more detail below, I have some concerns about the overall validity of the study. The data collection process was performed in real time (monthly) and through website APIs, meaning it cannot be done retroactively and might not work in other countries. The described methodology is therefore not easily replicable but requires (extensive) local adaptation on a case-by-case basis, making the contribution rather descriptive.

One contribution is certainly the validation of online ads as a data source, being unbiased and in line with other datasets. However, other studies mentioned by the authors already show similar validation, even for France.

A valuable extension is certainly the application part, using the newly online data for specific use cases. However, this part requires way more attention, from a more extensive motivation (why are rent-to-price ratios at low aggregation good to have in France), over references for the utilized models, to the implications of using better data.

I would personally probably increase the application part and decrease the descriptive part.

Major Issues

• l.208: Nc needs more explanation (e.g. at what frequency is it collected). In line with the Notes of Figure 2, I understand ns is the number of online ads per stratum (e.g. properties in the market) and Nc is the number of units available. Does this mean all properties (including occupied) or only those in the market (and if so, over which frequency)? In the former case, a ratio of 1 would mean the online data contains as many units in the market as are available or, put differently, that all apartments in the stratum are on the market. Let’s assume the latter case. In this case, I wonder how to interpret a ratio higher than 1. Does it mean there is more than one online ad per available unit? In this case, I question the duplication filter. The authors need to be more specific.

Based on the sentence in line 225, the overall exercise seems to be a measure not of representativeness but of turnover, as it is suggested the ratio increases over time. I therefore understand Nc is a local constant while ns is time dependent. It would be great to have a reference point from the literature, as 1 seems a very high value (meaning every apartment is on average sold once within 2 years). Overall, the whole part is very confusing, which raises strong doubts about it. In Figure 3 the ratio even goes up to 40, which I cannot explain. If you have any references for this methodology, I strongly advise using them here and being more specific about the whole test.

• The data should have been truncated using rent per square meter directly, not by price and square meter separately. As a result, there are still outliers in the data (e.g., minimum rent per square meter: 0.2 Euro in Table 2). Please investigate.

• l.323: I am confused why the authors call it an index but estimate the model for each year individually (maybe this is just a wording problem). Also, I don’t understand how ln(c ref) is estimated. Technically, I understand that the local constant is seen as the index here, which would be in line with the literature. However, how is the logarithm applied, or why is it assumed that the estimated constant is actually the logarithm? Please provide more details on the model derivation or provide some references to studies using a similar estimation. Interestingly, the authors later retransform the logarithms. Why not use levels directly then? Also, at which point is the calculation adjusted for property size, or is the estimation on a per-square-meter basis? I think this section requires some rework and more explanations.

• There are no units in Table 6. E.g. taking Paris as an example, I don’t understand what 16.41 is. If this is in Euro, I assume it is per square meter per month, which would contradict the average in l. 427 though. This would raise questions about the time basis of rents and prices (would rents need to be annualized, or is this the monthly rent-price ratio?). In Figure 5, which is not linked to the text, it is indicated that the unit is percent.

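To illustrate the truncation point raised in the Major Issues above, here is a minimal Python sketch; it is not the authors' procedure. The `Ad` record, the function names, and all thresholds are hypothetical and chosen only for the example. It shows how an ad can pass separate bounds on rent and on surface while still implying an implausible rent per square meter, and how a filter applied to the ratio itself removes it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    """Hypothetical record for one online ad."""
    rent: float     # monthly rent in euros
    surface: float  # floor area in square meters


def within(value, low, high):
    """True when low <= value <= high."""
    return low <= value <= high


def trim_separately(ads, rent_bounds, surface_bounds):
    """Keep ads whose rent and surface each fall inside their own bounds."""
    return [
        ad for ad in ads
        if within(ad.rent, *rent_bounds) and within(ad.surface, *surface_bounds)
    ]


def trim_on_ratio(ads, ratio_bounds):
    """Keep ads whose rent per square meter falls inside the bounds."""
    return [
        ad for ad in ads
        if ad.surface > 0 and within(ad.rent / ad.surface, *ratio_bounds)
    ]


# Illustrative ads only; none of these values come from the paper.
ads = [
    Ad(rent=800.0, surface=40.0),   # 20 euros per square meter
    Ad(rent=100.0, surface=300.0),  # about 0.33 euros per square meter
    Ad(rent=5000.0, surface=10.0),  # 500 euros per square meter
]

# Hypothetical thresholds: each variable is plausible on its own.
kept_separately = trim_separately(
    ads, rent_bounds=(100.0, 5000.0), surface_bounds=(9.0, 300.0)
)

# Hypothetical thresholds applied directly to rent per square meter.
kept_on_ratio = trim_on_ratio(ads, ratio_bounds=(3.0, 60.0))

print(len(kept_separately))  # all three ads survive the separate filters
print(len(kept_on_ratio))    # only the 20 euros per square meter ad survives
```

With these made-up bounds, the separate filters keep the two extreme ads because each variable is within range on its own, whereas the ratio filter drops them.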
Minor Issues

• The used tense is constantly changing (present, past, etc.)

• L.190 This is confusing. The authors say that they use public APIs to get the data but then state that they use HTML code to receive information. I am not aware of any API that provides HTML code. Does this mean the authors used the API and scraped the website (double work, so to say)? In this case, I wonder how much data could be generated by one or the other.

• Table 1: It seems like online ads do not represent the full spectrum of the rental market. Social housing seems to be excluded. Does this have implications for the estimations of free market rents for social units? Curious to hear the authors' opinion.

• L.207 This is unclear and I am not sure if “crossing“ is the right word here.

• L. 221 vs. l.222 these sentences somehow contradict each other, maybe clarify.

• Please provide more information on the Strata (e.g., number of inhabitants)

• Table 3: What is a single unit or, more specifically, what is the difference from 1 room?

• Equation (1) what is the unit of “a”? If it is Strata, I don’t understand how area fixed effects can be used (perfect collinearity)? Please elaborate a bit more and present the degrees of freedom.

• Figure 4: It also seems that the relationship is not fully 45 degrees as it diverges for higher priced areas (ca. above 20 median rent), indicating that online ads are higher for these areas (observations).

• l.362 the referenced paper is not by the same authors, so I would change the wording as readers might want to check the companion study.

• l. 470 please elaborate on this conclusion

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

PLoS One. 2022 Jan 27;17(1):e0260405. doi: 10.1371/journal.pone.0260405.r004

Author response to Decision Letter 1

31 Aug 2021

See file attached

Attachment

Submitted filename: letter_plosone__Copy_ (3)-3-5.pdf

Additional data file (42.5KB, pdf)

PLoS One. doi: 10.1371/journal.pone.0260405.r005

Decision Letter 2

Nils Kok

10 Nov 2021

Can big data increase our knowledge of local rental markets? A dataset on the rental sector in France

PONE-D-21-03045R2

Dear Dr. Chapelle,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Nils Kok

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Thanks for addressing the comments of the reviewer -- I'm happy with the results and your response to the referee. At this point, the paper is ready for acceptance. Congratulations!

Reviewers' comments:

PLoS One. doi: 10.1371/journal.pone.0260405.r006

Acceptance letter

Nils Kok

13 Jan 2022

PONE-D-21-03045R2

Can big data increase our knowledge of local rental markets? a dataset on the rental sector in France

Dear Dr. Chapelle:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Nils Kok

Academic Editor

PLOS ONE

Associated Data


Supplementary Materials

S1 Appendix (PDF, 630.9KB)

S1 File (ZIP, 22MB)

Attachment, submitted filename: ReviewPlosOne.pdf (PDF, 105.7KB)

Attachment, submitted filename: Response_referee_2.pdf (PDF, 53.4KB)

Attachment, submitted filename: letter_plosone__Copy_ (3)-3-5.pdf (PDF, 42.5KB)

