Abstract
Purpose
The Middle Eastern (ME) population is growing rapidly in the US but cannot easily be identified in cancer registry or other databases for epidemiological research. The purpose of this study was to develop a list of common Middle Eastern surnames and to validate it by linkage with cancer registry incidence files.
Methods
Surnames and their associated places of birth in the Middle East were obtained from various sources. After non-specific entries were excluded, the final combined list included 49,610 surnames; it was matched with the California Cancer Registry incidence file for 1988-2003.
Results
Overall, 1.4% of all registered cases were positively identified as ME, which is similar to the proportion of the ME population in California. Of the identified cases with a known place of birth, about two-thirds were born in the Middle East, and 70% of those were non-Arabs. The sensitivity of the list for detecting ME birth was 91% in men and 86% in women, and the positive predictive values were 72% and 65%, respectively. Specificity and negative predictive values were over 99 percent for both sexes.
Conclusion
The high accuracy reported for this Middle Eastern surname list (MESL) makes it a valuable tool for epidemiological studies of this ethnic population.
Keywords: Middle East, Name List, Epidemiology, Ethnic Identification, Ethnic studies
Introduction
The Middle Eastern (ME) immigrant population of the United States (US) is growing rapidly and is expected to reach 2.5 million by 2010 (1). In 2000, the ME population in California was conservatively estimated at 400,000, a 40 percent increase since the 1990 census (2). Although the overall size of this population can be estimated through special surveys and census samples (3), ME individuals are essentially “hidden” within the White race category and are not officially recognized as a distinct group (4). International data suggest that cancer incidence and mortality in ME populations differ substantially from those in Western populations (5), and migrant studies in Europe (6), Australia (7), and Canada (8) indicate that the initially lower rates among ME migrants tend to converge toward those of the host country with increasing acculturation.
Because ME cases are not easy to identify, little information on them is available from the US. Although a large proportion of the ME population in the US are recent immigrants who could be identified by their place of birth, the collection of birthplace information in cancer registration is not uniform across the US, and its completeness declined from about 60 percent in 1973 to about 10 percent in 1997 (9). In other databases where this information is routinely collected, it is generally coded into a single category combined with many other places of birth, which limits its epidemiological utility (10). Because ME names differ substantially from the names of other ethnic groups, identification by name is a plausible way to find ME individuals in large databases.
Ethnic identification by surname is widely practiced for Hispanics (11,12) and has been proposed for subgroups of Asian Americans (13) and the Hmong (14) in the US, Chinese in Canada (15), and South Asians in the United Kingdom (16,17). Name lists for identifying ME cases in the US are currently limited to a list of Arab surnames developed in Michigan for the study of cancer incidence in Arab Americans (18) and an algorithm for identifying women with Arabic names in California (19). A combination of surnames and given names has also been used to identify Iranian immigrants in Canada (20).
Both statistical methods and expert review have been used to develop ethnicity-specific name lists. The 1980 census list of Spanish surnames is based on the concordance between the geographical distribution of surnames and that of the Hispanic population in the 1980 census. Applying this list dichotomizes each surname as Hispanic or non-Hispanic. In 1996, a new scheme was developed that instead assigns each surname a range of probability values of being Hispanic (21). Other approaches include a combination of surname and place of birth (12), place of birth alone (22), and expert review of public sources such as phone books and mailing lists (8,18).
Census files that include name, place of birth, and ancestry are excellent sources and have been used successfully to develop surname lists, including one for Asians and Pacific Islanders (23), but these files are generally not available to researchers outside the Census Bureau. Another plausible source is the US Social Security Administration (SSA), which maintains several administrative databases, including the Social Security Number (SSN) identification database (NUMIDENT), that can be used for this purpose. NUMIDENT is a repository of all applications for a Social Security card and contains records of 400 million card holders who live or have ever lived in the US and applied for an SSN since the inception of the Social Security program in 1936. NUMIDENT began capturing place of birth in 1979 and does not collect information on ethnicity or ancestry (13).
The main objective of this study was to develop a Middle Eastern surname list (MESL) from various sources, including a NUMIDENT extract. The second objective was to evaluate the accuracy of the list by linking it with the incidence file of the California Cancer Registry (CCR). The institutional review board (IRB) at the Public Health Institute reviewed and approved this study.
Materials and Methods
Materials
The following sources were used:
1) Middle Eastern Surnames (MES)
This file is an extract from NUMIDENT limited to surnames associated with birth in any of the following countries, which were collectively defined as the Middle East for this study: Afghanistan, Algeria, Armenia, Egypt, Iraq, Iran, Jordan, Kuwait, Lebanon, Libya, Morocco, Pakistan, Saudi Arabia, Sudan, Syria, Turkey, Tunisia, and Yemen. For each surname, the file gives the number of occurrences for each of the selected countries of birth, for all other countries combined, and for the US. SSA imposed the following limitations on this extract: a) to increase specificity, surnames for which less than 20 percent of holders were born in the Middle East were excluded; b) in accordance with privacy guidelines, surnames with a frequency of less than 5 were not included; c) surnames were not edited to remove blank spaces and other non-alphabetic characters, and were truncated at 10 characters. The file had 47,574 surnames representing 2,921,704 individuals in the US, of whom 1,476,092 were born in the selected ME countries.
2) Enhanced California Death Certificate Master File (EDCMF)
This file is a copy of the 1989-1999 California death certificate master file with detailed information on the decedents' place of birth. It includes 2,262,862 records, and its details have been reported previously (24).
3) Arab Surname List
This file is an extract from NUMIDENT that includes surnames associated with birth in any of the following Arab countries: Algeria, Bahrain, Djibouti, Egypt, Iraq, Jordan, Kuwait, Lebanon, Libya, Mauritania, Morocco, Oman, Gaza and the West Bank, Qatar, Saudi Arabia, Somalia, Sudan, Syria, Tunisia, United Arab Emirates, and Yemen. The file has three fields: the surname, its frequency in all selected countries combined, and its total in the US. The limitations imposed on this file are similar to those imposed on MES, described above. It included 24,184 surnames representing 1,766,445 individuals in the US, of whom 728,046 were foreign born.
4) Early California Cancer Registry files
Prior to 1988, two population-based cancer registries were operating in California: one in Los Angeles County, with 435,961 cases registered between 1972 and 1987, and one in the San Francisco Bay Area, with 62,138 cases registered between 1973 and 1987. These files were used for preliminary evaluation and enhancement of the list.
5) Expertly Collected Surnames
This file consists of surnames extracted from mass-mailing e-mails, phone directories, petition drives, and similar publicly available sources. To develop it, surnames were visually examined and those judged to be ME were selected. The file includes 2,044 surnames.
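The limitations SSA placed on the MES extract (item 1 above; similar limits applied to the Arab Surname List) can be sketched as a filter over per-surname counts. This is a minimal illustration, not SSA's procedure: the record layout (`SurnameCounts`), the use of all holders as the denominator for both the 20 percent share and the 5-occurrence threshold, and the example surnames and counts are hypothetical.

```python
from dataclasses import dataclass
from typing import List

MIN_FREQUENCY = 5    # limitation b: surnames seen fewer than 5 times are withheld
MIN_ME_SHARE = 0.20  # limitation a: at least 20% of holders born in the Middle East
MAX_LENGTH = 10      # limitation c: stored surnames are cut at 10 characters


@dataclass
class SurnameCounts:
    surname: str  # raw surname as recorded; blanks and punctuation are kept
    me_born: int  # holders born in any of the selected ME countries
    total: int    # all holders of the surname (assumed denominator)


def apply_extract_rules(records: List[SurnameCounts]) -> List[SurnameCounts]:
    """Return the records that survive limitations a-c, truncated to 10 characters."""
    kept = []
    for rec in records:
        if rec.total < MIN_FREQUENCY:
            continue  # privacy rule; also guarantees total > 0 below
        if rec.me_born / rec.total < MIN_ME_SHARE:
            continue  # specificity rule
        # Truncate without removing blanks or other non-alphabetic characters.
        kept.append(SurnameCounts(rec.surname[:MAX_LENGTH], rec.me_born, rec.total))
    return kept
```

Truncating before removing blanks is what produces the "compromised" entries handled in step A of the Methods: a multi-part surname such as a hypothetical "TER GHAZARIAN" would be stored as "TER GHAZAR".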
Methods
Construction of MESL began with the 47,574 records in the MES file and proceeded through the following major steps.
A) Exclusion of surnames that were two characters long, or that were compromised by both containing a blank space and being truncated at 10 characters. Two-character surnames are rare in the Middle East, and compromised surnames are not useful. At this step, 3,845 surnames were deleted, and the remaining ones were compressed by removing blank spaces and other non-alphabetic characters.
B) Close examination of the EDCMF revealed that 74 percent of all records associated with birth in the Middle East could be identified using surnames for which less than 40 percent of holders were born outside the Middle East. Based on this finding, the MES list was limited to surnames with 60 percent or more of holders born in the ME. In addition, surnames with 60 percent or more of holders born in Pakistan and Sudan were excluded, and surnames with 60 percent or more birth in the Middle East that were found in the EDCMF but were not present in MES were added. The net result of this step was the deletion of 13,739 records from the MES list.
C) Next, this modified list was matched with the files of the Los Angeles and San Francisco Bay Area cancer registries for cases registered before 1988. Staff at each registry performed the linkages, and the results were used to further enhance the MES file. In this step, 6,012 surnames were added.
D) Finally, 12,758 surnames from the Arab Surname List and 850 surnames from the expertly collected surname list were added to create the final MESL. To evaluate its accuracy, MESL was linked deterministically with the first 10 letters of the compressed surnames of cases registered in the CCR incidence file for 1988 through 2003. To reduce the impact of interethnic marriage, the surnames of 221,255 women (20%) were replaced with their maiden names when these were available and different.
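Steps A, B, and D above can be sketched with plain string and set operations. This is a minimal sketch under stated assumptions, not the study's code: the `BirthCounts` layout, the dictionary inputs, treating any 10-character value that contains a blank as truncated, applying the Pakistan/Sudan rule to their combined share, and the case fields `sex`, `surname`, and `maiden_name` are all hypothetical. Step C (linkage performed by registry staff) is not modeled.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

KEY_LENGTH = 10            # SSA extracts cut surnames at 10 characters
ME_SHARE_THRESHOLD = 0.60  # step B: at least 60% of holders born in the ME


@dataclass
class BirthCounts:
    me_born: int         # holders born in any selected ME country
    pak_sudan_born: int  # subset born in Pakistan or Sudan (combined: an assumption)
    total: int           # all holders of the surname


def compress(raw: str) -> str:
    """Remove blanks and other non-letters (ASCII input assumed), upper-case."""
    return re.sub(r"[^A-Za-z]", "", raw).upper()


def step_a(raw: str) -> Optional[str]:
    """Return the compressed surname, or None if step A excludes it."""
    value = raw.rstrip()
    if len(value) == 2:
        return None  # two-character surnames are rare in the Middle East
    if len(value) == KEY_LENGTH and " " in value:
        return None  # assumed compromised: a multi-part name cut at 10 characters
    return compress(value)


def share(part: int, counts: BirthCounts) -> float:
    return part / counts.total if counts.total else 0.0


def step_b(mes: Dict[str, BirthCounts], edcmf: Dict[str, BirthCounts]) -> Set[str]:
    """Keep MES surnames with >= 60% ME birth, drop those dominated by Pakistan
    and Sudan, and add qualifying EDCMF surnames that are missing from MES."""
    kept = {s for s, c in mes.items()
            if share(c.me_born, c) >= ME_SHARE_THRESHOLD
            and share(c.pak_sudan_born, c) < ME_SHARE_THRESHOLD}
    added = {s for s, c in edcmf.items()
             if s not in mes and share(c.me_born, c) >= ME_SHARE_THRESHOLD}
    return kept | added


def match_key(surname: str) -> str:
    """Step D linkage key: first 10 letters of the compressed surname."""
    return compress(surname)[:KEY_LENGTH]


def linkage_surname(case: dict) -> str:
    """Use a woman's maiden name when it is recorded and differs."""
    maiden = case.get("maiden_name") or ""
    if case["sex"] == "F" and maiden and match_key(maiden) != match_key(case["surname"]):
        return maiden
    return case["surname"]


def flag_cases(cases: List[dict], mesl: Set[str]) -> List[bool]:
    """Deterministic exact match of each case's key against the MESL keys."""
    keys = {match_key(s) for s in mesl}
    return [match_key(linkage_surname(c)) in keys for c in cases]
```

Because the key is cut at 10 letters, a registry surname longer than 10 letters still matches its truncated MESL entry; the price is that distinct long surnames sharing their first 10 letters become indistinguishable.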
Results
The final MESL has 49,610 unique surnames, of which 8,037 have 10 characters. Of these 10-character surnames, 2,525 are considered complete, and the remaining 5,512 are truncations of longer surnames.
Table 1 presents the distribution of cases identified as ME through linkage of MESL with the CCR incidence file, by place of birth and Arab ethnicity. Overall, close to 1.4 percent of all registered cases, and 88.6 percent of cases born in the Middle East, were positively identified by MESL. The proportion identified varies by country of birth, from over 90 percent for individuals born in Afghanistan, Armenia, Iran, Jordan, Lebanon, and Syria to less than 50 percent for those born in Tunisia. Close examination of ME-born cases that were not identified by MESL revealed that most had surnames of European origin that are common among Ashkenazi Jews in Israel and among Arabs in North Africa. The ten most frequent surnames in this group are Abraham, Benjamin, Cohen, Dayan, Elias, George, Hanna, Levy, Simon, and Thomas. Of the MESL-identified cases with a known place of birth, 69 percent were born in the Middle East; 70 percent of these were from non-Arab countries, dominated by those born in Iran (69%). This observation is in general agreement with other reports indicating that Iranians are the largest ME immigrant group in California (1).
Table 1.
Distribution of cases by Middle Eastern surname identification and place of birth, California Cancer Registry incidence file, 1988-2003
| Place of Birth | Not Identified by MESL | Identified by MESL | Total | Percent Identified |
|---|---|---|---|---|
| Middle East | 1,790 | 13,936 | 15,726 | 88.62 |
| California | 307,851 | 1,628 | 309,479 | 0.53 |
| Other US | 624,083 | 1,924 | 626,007 | 0.31 |
| Other Places | 245,988 | 2,846 | 248,834 | 1.14 |
| Unknown | 810,202 | 7,479 | 817,681 | 0.91 |
| Total | 1,989,914 | 27,813 | 2,017,727 | 1.38 |
| Non-Arabs¹ | 1,097 | 9,814 | 10,911 | 89.95 |
| Arabs² | 693 | 4,122 | 4,815 | 85.61 |
| Afghanistan | 35 | 386 | 421 | 91.69 |
| Algeria | 25 | 33 | 58 | 56.90 |
| Armenia³ | 63 | 1,505 | 1,568 | 95.98 |
| Egypt | 184 | 903 | 1,087 | 83.07 |
| Iran | 355 | 6,253 | 6,608 | 94.63 |
| Iraq | 128 | 617 | 745 | 82.82 |
| Israel | 429 | 588 | 1,017 | 57.82 |
| Jordan | 39 | 401 | 440 | 91.14 |
| Lebanon | 96 | 1,057 | 1,153 | 91.67 |
| Libya | 11 | 11 | 22 | 50.00 |
| Morocco | 72 | 121 | 193 | 62.69 |
| Syria | 54 | 780 | 834 | 93.53 |
| Tunisia | 25 | 19 | 44 | 43.18 |
| Turkey | 215 | 1,082 | 1,297 | 83.42 |
| Middle East, NOS | 59 | 180 | 239 | 75.31 |
¹ Afghanistan, Armenia, Iran, Israel, Turkey.
² Algeria, Egypt, Iraq, Jordan, Lebanon, Libya, Morocco, North Africa, Syria, Tunisia.
³ Includes Asian Republics of the former Soviet Union.
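The percentages in Table 1, and the shares quoted in the Results, follow directly from the table's counts. The short check below (variable names are illustrative) reproduces the row percents and shows that the 69 percent figure is the share of MESL-identified cases with a known place of birth who were born in the Middle East (13,936 of 20,334), and that about 70 percent of those were born in non-Arab countries.

```python
# Counts copied from Table 1: (not identified by MESL, identified by MESL)
TABLE1 = {
    "Middle East": (1_790, 13_936),
    "California": (307_851, 1_628),
    "Other US": (624_083, 1_924),
    "Other Places": (245_988, 2_846),
    "Unknown": (810_202, 7_479),
}
NON_ARAB_IDENTIFIED = 9_814  # identified cases born in non-Arab ME countries


def percent_identified(not_identified: int, identified: int) -> float:
    """Row percent shown in the last column of Table 1."""
    return 100 * identified / (not_identified + identified)


total_identified = sum(yes for _, yes in TABLE1.values())
# Identified cases whose place of birth is known (all rows except "Unknown").
identified_known = sum(yes for place, (_, yes) in TABLE1.items() if place != "Unknown")
me_identified = TABLE1["Middle East"][1]

share_born_in_me = 100 * me_identified / identified_known
share_non_arab = 100 * NON_ARAB_IDENTIFIED / me_identified

for place, (no, yes) in TABLE1.items():
    print(f"{place:<13} {percent_identified(no, yes):6.2f}")
print(f"Born in ME among identified cases with known birthplace: {share_born_in_me:.1f}%")
print(f"Non-Arab among those: {share_non_arab:.1f}%")
```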
A large proportion of the surnames identified as Middle Eastern follow naming conventions characteristic of Armenians, such as the -ian (or -yan) suffix. This is particularly true for individuals with a known place of birth in Iran, Lebanon, Turkey, and the US (data not shown), and may reflect waves of Armenian immigration to the US and California following episodes of civil unrest in the Middle East.
The characteristics of MESL as a tool for detecting birth in the Middle East are presented in Table 2. For this analysis, which is limited to the 1,200,046 incident cases with a known country of birth, birth in the Middle East is treated as the “true classification” and identification by MESL as the “test classification” (25). The sensitivity of MESL is over 90 percent in men and about 86 percent in women. The positive predictive value is 72 percent for men and 65 percent for women. The lower sensitivity and positive predictive value for women may be due to their use of married surnames following cross-cultural marriages. Specificity and negative predictive values are over 99 percent for both sexes.
Table 2.
Screening Characteristics of MESL for Cases with Known Place of Birth, California 1988-2003
| Sex | Identified by MESLᵃ, born in the Middle East | Identified by MESLᵃ, not born in the Middle East | Not identified by MESLᵃ, born in the Middle East | Not identified by MESLᵃ, not born in the Middle East |
|---|---|---|---|---|
| Male | 7,557 | 2,992 | 735 | 583,030 |
| Female | 6,379 | 3,406 | 1,055 | 594,892 |
| Total | 13,936 | 6,398 | 1,790 | 1,177,922 |

| Measure | Men | Women | Total |
|---|---|---|---|
| Sensitivity (%) | 91.14 | 85.81 | 88.62 |
| Specificity (%) | 99.49 | 99.43 | 99.46 |
| Positive Predictive Value (%) | 71.64 | 65.19 | 68.54 |
| Negative Predictive Value (%) | 99.87 | 99.82 | 99.85 |
ᵃ Middle Eastern surname list (MESL)
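The four screening measures in Table 2 are the standard ratios of a 2×2 table in which birth in the Middle East is the true classification and MESL identification is the test classification. The sketch below recomputes them from the table's counts; the class and method names are illustrative.

```python
from dataclasses import astuple, dataclass


@dataclass
class TwoByTwo:
    """Cross-classification of cases: MESL result (test) by ME birth (truth)."""
    tp: int  # identified by MESL, born in the Middle East
    fp: int  # identified by MESL, born elsewhere
    fn: int  # not identified by MESL, born in the Middle East
    tn: int  # not identified by MESL, born elsewhere

    def sensitivity(self) -> float:
        return 100 * self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return 100 * self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        return 100 * self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return 100 * self.tn / (self.tn + self.fn)


# Counts copied from Table 2; the total is the cell-by-cell sum of both sexes.
men = TwoByTwo(tp=7_557, fp=2_992, fn=735, tn=583_030)
women = TwoByTwo(tp=6_379, fp=3_406, fn=1_055, tn=594_892)
total = TwoByTwo(*(a + b for a, b in zip(astuple(men), astuple(women))))

for label, t in (("Men", men), ("Women", women), ("Total", total)):
    print(f"{label:<6} sens={t.sensitivity():.2f} spec={t.specificity():.2f} "
          f"ppv={t.ppv():.2f} npv={t.npv():.2f}")
```

Because true non-ME cases vastly outnumber ME-born cases, even a specificity above 99 percent leaves enough false positives (6,398) to pull the positive predictive value down to about 69 percent overall.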
Table 3 presents the relative distribution of major cancers by Middle Eastern ethnicity in California. Although based on observed cases only, the patterns for MESL-identified Middle Easterners and for non-Middle Eastern, non-Hispanic Whites suggest major differences between the two groups, most pronounced for cancers such as lung and bronchus, melanoma of the skin, stomach, liver, thyroid gland, and Kaposi sarcoma.
Table 3.
Cancer incidence by Middle Eastern ethnicity, California Cancer Registry, 1988-2003
| Cancer Sites | Middle Eastern by Surname: Number | Middle Eastern by Surname: Percent | Non-Hispanic White†: Number | Non-Hispanic White†: Percent |
|---|---|---|---|---|
| All sites | 27,813 | -- | 761,868 | -- |
| Breast, Female * | 4,951 | 36.66 | 118,352 | 31.36 |
| Prostate Gland * | 4,169 | 29.14 | 97,909 | 25.47 |
| Lung and Bronchus | 2,317 | 8.33 | 125,139 | 16.43 |
| Colon and Rectum | 2,977 | 10.70 | 77,641 | 10.19 |
| Melanoma of the Skin | 549 | 1.97 | 29,071 | 3.82 |
| Cervix Uteri * | 755 | 5.59 | 15,672 | 4.15 |
| Non-Hodgkin Lymphoma | 1,169 | 4.20 | 29,111 | 3.82 |
| Urinary Bladder | 1,437 | 5.17 | 30,287 | 3.98 |
| Corpus Uteri * | 623 | 4.61 | 18,836 | 4.99 |
| Leukemia (all types) | 995 | 3.58 | 19,823 | 2.60 |
| Oral Cavity | 406 | 1.46 | 19,428 | 2.55 |
| Pancreas | 628 | 2.26 | 19,258 | 2.53 |
| Kidney and Renal Pelvis | 571 | 2.05 | 15,338 | 2.01 |
| Stomach | 796 | 2.86 | 10,851 | 1.42 |
| Liver and intrahepatic bile ducts | 385 | 1.38 | 6,636 | 0.87 |
| Ovary * | 440 | 3.26 | 13,174 | 3.49 |
| Brain and Central Nervous System | 512 | 1.84 | 12,601 | 1.65 |
| Thyroid Gland | 747 | 2.69 | 7,768 | 1.02 |
| Multiple Myeloma | 340 | 1.22 | 7,889 | 1.04 |
| Esophagus | 138 | 0.50 | 7,557 | 0.99 |
| Larynx | 263 | 0.95 | 6,948 | 0.91 |
| Kaposi Sarcoma | 131 | 0.47 | 7,021 | 0.92 |
| Testis * | 189 | 1.32 | 4,469 | 1.16 |
| Hodgkin Disease | 291 | 1.05 | 4,350 | 0.57 |
† US born, not identified as Middle Eastern by MESL.
* Sex-specific percents.
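Table 3 reports proportionate distributions: each site's count as a percent of all cancers in the group, with the all-sites count of the relevant sex as the denominator for sites marked *. A minimal helper, with a hypothetical name and signature, is sketched below; the sex-specific all-sites counts are not given in the table, so the example uses only sites that apply to both sexes.

```python
from typing import Optional


def site_percent(site_count: int, all_sites: int, sex_total: Optional[int] = None) -> float:
    """Site count as a percent of all cancers in the group.

    For sex-specific sites (marked * in Table 3), pass the all-sites count of
    that sex as `sex_total`; otherwise the group's all-sites count is used.
    """
    denominator = sex_total if sex_total is not None else all_sites
    return 100 * site_count / denominator


# Lung and bronchus, counts from Table 3.
print(round(site_percent(2_317, 27_813), 2))     # 8.33  (Middle Eastern by surname)
print(round(site_percent(125_139, 761_868), 2))  # 16.43 (non-Hispanic White)
```

Because these are shares of observed cases rather than age-adjusted rates, a high percent for one site can partly reflect low counts at other sites.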
Discussion
Developing a list of common ME surnames is challenging, partly because of the inherent heterogeneity of the Middle East and partly because of the historical emigration of Armenians and Sephardic Jews to other countries. The extended territory loosely known as the Middle East includes over 20 countries with a population of 430 million (5), whose people practice four major religions (Christianity, Islam, Judaism, and Zoroastrianism), speak five main languages (Arabic, Armenian, Hebrew, Persian, and Turkish), and maintain deeply rooted, distinct cultural values, all of which influence naming conventions (26).
Country and Religion
Many definitions exist for the Middle East; some exclude the North African countries and some include Pakistan and Sudan. Of these, only the North African countries were included in this study, because their populations are almost entirely Muslim and Arabic-speaking. Sudan is an African country in which over 50 percent of the population is classified as African, and Pakistan is grouped with the South Asian countries (27). Although the Middle East can be divided into Arab and non-Arab countries, there is significant overlap between them. Armenian names are widely shared by Turks, Iranians, and Coptic and other Christian Arabs. Islamic names are shared by Arabs, Iranians, Afghans, Pakistanis, and most other Muslims worldwide. Iranian names are similar to Afghan names and are common among the Parsees in India.
Transliteration of names
Except for Turkish, none of the ME languages uses a Latin-based alphabet, and all of them, including Turkish, contain letters that have no equivalent in English. The English alphabet has 26 letters, while Hebrew has 23, Arabic 28, Turkish 29, Persian 32, and Armenian 39; in addition, Arabic, Hebrew, and Persian do not routinely write short vowels (28). The implication is that new combinations of letters must be “invented” to express the phonetic transliteration of the original ME name. Guidelines have been developed for the Romanization of the Arabic alphabet (29,30), but it is not clear how effective they are. Similarly, the lack of written vowels increases the number of spelling variants of the same name, which inflates the number of distinct surnames and reduces the frequency of each. In the CCR file, the average case count per surname is 1.5 for ME surnames and 6.2 for non-ME surnames. ME surnames are also generally longer: 11.1 percent of them have ten or more characters, compared with 1.7 percent of non-ME surnames.
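The two surname statistics quoted above (mean cases per surname and the share of long surnames) can be computed from a list of case surnames as sketched below, assuming both are taken over distinct surnames. The function name and the toy input are hypothetical; the input uses four common transliterations of one Arabic name to show how vowel ambiguity spreads cases across distinct spellings and lowers the mean count per surname.

```python
from collections import Counter
from typing import Iterable, Tuple


def surname_stats(case_surnames: Iterable[str], long_length: int = 10) -> Tuple[float, float]:
    """Return (mean cases per distinct surname,
    percent of distinct surnames with at least `long_length` characters)."""
    counts = Counter(case_surnames)  # distinct surname -> number of cases
    n_distinct = len(counts)
    mean_cases = sum(counts.values()) / n_distinct
    pct_long = 100 * sum(1 for s in counts if len(s) >= long_length) / n_distinct
    return mean_cases, pct_long


# Toy input: four spellings of one name, each held by a single case,
# plus one 11-letter surname.
variants = ["MOHAMMAD", "MUHAMMAD", "MOHAMMED", "MOHAMAD", "ABDELRAHMAN"]
print(surname_stats(variants))  # (1.0, 20.0)
```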
Immigration
Immigrant populations differ significantly from the general population of their home country, and immigrant waves that settled at different times and in different host countries also differ from one another. In this study, the surnames of first-generation immigrants from Turkey, Lebanon, Iran, and the Caucasian Republics of the former Soviet Union were dominated by surnames with Armenian characteristics. Unlike Lebanon and the Caucasian Republics of the former Soviet Union, Turkey and Iran have small Armenian populations. Moreover, Turkish immigrants in Germany have surnames that are typically Turkish (31). The reason for this difference seems to be the timing of immigration: the immigrants from Turkey to the US were largely Armenians who fled the Ottoman Empire in the early 1900s, while the Turks who migrated to Germany after the Second World War came from the Turkish Republic. Religion is another source of difference. Turkey, Iran, and Afghanistan are three countries in which Islam is dominant and is observed by 99 percent of the population (32). However, the proportions of Muslims among California cases born in these countries were 6 percent for Turkey, 17 percent for Iran, and 53 percent for Afghanistan. In addition, 44 percent of the cases born in Turkey were Christian, and 17 percent of those born in Iran were Jewish.
Conclusion
Although the Middle East is a mosaic of countries whose religions, cultures, and languages are shared among themselves and with other countries, it is possible to develop a surname list that can identify this ethnic group with reasonable accuracy. The Middle Eastern population is growing rapidly in the US and will have a substantial impact on health-related issues. Because this population is difficult to identify in large databases, studying it has been a challenge. The results reported here suggest that MESL has high accuracy and utility for identifying this ethnic population and can be a useful tool for epidemiological studies of cancer and other diseases in this group.
Acknowledgments
The author would like to express his gratitude to Bert Kestenbaum from the Office of the Chief Actuary, United States Social Security Administration for providing the data and helping with this research. Appreciation is also expressed to Dr. Dee West and Kristen Unger Hu from the Northern California Cancer Center; Dr. Dennis Deapen and Peggy Balcius from the Los Angeles County Cancer Surveillance Program; and Mark Allen from the California Cancer Registry for performing various linkages and helping with this research.
The collection of cancer data used in this study was supported by the California Department of Health Services as part of the statewide cancer reporting program mandated by California Health and Safety Code Section 103885, the National Cancer Institute’s Surveillance, Epidemiology and End Results Program, and Centers for Disease Control and Prevention National Program of Cancer Registries. The ideas and opinions expressed herein are those of the author and endorsement by the State of California, Department of Health Services, the National Cancer Institute, and the Centers for Disease Control and Prevention is not intended and should not be inferred.
This work was supported by the grant CA103457 from the National Cancer Institute to Kiumarss Nasseri, and was approved by the Institutional Review Board of the Public Health Institute.
A partial report of this research was presented at the Annual Meeting of the North American Association of Central Cancer Registries (NAACCR), Regina, Canada, June 2006.
Footnotes
Conflict of Interest: None declared
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
References
1. Camarota SA. Immigrants from the Middle East: A profile of the foreign-born population from Pakistan to Morocco [monograph on the Internet]. Washington, DC: Center for Immigration Studies; Aug 2002 [cited 2007 Aug 25]. Available from: http://www.cis.org/articles/2002/back902.html.
2. Lopez A. Middle Eastern populations in California: Estimates from the Census 2000 Supplementary Survey [monograph on the Internet]. Stanford, CA: Stanford University Center for Comparative Studies in Race and Ethnicity; Jul 2002 (CCSRE Race and Ethnicity in California: Demographics Report Series No. 10) [cited 2007 Aug 24]. Available from: http://ccsre.stanford.edu/reports/report_10.pdf.
3. Ruggles S, Sobek M, Alexander T, Fitch CA, Goeken R, Hall PK, et al. Integrated Public Use Microdata Series: Version 3.0 [machine-readable database]. Minneapolis, MN: Minnesota Population Center [producer and distributor]; 2004 [cited 2007 Jun 15]. Available from: http://usa.ipums.org/usa/.
4. Office of Management and Budget, Interagency Committee for the Review of Standards for Data on Race and Ethnicity, Tabulation Working Group. Provisional guidance on the implementation of the 1997 standards for Federal data on race and ethnicity. Washington, DC: Office of Management and Budget; Dec 15, 2000 [cited 2007 Aug 25]. [about 102 p.]. Available from: http://www.ofm.wa.gov/pop/race/omb.pdf.
5. World Health Statistics 2007 [monograph/database on the Internet]. Geneva, Switzerland: World Health Organization; 2007 [cited 2007 Aug 10]. Available from: http://www.who.int/whosis/en/index.html.
6. Hemminki K, Li X, Czene K. Cancer risk in first-generation immigrants to Sweden. Int J Cancer. 2002;99:218–228. doi: 10.1002/ijc.10322.
7. McCredie M, Coates M, Grulich A. Cancer incidence in migrants to New South Wales (Australia) from the Middle East, 1972-1991. Cancer Causes Control. 1994;5:414–421. doi: 10.1007/BF01694755.
8. Yavari P, Hislop TG, Bajdik C, Sadjadi A, Nouraie M, Babai M, et al. Comparison of cancer incidence in Iran and Iranian immigrants to British Columbia, Canada. Asian Pac J Cancer Prev. 2006;7:86–90.
9. Clutter GG, Hall I, Gerlach K. Birthplace data: An important piece of the cancer puzzle. J Registry Manag. 2002;29:108–116.
10. Nasseri K. Reengineering vital registration and statistics system [letter]. Prev Chronic Dis. 2005 Jan;2(1):A25 [cited 2007 Aug 25]. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid&=1323327.
11. Kwong SL, Perkins CI, Morris CR, Cohen R, Allen M, Schlag R, et al. Cancer in California: 1988-1998. Sacramento, CA: California Department of Health Services, Cancer Surveillance Section; 2000.
12. NAACCR Latino Research Working Group. NAACCR Guidelines for Enhancing Hispanic/Latino Identification: Revised NAACCR Hispanic/Latino Identification Algorithm [NHIA v2]. Springfield, IL: North American Association of Central Cancer Registries; Sep 21, 2005 [cited 2007 Aug 25]. Available from: http://www.naaccr.org/filesystem/pdf/NHIA%20v2%2009-21-05.pdf.
13. Lauderdale DS, Kestenbaum B. Asian American ethnic identification by surname. Popul Res Policy Rev. 2000;19:283–300.
14. Mills PK, Yang RC, Riordan D. Cancer incidence in the Hmong in California, 1988-2000. Cancer. 2005;104:S2969–S2974. doi: 10.1002/cncr.21525.
15. Quan H, Wang F, Schopflocher D, Norris C, Galbraith PD, Faris P, et al. Development and validation of a surname list to define Chinese ethnicity. Med Care. 2006;44:328–333. doi: 10.1097/01.mlr.0000204010.81331.a9.
16. Cummins C, Winter H, Cheng KK, Silcocks P, Varghese C. An assessment of the Nam Pehchan computer program for the identification of names of south Asian ethnic origin. J Public Health Med. 1999;21:401–406. doi: 10.1093/pubmed/21.4.401.
17. Nanchahal K, Mangtani P, Alston M, dos Santos Silva I. Development and validation of a computerized South Asian Names and Group Recognition Algorithm (SANGRA) for use in British health-related studies. J Public Health Med. 2001;23:278–285. doi: 10.1093/pubmed/23.4.278.
18. Schwartz KL, Kulwicki A, Weiss LK, Fakhouri H, Sakr W, Kau G, et al. Cancer among Arab Americans in the Metropolitan Detroit Area. Ethn Dis. 2004;14:141–146.
19. Lauderdale DS. Birth outcomes for Arabic-named women in California before and after September 11. Demography. 2006;43:185–201. doi: 10.1353/dem.2006.0008.
20. Yavari P, Hislop TG, Abano Z. Methodology to identify Iranian immigrants for epidemiological studies. Asian Pac J Cancer Prev. 2005;6:455–457.
21. Word DL, Perkins RC Jr. Building a Spanish surname list for the 1990's: A new approach to an old problem. Washington, DC: US Bureau of the Census, Population Division; Mar 1996. Technical Working Paper No. 13.
22. Choi BCK, Hanley AJG, Holowaty EJ, Dale D. Use of surnames to identify individuals of Chinese ancestry. Am J Epidemiol. 1993;138:723–734. doi: 10.1093/oxfordjournals.aje.a116910.
23. Falkenstein MR. The Asian and Pacific Islander surname list: As developed from Census 2000. Available from: http://www.amstat.org/sections/srms/Proceedings/y2002/Files/jsm2002-000501.pdf.
24. Nasseri K. Enhancement of birthplace data in the death certificate master file: reclaiming missed data. J Registry Manag. 2005;32:32–38.
25. Beaglehole R, Bonita R, Kjellstrom T, editors. Basic epidemiology. Geneva: World Health Organization; 1993. p. 95.
26. Schimmel A. Islamic names. Edinburgh: Edinburgh University Press; 1989.
27. Jain RV, Mills PK, Parikh-Patel A. Cancer incidence in the south Asian population of California, 1988-2000. J Carcinog. 2005;4:21. doi: 10.1186/1477-3163-4-21. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid&=16283945.
28. Omniglot. Writing systems & languages of the world [homepage on the Internet]. Brighton, UK: Simon Ager; 2007 [cited 2007 Aug 25]. Available from: http://www.omniglot.com/.
29. Al-Bab. Arabic words and the Roman alphabet [homepage on the Internet]. www.al-bab.com [updated 2005 Aug 27; cited 2007 Aug 25]. Available from: http://www.al-bab.com/arab/language/roman1.htm.
30. Arabic Romanization at the Library of Congress. Washington, DC: Library of Congress; n.d. [cited 2007 Aug 25]. Available from: http://www.loc.gov/catdir/cpso/arabic1.pdf.
31. Razum O, Zeeb H, Akgun S. How useful is a name-based algorithm in health research among Turkish migrants in Germany? Trop Med Int Health. 2001;6:654–661. doi: 10.1046/j.1365-3156.2001.00760.x.
32. Goring R, editor. Larousse dictionary of beliefs and religion. New York, NY: Larousse Publications; 1994.
