Methodology for the StartOut Pride Economic Impact Index
The StartOut Pride Economic Impact Index (SPEII) focuses on the impact of high-growth entrepreneurs from historically underrepresented populations. Research has shown that female, Black, and LGBTQ+ founders, among many others, experience systematic barriers to founding new companies. The SPEII measures both their existing contributions to their home cities and the achievement gaps caused by those barriers. For the initial launch of the Index, our populations of interest are LGBTQ+ and female entrepreneurs; the launch also includes an analysis of interregional inequality.
What is a High-Growth Entrepreneur?
A high-growth entrepreneur is anyone who starts a business with an addressable market of a billion dollars or more with the explicit intention of capturing a significant portion of that market. Local entrepreneurs creating Main Street businesses make an important contribution to the economy but are not measured here, because the biggest impact on the economy in terms of net new jobs and economic growth comes from high-growth entrepreneurs. Similarly, executives at large existing corporations have a meaningful impact on the business world and people’s lives, but startup founders have a unique connection to measurable outcomes like job creation and exits. In practice, a high-growth entrepreneur is identified through the company threshold below.
Threshold for High-Growth Company
For the purposes of this Index, we are interested solely in the founders of “high-growth” companies. As a matter of practical expediency, we laid out a set of minimum criteria to identify “high-growth” companies in terms of funding or economic impact. To formalize our criteria we began with two company-level data sources focused on the entrepreneurial economy: Crunchbase and Pitchbook.
These sources are the core dataset of the Index. They contain information about companies’ founders, industry, location, funding history, employment and job creation, and exits (e.g. IPOs and acquisitions) and are updated on a regular basis. The Index further augments its dataset with additional information from Wikipedia, the US Patent and Trademark Office, and the US Census Bureau. To credit patent creation to individual founders, the Index traces patent assignment through inventors to employers (companies in our dataset) and finally onto the founders of those companies.
The Index builds this set of “high-growth” companies and founders from companies that meet at least one of the following criteria:
- received any amount of Venture Capital funding, or
- received at least $1M in Angel funding, or
- generated at least one patent and has created jobs beyond the founding team, or
- had an IPO or been acquired by another company.
These criteria allow us to be flexible in our definition of a high-growth company, allowing for organizations that have had an impact even in the absence of institutional funding. At the time of launch (as of June 30, 2020), these criteria identified 56,623 high-growth companies in the US. To date (January 13th, 2021), we have identified 62,074 high-growth companies in the US.
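The four criteria above amount to a simple "any of" test. The sketch below is only an illustration of that logic: the `Company` record and all of its field names are hypothetical and do not reflect the actual Crunchbase or Pitchbook schemas used by the Index.

```python
from dataclasses import dataclass


@dataclass
class Company:
    # Hypothetical fields, chosen only to mirror the four criteria in the text.
    vc_funding_usd: float = 0.0
    angel_funding_usd: float = 0.0
    patent_count: int = 0
    jobs_beyond_founders: int = 0
    had_ipo: bool = False
    was_acquired: bool = False


def is_high_growth(c: Company) -> bool:
    """Return True if the company meets at least one of the listed criteria."""
    return (
        c.vc_funding_usd > 0                                     # any VC funding
        or c.angel_funding_usd >= 1_000_000                      # at least $1M in Angel funding
        or (c.patent_count >= 1 and c.jobs_beyond_founders > 0)  # patent plus jobs
        or c.had_ipo                                             # IPO
        or c.was_acquired                                        # acquisition
    )
```

Note that the patent criterion is the only compound one: a patent alone does not qualify a company unless it has also created jobs beyond the founding team.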
Entrepreneur Identification and Attribution
The purpose of the Index is to measure how individual high-growth entrepreneurs affect metro economies. It constructs this set of individuals using the dataset of high-growth companies described above. An entrepreneur or founder is any member of the founding team, i.e. anyone present at the company before its first round of funding. If a funding date is not available, founders are identified solely from the Crunchbase and Pitchbook labels.
We chose to include people present at a company before first funding, and not just founders, because we believe anyone who joined the company without a promise of funding (or salary) made a formative, high-risk contribution to the company and its impact.
Given a founding team for each company, the Index credits the company’s impact evenly among the founders. In other words, for a founding team of 5 individuals, each founder is given credit for ⅕ of the company’s impact (e.g. ⅕ of funding, ⅕ of jobs, ⅕ of patents, and ⅕ of exit value).
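The even split can be written as a one-line calculation. This is a minimal sketch; the function name and the dictionary keys are illustrative and not taken from the Index itself.

```python
def credit_per_founder(impact: dict, n_founders: int) -> dict:
    """Split each company-level impact measure evenly across the founding team."""
    if n_founders < 1:
        raise ValueError("a company needs at least one founder")
    return {measure: value / n_founders for measure, value in impact.items()}


# Hypothetical company with a founding team of 5.
company_impact = {"funding": 10_000_000, "jobs": 50, "patents": 5, "exit_value": 0}
print(credit_per_founder(company_impact, 5))
```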
Having identified all of the high-growth entrepreneurs in its dataset, the Index applies demographic labels to each: race, gender, and sexuality. (We plan to extend these labels to include disability and immigration status.) To identify LGBTQ+ entrepreneurs for this initial launch, we have integrated three specialized datasets. The first is the membership records of StartOut, the largest LGBTQ+ entrepreneurship group in the world and Socos’ collaborator in this initiative. The second set of records comes from a data aggregator that provides broad gender and race information to potential employers from 37 publicly available sites. StartOut engaged the aggregator to create a specialized algorithm for identifying LGBTQ+ individuals in its data. Finally, we have identified a number of public repositories, such as Wikipedia, that identify openly LGBTQ+ business leaders.
In preliminary work, we found that certain demographic labels such as gender were fairly easily derived from our existing sources. However, as previous research has shown, many LGBTQ+ entrepreneurs are either closeted or at least not public about their identity. This is easy to understand, given that the very same research has revealed the many barriers historically marginalized entrepreneurs have experienced. In addition to the hidden status of many LGBTQ+ entrepreneurs, StartOut’s data overrepresents the cities in which it maintains active chapters: San Francisco, New York, Los Angeles, Boston, Chicago, and Austin. To correct for this overrepresentation and for the known undercounting of LGBTQ+ entrepreneurs, we have also developed a statistical model to estimate counts over our outcome variables of interest.
The SPEII employs a non-parametric statistical estimate of the undercount of founders. It estimates a distribution over the likely number of hidden LGBTQ+ entrepreneurs in each Metro. This distribution is derived from three variables:
- the Metro’s local LGBTQ+ population (from UCLA’s Williams Institute),
- the Metro’s entrepreneurship rate (from the Index’s aggregated data), and
- a correction factor for LGBTQ+ entrepreneurship rates (from previous StartOut research).
In each case, i is an index over individual Metros.
The first two variables, the local LGBTQ+ population and the entrepreneurship rate, give an estimated population of LGBTQ+ entrepreneurs that assumes that these two variables are independent of one another. However, we know that for most populations of interest entrepreneurship rates are meaningfully different from those of the general population.
So the final variable, the correction factor, corrects for this dependence. The SPEII estimates this modifier from the joint rate of LGBTQ+ entrepreneurship (from the Index’s internal data) modified by state-level estimates of LGBTQ+ entrepreneurship rates reported in “The State of LGBT Entrepreneurship” whitepaper from StartOut. From this estimate we find that the rate of LGBTQ+ entrepreneurship is roughly half that of straight populations.
That gives us the Metro’s population of LGBTQ+ entrepreneurs.
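As a rough illustration of how the three variables might combine, the sketch below assumes a simple product: population times entrepreneurship rate, scaled by the correction factor. That multiplicative form is an assumption inferred from the description of the first two variables as independent; the Index itself estimates a full distribution rather than the single point value shown here, and all names and numbers are hypothetical.

```python
def estimated_lgbtq_entrepreneurs(
    lgbtq_population: float,
    entrepreneurship_rate: float,
    correction_factor: float,
) -> float:
    """Point estimate of a Metro's LGBTQ+ entrepreneur population.

    Assumes (not confirmed by the source) that the estimate is the product
    of the three variables described in the text.
    """
    return lgbtq_population * entrepreneurship_rate * correction_factor


# Hypothetical Metro: 100,000 LGBTQ+ residents, 0.2% entrepreneurship rate,
# and a correction factor of 0.5 ("roughly half that of straight populations").
estimate = estimated_lgbtq_entrepreneurs(100_000, 0.002, 0.5)
print(round(estimate))
```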
All identities of entrepreneurs in the system, LGBTQ+ or otherwise, are anonymized for publication and used for no other purpose than producing the Index. At the time of launch, these criteria identified 84,516 high-growth entrepreneurs. To date (January 13th, 2021), we have identified 124,756 high-growth entrepreneurs in the US.
The core entity of the Index is the Metro, derived from the Census Bureau’s definition of a metropolitan statistical area. Metros represent large integrated economic regions. For example, the entirety of the San Francisco Bay Area (Oakland, San Jose, Marin County, and more) is assigned to the San Francisco Metro. The impact of individual entrepreneurs and founders on a Metro is measured in terms of the location of the headquarters of companies they have founded. This means that some prolific founders have had impacts across multiple Metros. This also means that impact measures such as jobs and patents are treated more as indices than as explicit economic activity, because they are credited back to the founding Metro even if the jobs actually exist in a separate city.
At the time of launch, these criteria identified 77 Metros in the United States. To date (January 13th, 2021), 95 Metros in the United States are identified.
The formal Index is a score for each Metro, ranging from 0 to 100. By combining independent measures of innovation, job creation, and economic activity, this score reveals the impact of high-growth entrepreneurs on the Metro over a fixed period of time. The Index Score represents four sub-factors computed independently, combining explicit counts and statistical models.
“Jobs” is a measure of the number of jobs created by entrepreneurs of our target population in the Metro region within the specified time period. To arrive at this number the SPEII records the total number of jobs created by each company located within the Metro. Then, as described above, it credits those jobs to the individual founders of the company. For each founder that is a member of the current target population, the Index adds their share of job creation to its aggregate jobs measure. This provides a total count of jobs created by the target entrepreneurial population.
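The direct count described above can be sketched as a loop over companies that adds up the job shares of founders in the target population. The record layout (`metro`, `jobs`, `founders`) and the founder identifiers are hypothetical, used here only for illustration.

```python
from collections import defaultdict


def counted_jobs_by_metro(companies, target_founders):
    """Sum, per Metro, the job shares credited to founders in the target population."""
    totals = defaultdict(float)
    for company in companies:
        founders = company["founders"]
        if not founders:
            continue  # nothing to credit without an identified founding team
        share = company["jobs"] / len(founders)           # equal credit per founder
        n_target = sum(1 for f in founders if f in target_founders)
        totals[company["metro"]] += share * n_target      # only target founders count
    return dict(totals)


companies = [
    {"metro": "A", "jobs": 100, "founders": ["x", "y"]},
    {"metro": "A", "jobs": 30, "founders": ["x", "z", "w"]},
    {"metro": "B", "jobs": 10, "founders": ["y"]},
]
print(counted_jobs_by_metro(companies, target_founders={"x"}))
```

A founder who appears on several founding teams is credited once per company, which is how prolific founders accumulate impact across companies and Metros.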
Because of the issue of undercounting described above, our count of jobs is extended by a non-parametric estimate of the distribution of likely job creation by those undercounted entrepreneurs. To compute this distribution, we sample from the likely number of undercounted entrepreneurs from our target populations. For example, we might have estimated that there’s a 40% probability that 5 of a given Metro’s 200 funded entrepreneurs are likely LGBTQ+ and a 60% probability that 4 of them are LGBTQ+. A distribution over likely job creation is computed as follows:
- Randomly select 5 individuals out of the 200.
- Count the number of jobs that those 5 individuals created.
- Add that number to the set of possible jobs created.
- Randomly select 5 new individuals (with replacement) out of the 200.
- Count the number of jobs they created.
- Add that number to the set of possible jobs created.
- Repeat this process until a stable distribution over possible jobs emerges.
- Then randomly select 4 individuals out of the 200.
- Again, repeat this process using 4 individuals until a new distribution emerges.
- Produce a final distribution of likely job creation by weighting the original two distributions by their initial probabilities.
This gives the Index its non-parametric estimated distribution of likely job creation by LGBTQ+ entrepreneurs. If few entrepreneurs in a local population produce jobs then the bulk of this distribution will be 0 additional jobs created. If many entrepreneurs were highly productive, then the distribution accordingly is more likely to include larger job creation values. In either case, it allows us to compute a 99% confidence interval over the likely number of jobs our undercounted population generated. For the sake of Index visualization, this distribution is simplified to its mean, a single value that is added to the job creation sub-factor.
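The sampling procedure above can be sketched as a small Monte Carlo routine. This is an illustration under stated assumptions, not the Index's implementation: a fixed number of draws stands in for "repeat until a stable distribution emerges", the two distributions are weighted by giving each hidden-founder count a share of the draws proportional to its probability, and the function names and job figures are hypothetical.

```python
import random
import statistics


def sample_job_totals(jobs_per_founder, k, n_draws, rng):
    """Pick k founders at random n_draws times; record the jobs each pick created."""
    return [sum(rng.sample(jobs_per_founder, k)) for _ in range(n_draws)]


def hidden_job_distribution(jobs_per_founder, count_probs, n_draws=10_000, seed=0):
    """Pool draws for each possible hidden-founder count, weighted by its probability.

    count_probs maps a candidate count (e.g. 5) to its probability (e.g. 0.4).
    """
    rng = random.Random(seed)
    pooled = []
    for k, prob in count_probs.items():
        pooled.extend(sample_job_totals(jobs_per_founder, k, round(prob * n_draws), rng))
    return pooled


def central_interval(samples, level=0.99):
    """Empirical central interval (by sorted position) over the pooled samples."""
    ordered = sorted(samples)
    last = len(ordered) - 1
    return ordered[int((1 - level) / 2 * last)], ordered[int((1 + level) / 2 * last)]


# Hypothetical Metro with 200 funded founders and the 40% / 60% example from the text.
jobs = [0] * 150 + [3] * 40 + [25] * 10
dist = hidden_job_distribution(jobs, {5: 0.4, 4: 0.6})
print(statistics.fmean(dist), central_interval(dist))
```

The mean of the pooled samples corresponds to the single value that the text says is added to the job creation sub-factor.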
And so the final jobs sub-factor represents a combination of directly counted job creation and statistical estimate.
The jobs sub-factor is then normalized to provide a final jobs score.
From the aggregate variable we also compute a mean job score.
The aggregate job score gives the total impact of LGBTQ+ entrepreneurs on job creation within a given Metro. The mean, by comparison, gives us an idea of the individual contributions and challenges of LGBTQ+ entrepreneurs on a one-to-one basis with their straight peers.
As an initial measure of direct economic impact, the Index tracks venture funding and angel investments, as well as other forms of risk capital. The timing, amount, and nature of these investments are derived from the Pitchbook and Crunchbase sources.
The Index applies the same nonparametric estimation algorithm and Metro-based normalization that are used for jobs. This results in the funding sub-component.
To measure innovation, the Index relies on counts of patents created by each company. (We plan to expand to include research publications and data on media creation in the future.)
Again, the Index applies the nonparametric estimation and Metro-based normalization. This results in the patents sub-component.
This score measures the total value of all acquisitions and IPOs of companies founded by target population entrepreneurs in a given Metro.
The nonparametric estimation and Metro-based normalization are applied, resulting in the final sub-component, exits.
The Index produces two main components: the Index Score and the Impact Size. The Score is a measure of the achievement gap for the given Metro region and will be discussed further below. The Impact Size is an absolute measure of the economic impact of our population of interest, LGBTQ+ entrepreneurs.
The Impact Size is computed by adding the four normalized sub-factors together to give a measure of the “economy” for LGBTQ+ entrepreneurs.
Similarly, we compute a mean impact size to reflect the average impact of individual LGBTQ+ entrepreneurs in a given Metro.
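In code, the Impact Size is a plain sum over the four normalized sub-factors. The sketch below assumes the sub-factor scores have already been normalized; the key names are illustrative labels, not the Index's own variable names.

```python
SUB_FACTORS = ("jobs", "funding", "patents", "exits")


def impact_size(normalized_scores: dict) -> float:
    """Add the four normalized sub-factor scores for one Metro."""
    return sum(normalized_scores[name] for name in SUB_FACTORS)


# Hypothetical normalized scores for a single Metro.
print(impact_size({"jobs": 10, "funding": 20, "patents": 5, "exits": 15}))
```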
There is substantial existing research literature on the wage gap, the funding gap, and other barriers for underrepresented entrepreneurs. Here we seek to understand the scale of what individual Metros could achieve if those barriers were reduced. In the SPEII, this achievable prediction is represented by the purple bar. Rather than just imagine that LGBTQ+ or female entrepreneurs behave identically to their straight and male peers, or that cultural barriers can be immediately removed, the Index instead uses a model of “best-in-class” performance for each individual measure. The achievable represents, for example, all of the jobs or patents that could have been created. The achievement gap is the difference between the Index’s measurements and its predictions.
For each mean sub-factor, such as the mean job score, the Index computes a best-in-class performance. The Index considers each Metro and removes any Metro with three or fewer members of the population of interest. Within the remaining Metros, it then removes all entrepreneurs whose performance is more than 2.5 standard deviations above or below that Metro’s mean for that sub-factor. The Index recomputes the mean of the sub-factor after removing those Metros and entrepreneurial outliers, and also computes an outlier-corrected sub-factor mean for non-LGBTQ+ entrepreneurs.
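The filtering step can be sketched as follows. Two details are assumptions, because the text does not specify them: the population standard deviation is used, and outliers are removed in a single pass rather than iteratively. The function name and input layout are hypothetical.

```python
import statistics


def outlier_corrected_means(values_by_metro, min_members=4, z_cutoff=2.5):
    """Drop small Metros, trim per-Metro outliers, and return the recomputed means.

    values_by_metro maps a Metro name to one sub-factor value per entrepreneur.
    """
    result = {}
    for metro, values in values_by_metro.items():
        if len(values) < min_members:      # "three or fewer members" are removed
            continue
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)     # assumption: population standard deviation
        kept = [v for v in values if sd == 0 or abs(v - mean) <= z_cutoff * sd]
        result[metro] = statistics.fmean(kept)
    return result


data = {"A": [10] * 9 + [1000], "B": [5, 7, 9]}
print(outlier_corrected_means(data))
```

With a single pass, an extreme value inflates the standard deviation it is judged against, so the cutoff only bites once a Metro has enough entrepreneurs.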
The three best-performing Metros in terms of these sub-factor means are identified. For each of these Metros, a ratio of LGBTQ+ to non-LGBTQ+ sub-factor means is computed, and the ratios are averaged over the top 3 cities.
The resulting best-in-class (BiC) ratio is the relative productivity of LGBTQ+ to non-LGBTQ+ entrepreneurs for each sub-factor. It equals 1 if LGBTQ+ founded companies are as productive as non-LGBTQ+ companies for that specific factor. If it is 0.5, they perform half as well. Our model of the achievement gap assumes that local LGBTQ+ populations can meet this same best-in-class productivity ratio in their city.
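A minimal sketch of the averaged ratio follows. It assumes the "three best performing Metros" are ranked by the target population's own outlier-corrected mean, which is one reading of the text; the function and argument names are hypothetical.

```python
def bic_ratio(target_means: dict, comparison_means: dict, top_n: int = 3) -> float:
    """Average the target-to-comparison ratio over the top_n Metros.

    Assumption: Metros are ranked by the target population's sub-factor mean.
    """
    top = sorted(target_means, key=target_means.get, reverse=True)[:top_n]
    ratios = [target_means[m] / comparison_means[m] for m in top]
    return sum(ratios) / len(ratios)


# Hypothetical mean jobs per entrepreneur in four Metros.
lgbtq_means = {"A": 8, "B": 6, "C": 5, "D": 1}
non_lgbtq_means = {"A": 10, "B": 12, "C": 5, "D": 10}
print(bic_ratio(lgbtq_means, non_lgbtq_means))
```

A result of 1 would mean parity in the best-performing cities; a result of 0.5 would mean the target population performs half as well even there.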
That BiC ratio is important because we can’t just take the average job creation rate from the top performing city and call it a day. Different cities are different, from industries to workforce profiles to access to funding. Instead, the Index applies the BiC jobs ratio to the average job creation rate by traditional entrepreneurs in a given Metro. This implicitly accounts for everything unique about that city (entrepreneurship rates, access to capital, number of universities, and more) and adjusts only for the relative performance of the target population. The Index assumes every city can achieve this BiC performance ratio.
The Index estimates the achievable for a specific Metro by applying the best-in-class ratios to the sub-factors computed for comparison entrepreneurs. For example, job creation in a manufacturing-heavy Metro might be higher on average than in a FinTech-focused Metro. Applying the ratio to real local productivity respects these differences. For example, an idealized mean jobs sub-factor, which assumes target population entrepreneurs in every Metro can achieve the same productivity ratio, is the potential job creation rate for individual local target population entrepreneurs.
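To make the point about local differences concrete, the sketch below applies one BiC ratio to two hypothetical Metros with different comparison-group productivity. The numbers and names are invented for illustration only.

```python
def achievable_mean(bic_ratio: float, local_comparison_mean: float) -> float:
    """Potential per-entrepreneur rate: the BiC ratio applied to local comparison productivity."""
    return bic_ratio * local_comparison_mean


# Hypothetical mean jobs per comparison entrepreneur in two Metros.
local_means = {"manufacturing_metro": 50.0, "fintech_metro": 12.0}
ratio = 0.8  # hypothetical BiC jobs ratio

for metro, mean_jobs in local_means.items():
    print(metro, achievable_mean(ratio, mean_jobs))
```

The same ratio yields a higher achievable rate in the manufacturing-heavy Metro simply because its local baseline is higher.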
The BiC model makes one additional assumption around entrepreneurship itself. As previous research has revealed, women and LGBTQ+ individuals historically become entrepreneurs at a lower rate than men and straight individuals. Just as with job creation, this rate varies wildly by Metro. The Index computes a BiC entrepreneurship ratio from the top three cities for the target population.
To compute the aggregate achievable for an entire Metro, we take the statistically inferred population described above and multiply it by our best-in-class model for each sub-factor. (Note that the LGBTQ+ entrepreneurship rate correction factor here is also computed using the same best-in-class methodology.) The achievable represents the full potential contribution of LGBTQ+ entrepreneurs for each sub-factor, such as jobs.
From that achievable estimate, the Index can then compute the achievement gap, the difference between the measured impact of local entrepreneurs and the BiC model of what that impact could be.
Each sub-factor has its own gap; these gaps are combined to produce the full achievement gap for each Metro.
The achievement gap is finally normalized to range between 0 and 100, becoming the Index Score reported by the Index.
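The last steps can be sketched end to end. Several choices here are assumptions, since the text does not give the formulas: the gap is taken as achievable minus measured, sub-factor gaps are combined by summing, and the 0 to 100 normalization is a min-max rescaling across Metros. All names are hypothetical.

```python
def aggregate_achievable(inferred_population: float, achievable_means: dict) -> dict:
    """Scale each per-entrepreneur achievable rate by the inferred target population."""
    return {k: inferred_population * v for k, v in achievable_means.items()}


def total_gap(measured: dict, achievable: dict) -> float:
    """Sum of per-sub-factor gaps (assumption: gap = achievable - measured)."""
    return sum(achievable[k] - measured[k] for k in achievable)


def normalize_0_100(gap_by_metro: dict) -> dict:
    """Min-max rescale gaps to 0..100 (assumption: the source does not name the method)."""
    lo, hi = min(gap_by_metro.values()), max(gap_by_metro.values())
    if hi == lo:
        return {m: 0.0 for m in gap_by_metro}
    return {m: 100 * (g - lo) / (hi - lo) for m, g in gap_by_metro.items()}


achievable = aggregate_achievable(10, {"jobs": 2.5, "patents": 0.5})
gap = total_gap({"jobs": 10, "patents": 2}, achievable)
print(gap, normalize_0_100({"A": 0, "B": gap, "C": 36}))
```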
In addition to the map and the Metro-specific insets, the Index provides alternate means of visualizing the many variables it computes. For example, the “plot” view remaps the bubble representation of each Metro from the national map of longitude and latitude to an economic map of the total metro population vs. per-capita GDP. This allows the viewer to understand any relationship between city size, wealth, and achievement gap. Alternate axes for comparison in the “plot” view include GDP per Capita, Metro Population, and Jobs Creation.
Metros can also be compared along a single variable in our Metro comparison “chart” view. The Metros on the charts are ordered by their relative performance along the variable of interest. An additional chart shows the rate of entrepreneurship of a group of interest within a Metro’s total population.
To begin to explore the causes behind the achievement gap, the Index analyses likely factors that set the context for entrepreneurial success. One particularly relevant factor is the relative composition of industries within a given Metro region. The Index computes an industry fingerprint for each Metro to provide a visual summary of inclusion at the industry level.
To create this fingerprint, the Index uses the Global Industry Classification Standard (GICS) to count the number of companies in each industry. (Individual companies were allowed to be counted in more than one industry.) These counts were then normalized by the total number of companies in a given Metro, giving a proportion of industry in the Metro economy. Industry proportions were also computed at a national level. Finally, for each industry in each Metro, a likelihood ratio was computed by dividing the Metro proportion for that industry by the national proportion. This ratio indicates whether a given industry is over- or under-represented in a local economy compared to the nation at large.
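The likelihood ratio described above can be sketched in a few lines. The record layout is hypothetical, and the sketch assumes that every Metro company also appears in the national list, so each local industry has a nonzero national proportion.

```python
from collections import Counter


def industry_proportions(companies):
    """Share of companies in each industry; a company may count toward several."""
    counts = Counter(ind for c in companies for ind in c["industries"])
    total = len(companies)
    return {ind: n / total for ind, n in counts.items()}


def likelihood_ratios(metro_companies, national_companies):
    """Metro proportion divided by national proportion, per industry."""
    metro = industry_proportions(metro_companies)
    nation = industry_proportions(national_companies)
    return {ind: p / nation[ind] for ind, p in metro.items()}


# Hypothetical data using GICS sector names; the first two companies are in the Metro.
national = [
    {"industries": ["Information Technology"]},
    {"industries": ["Information Technology", "Health Care"]},
    {"industries": ["Health Care"]},
    {"industries": ["Energy"]},
]
print(likelihood_ratios(national[:2], national))
```

A ratio above 1 marks an industry as over-represented locally and a ratio below 1 as under-represented. Because companies can be counted in more than one industry, the proportions within a Metro need not sum to 1.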
For each Metro i, the Index computes an industry weight.
The Index computes an industry gap as an average of the individual impact scores, weighted by the industry weights.
The industry fingerprint then represents the size of the industry across all entrepreneurs within the Metro (target and comparison) as well as the weighted average of the achievement gap across the target population for the entire nation. This offers some insight into why a city might be particularly challenging or successful for a given target population.