Methodology for the StartOut Pride Economic Impact Index

The StartOut Pride Economic Impact Index (SPEII) focuses on the impact of high-growth entrepreneurs from historically underrepresented populations. Research has shown that female, Black, and LGBTQ+ founders, among many others, experience systematic barriers to founding new companies. The SPEII measures both their existing contributions to their home cities and the achievement gaps caused by those barriers. For the initial launch of the Index, our population of interest is LGBTQ+ entrepreneurs.

Definition of High-Growth Entrepreneur

Local entrepreneurs creating main street businesses make an important contribution to the economy but are not measured here, because the biggest impact on the economy in terms of net new jobs and economic growth comes from high-growth entrepreneurs. Similarly, executives at large existing corporations have a meaningful impact on the business world and people’s lives, but startup founders have a unique connection to measurable outcomes like job creation and exits. A High-Growth Entrepreneur is therefore defined by the company threshold below.

Threshold for High-Growth Company

For the purposes of the SPEII, we are interested solely in the founders of “high-growth” companies. As a matter of practical expediency, we laid out a set of minimal criteria to identify “high-growth” companies in terms of funding or economic impact. To formalize our criteria we began with two company-level data sources focused on the entrepreneurial economy: Crunchbase and Pitchbook.

These sources are the core dataset of the SPEII. They contain information about companies’ founders, industry, location, funding history, employment and job creation, and exits (e.g. IPOs and acquisitions) and are updated on a regular basis. The SPEII further augments its dataset with additional information from Wikipedia, the US Patent and Trademark Office, and the US Census Bureau. To credit patent creation to individual founders, the SPEII traces patent assignment through inventors to employers (companies in our dataset) and finally onto the founders of those companies.

The SPEII builds its set of “high-growth” companies and founders from these sources. A company qualifies if it meets at least one of the following criteria:

  1. it has received any amount of Venture Capital funding, or 
  2. it has received at least $1M in risk capital funding, or
  3. it has generated at least one patent and has created jobs beyond the founding team.

These criteria allow us to be flexible in our definition of a high-growth company, allowing for organizations that have had an impact even in the absence of institutional funding. 
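
The three criteria can be read as a simple predicate. The sketch below is illustrative only: the `Company` record and its field names are hypothetical and do not reflect the Crunchbase or Pitchbook schemas.

```python
from dataclasses import dataclass


@dataclass
class Company:
    """Hypothetical company record; field names are assumptions for illustration."""
    venture_capital_usd: float   # total Venture Capital funding received
    risk_capital_usd: float      # total risk capital funding received
    patents: int                 # patents attributed to the company
    jobs_beyond_founders: int    # jobs created beyond the founding team


def is_high_growth(c: Company) -> bool:
    """Return True if the company meets at least one of the three criteria."""
    has_vc = c.venture_capital_usd > 0
    has_risk_capital = c.risk_capital_usd >= 1_000_000
    has_patent_and_jobs = c.patents >= 1 and c.jobs_beyond_founders > 0
    return has_vc or has_risk_capital or has_patent_and_jobs
```

Under this reading, a company with no institutional funding still qualifies if it holds one patent and has hired beyond its founding team.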

Entrepreneur Identification and Attribution

The purpose of the SPEII is to measure how individual high-growth entrepreneurs affect metro economies. It constructs this set of individuals using the dataset of high-growth companies described above. An entrepreneur or founder is any member of the founding team, i.e. anyone present at the company before its first round of funding. If a funding date is not available, founders are identified solely from the Crunchbase and Pitchbook labels.

We choose to include people present at a company before first funding, and not just founders, because we believe anyone who joined the company without a promise of funding (i.e. a salary) made a significant contribution to the company and therefore deserves a share of the company’s impact.

Given a founding team for each company, the SPEII divides the company’s impact evenly among the founders. In other words, for a founding team of 5 individuals, each founder is given credit for ⅕ of the company’s impact (e.g. ⅕ of funding, ⅕ of jobs, and ⅕ of patents).
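
As a minimal sketch of this even split, the function below credits each founder with an equal share of every company-level measure. The input layout (a list of founder-list and impact-dictionary pairs) and the function name are assumptions made for illustration, not the Index’s actual data model.

```python
from collections import defaultdict
from fractions import Fraction


def credit_founders(companies):
    """Split each company's impact evenly among its founders.

    `companies` is a list of (founders, impact) pairs, where `impact` maps a
    measure name (e.g. "jobs") to a company-level total. Every founding team
    is assumed to be non-empty.
    """
    credit = defaultdict(lambda: defaultdict(Fraction))
    for founders, impact in companies:
        share = Fraction(1, len(founders))  # e.g. 1/5 for a team of five
        for founder in founders:
            for measure, total in impact.items():
                credit[founder][measure] += share * total
    return credit
```

A founder who appears on several founding teams accumulates a share from each company, which is how a prolific founder ends up with a larger total.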

Having identified all of the high-growth entrepreneurs in its dataset, the SPEII applies demographic labels to each: race, gender, and sexuality. (We plan to extend these labels to include disability and immigration status.) To identify LGBTQ+ entrepreneurs for this initial launch, we have integrated three specialized datasets. The first is the membership records of StartOut, the largest LGBTQ+ entrepreneurship group in the world and our collaborators in this initiative. The second set of records comes from a data aggregator which provides broad gender and race information to potential employers from 37 publicly available sites. StartOut engaged the aggregator to create a specialized algorithm for identifying LGBTQ+ individuals in its data. Finally, we have identified a number of public repositories identifying openly gay and lesbian business leaders, such as Wikipedia. The Index has an internal validation set of known gay and straight entrepreneurs used to weight the accuracy of labels from each source.

In preliminary work we found that certain demographic labels such as gender were fairly easily derived from our existing sources. However, as previous research has shown, many LGBTQ+ entrepreneurs are either closeted or at least not public about their identity. This is easy to understand given that the very same research has revealed the many barriers historically marginalized entrepreneurs have experienced. In addition to the hidden status of many LGBTQ+ entrepreneurs, StartOut’s data overrepresents the cities in which it maintains active chapters: San Francisco, New York, Los Angeles, Boston, Chicago, and Austin. To correct for this overrepresentation and the known undercounting of LGBTQ+ entrepreneurs, we have also developed a statistical model to estimate counts over our outcome variables of interest.

The SPEII employs a non-parametric statistical estimate of the undercount of founders. It estimates a distribution over the likely number of hidden LGBTQ+ entrepreneurs in each Metro. This distribution is derived from three variables:

  1. the Metro’s local LGBTQ population (from UCLA’s Williams Institute),
  2. the Metro’s entrepreneurship rate (from the Index’s aggregated data), and
  3. a correction factor for LGBTQ entrepreneurship rates,

where i is an index over individual Metros.


The first two variables together give an estimated population of LGBTQ+ entrepreneurs, assuming that these two variables are independent of one another. However, we know that for most populations of interest, entrepreneurship rates are meaningfully different from those of the general population.

So the final variable corrects for this dependence. The SPEII estimates this modifier from the joint rate of female entrepreneurship estimated nationally. From its estimates we find that the LGBTQ+ entrepreneurship rate is roughly half that of the straight population.

Together, the three variables give the Metro’s population of LGBTQ+ entrepreneurs.
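
As a rough worked example, the sketch below combines the three variables into a single expected count. Treating the combination as a simple product is an assumption made here for illustration, and the function and parameter names are hypothetical. The SPEII itself estimates a full distribution, not only this point value.

```python
def expected_lgbtq_entrepreneurs(lgbtq_population, entrepreneurship_rate, correction_factor):
    """Point estimate of LGBTQ+ entrepreneurs in one Metro.

    Assumes (for illustration) that the three variables combine as a product:
    the first two give the count expected under independence, and the
    correction factor rescales it.
    """
    return lgbtq_population * entrepreneurship_rate * correction_factor


# Hypothetical Metro: 100,000 LGBTQ residents, 2 entrepreneurs per 1,000
# residents, and a correction factor of 0.5 ("roughly half").
estimate = expected_lgbtq_entrepreneurs(100_000, 0.002, 0.5)
```

With these made-up inputs the independence assumption alone would suggest about 200 entrepreneurs, and the correction factor halves that to about 100.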

All identities of entrepreneurs in the system, LGBTQ+ or otherwise, are anonymized for publication and used for no other purpose than producing the Index.



The core entity of the SPEII is the Metro, derived from the Census Bureau’s definition of a metropolitan statistical area. Metros represent large integrated economic regions. For example, the entire San Francisco Bay Area, including Oakland, San Jose, Marin County, and more, is assigned to the San Francisco Metro. The impact of individual entrepreneurs and founders on a Metro is measured in terms of the location of the headquarters of companies they have founded. This means that some prolific founders have had impacts across multiple Metros. It also means that impact measures such as jobs and patents are treated more as indices than as explicit economic activity, because they are credited back to the founding Metro even if the jobs actually exist in a separate city.


The formal Index is a score for each Metro, ranging from 0 to 100. By combining independent measures of innovation, job creation, and economic activity, this score reveals the impact of high-growth entrepreneurs on the Metro over a fixed period of time. The SPEII Score is built from three sub-factors computed independently, each combining an explicit count with a statistical model.


“Jobs” is a measure of the number of jobs created by entrepreneurs of our target population in the Metro region within the specified time period. To arrive at this number the SPEII records the total number of jobs created by each company located within the Metro. Then, as described above, it credits those jobs to the individual founders of the company. For each founder that is a member of the current target population, the SPEII adds their share of job creation to its aggregate jobs measure. This provides a total count of jobs created by the target entrepreneurial population.

Because of the issue of undercounting described above, our count of jobs is extended by a non-parametric estimate of the distribution of likely job creation by those undercounted entrepreneurs. To compute this distribution, we take the distribution of the likely number of undercounted LGBTQ+ entrepreneurs. For example, we might have estimated that there’s a 40% probability that 5 of a given Metro’s 200 funded entrepreneurs are likely LGBTQ+ and a 60% probability that 4 of them are LGBTQ+. A distribution over likely job creation is computed as follows:

  1. Randomly select 5 individuals out of the 200.
  2. Count the number of jobs that those 5 individuals created.
  3. Add that number to the set of possible jobs created.
  4. Randomly select 5 new individuals (with replacement) out of the 200.
  5. Count the number of jobs they created.
  6. Add that number to the set of possible jobs created.
  7. Repeat this process until a stable distribution over possible jobs emerges.
  8. Then randomly select 4 individuals out of the 200.
  9. Again, repeat this process using 4 individuals until a new distribution emerges.
  10. Produce a final distribution of likely job creation by weighting the original two distributions by their initial probabilities.

This gives the SPEII its non-parametric estimated distribution of likely job creation by LGBTQ+ entrepreneurs. If few entrepreneurs in a local population produce jobs then the bulk of this distribution will be 0 additional jobs created. If many entrepreneurs were highly productive, then the distribution accordingly is more likely to include larger job creation values. In either case it allows us to compute a 99% confidence interval over the likely number of jobs our undercounted population generated. For the sake of SPEII visualization, this distribution is simplified to its mean, a single value which is added to the job creation sub-factor.
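
The resampling steps above can be sketched as a small Monte Carlo routine. This is a sketch under stated assumptions, not the Index’s implementation: it uses a fixed number of draws in place of "repeat until a stable distribution emerges", it reads "with replacement" as returning individuals to the pool between draws while keeping each draw’s individuals distinct, and all function and parameter names are hypothetical.

```python
import random


def simulate_hidden_jobs(jobs_per_entrepreneur, count_probabilities, draws=10_000, seed=0):
    """Approximate the distribution of jobs created by undercounted entrepreneurs.

    jobs_per_entrepreneur: jobs credited to each funded entrepreneur in the Metro.
    count_probabilities: maps a possible number of hidden LGBTQ+ entrepreneurs
        to its estimated probability, e.g. {5: 0.4, 4: 0.6}.
    Returns a list of (total_jobs, weight) pairs whose weights sum to 1.
    """
    rng = random.Random(seed)
    weighted = []
    for k, prob in count_probabilities.items():
        for _ in range(draws):
            sample = rng.sample(jobs_per_entrepreneur, k)  # k distinct individuals
            weighted.append((sum(sample), prob / draws))   # weight by P(count = k)
    return weighted


def weighted_mean(weighted):
    """Collapse the weighted distribution to its mean, as done for visualization."""
    return sum(value * weight for value, weight in weighted)
```

Calling `simulate_hidden_jobs(jobs, {5: 0.4, 4: 0.6})` on a list of 200 per-entrepreneur job counts mirrors the example in the text. A confidence interval could be read from the same weighted pairs by sorting on `total_jobs` and accumulating weights.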

And so the final jobs sub-factor represents a combination of directly counted job creation and the statistical estimate.


The jobs sub-factor is then normalized to provide a final jobs score. From the aggregate variable we also compute a mean job score.




The aggregate job score gives the total impact of LGBTQ+ entrepreneurs on job creation within a given Metro. The mean, by comparison, gives us an idea of the individual contributions and challenges of LGBTQ+ entrepreneurs on a one-to-one basis with their straight peers. 


As an initial measure of direct economic impact, the SPEII tracks venture funding and angel investments, as well as other forms of risk capital. The timing, amount, and nature of these investments are derived from the Pitchbook and Crunchbase sources.

The SPEII applies the same non-parametric estimation algorithm and Metro-based normalization that are used for jobs. This results in the funding sub-component.


To measure innovation, we look at intellectual property. This is currently measured solely by patents, and we plan to expand it to include research publications and data on media.

Again, the SPEII applies the non-parametric estimation and Metro-based normalization. This results in the final sub-component, innovation.

Impact Size

The SPEII produces two main components: the Index Score and the Impact Size. The Score is a measure of the achievement gap for the given Metro region and will be discussed further below. The Impact Size is an absolute measure of the economic impact of our population of interest, LGBTQ+ entrepreneurs.

The Impact Size is computed by adding the three normalized sub-factors together to give a measure of the “economy” for LGBTQ+ entrepreneurs.

Similarly, we compute a mean impact size to reflect the average impact of individual LGBTQ+ entrepreneurs in a given Metro.


One additional metric computed by the SPEII but not included in the Impact Size is exits. This measures the total value of all acquisitions and IPOs of companies founded by LGBTQ+ entrepreneurs in a given Metro. The exits measure only counts fully attributable acquisitions and IPOs; no statistical models are applied.

Achievement Gap

There is a substantial research literature on the wage gap, the funding gap, and other barriers for female entrepreneurs. Here we seek to understand the scale of what individual Metros could achieve if those barriers were reduced. In the SPEII, this achievable prediction is represented by the purple bar. Rather than just imagine that LGBTQ+ or female entrepreneurs behave identically to their straight and male peers, or that cultural barriers can be immediately removed, the SPEII instead uses a model of “best-in-class” performance for each individual measure. The achievable represents all of the jobs and patents that could have been. The achievement gap is the difference between the Index’s measurements and its predictions.

For each mean sub-factor (for example, the mean jobs sub-factor), the SPEII computes a best-in-class performance. The Index considers each Metro and removes any Metro with three or fewer members of the population of interest. Within the remaining Metros, it then removes all entrepreneurs whose performance is more than 2.5 standard deviations above or below that Metro’s mean for that sub-factor. The Index recomputes the mean of the sub-factor after removing those Metros and entrepreneurial outliers, and also computes an outlier-corrected sub-factor mean for non-LGBTQ+ entrepreneurs.

For each remaining Metro, a ratio of LGBTQ+ to non-LGBTQ+ sub-factor means is computed. From those ratios, the three best-performing Metros are identified (i.e. those with the largest ratios), and their ratios are averaged together, with j indexing the top 3 cities.

This ratio is the relative productivity of LGBTQ+ to non-LGBTQ+ entrepreneurs on each sub-factor. It equals 1 if LGBTQ+-founded companies are as productive as non-LGBTQ+ ones on that specific factor. If it is 0.5, they perform half as well. Our model of the achievement gap assumes that the local LGBTQ+ population can meet this same best-in-class productivity ratio in their city.
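
The best-in-class ratio described above can be sketched as follows. Several details are assumptions made for illustration: outliers are trimmed separately within each group using the population standard deviation, the top three Metros are chosen by largest ratio, and the input layout and function names are hypothetical. The sketch also assumes at least one Metro survives filtering and that no comparison mean is zero.

```python
from statistics import mean, pstdev


def trimmed_mean(values, z=2.5):
    """Mean after dropping values more than z standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    kept = [v for v in values if sigma == 0 or abs(v - mu) <= z * sigma]
    return mean(kept)


def bic_ratio(metros, min_members=4, top_n=3):
    """Average the top_n largest LGBTQ+ to non-LGBTQ+ ratios across Metros.

    `metros` maps a Metro name to (target_values, comparison_values): the
    per-entrepreneur sub-factor values for the two groups.
    """
    ratios = []
    for target, comparison in metros.values():
        if len(target) < min_members:  # drop Metros with three or fewer members
            continue
        ratios.append(trimmed_mean(target) / trimmed_mean(comparison))
    best = sorted(ratios, reverse=True)[:top_n]
    return mean(best)
```

A result of 1 would mean the best-performing Metros show parity between the two groups on that sub-factor.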

That best-in-class (BiC) ratio is important because we can’t just take the average job creation rate from the top-performing city and call it a day. Cities differ, from industries to workforce profiles to access to funding. Instead, the SPEII applies the BiC jobs ratio to the average job creation rate of traditional entrepreneurs in a given Metro. This implicitly accounts for everything unique about that city (entrepreneurship rates, access to capital, number of universities, and more) and adjusts only for the relative performance of the target population. The SPEII assumes every city can achieve this BiC performance ratio.

The BiC model makes one additional assumption around entrepreneurship itself. Women and LGBTQ+ individuals become entrepreneurs at a lower rate than men and straight individuals, as has been well studied and is reflected in the Index’s own data. Just as with job creation, this rate varies widely by Metro. The Index computes a BiC entrepreneurship ratio from the top three cities for the target population.

The Index estimates the achievable for a specific Metro by applying the best-in-class ratios to the sub-factors computed for traditional entrepreneurs. For example, job creation in a manufacturing-heavy Metro might be higher on average than in a fintech-focused Metro. Applying the ratio to real local productivity respects these differences. An idealized mean jobs sub-factor, which assumes LGBTQ+ entrepreneurs in every Metro can achieve the same productivity ratio, is the potential job creation rate for individual local LGBTQ+ entrepreneurs.


To compute the achievement gap for an entire Metro, we take the statistically inferred population described above and multiply it by our best-in-class model for each sub-factor. (Note that the LGBTQ+ entrepreneurship rate correction factor here is also computed using the same best-in-class methodology.) The achievable represents the full potential contribution of LGBTQ+ entrepreneurs for each sub-factor.

From that achievable estimate the Index can then compute the achievement gap: the difference between the measured impact of local entrepreneurs and the BiC model of what that impact could be.
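
A small worked example may help make the arithmetic concrete. The sketch assumes the achievable for one sub-factor is the inferred population multiplied by the BiC ratio and by the local mean for traditional entrepreneurs, and that the gap is the achievable minus the measured value. Both the formula layout and the names are a reading of the description above, not a confirmed specification.

```python
def achievable(estimated_population, bic_ratio, local_comparison_mean):
    """Potential contribution of the target population for one sub-factor.

    bic_ratio * local_comparison_mean is the idealized per-entrepreneur rate;
    multiplying by the inferred population scales it to the whole Metro.
    """
    return estimated_population * bic_ratio * local_comparison_mean


def achievement_gap(measured, achievable_value):
    """Difference between what could be achieved and what was measured."""
    return achievable_value - measured


# Hypothetical Metro: 100 inferred LGBTQ+ entrepreneurs, a BiC jobs ratio of
# 0.8, and traditional entrepreneurs averaging 12 jobs each.
potential_jobs = achievable(100, 0.8, 12)
gap = achievement_gap(600, potential_jobs)
```

With these made-up numbers the achievable is about 960 jobs, so a measured 600 jobs leaves a gap of about 360.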

Each sub-factor has its own gap, and these gaps are combined to produce the full achievement gap for each Metro.

The achievement gap is finally normalized to range between 0 and 100, becoming the Index Score reported by the Index.

Metro Comparison

In addition to the map and the Metro-specific insets, the Index provides alternate means of visualizing the many variables it computes. For example, the “plot” view remaps the bubble representation of each Metro from the national map of longitude and latitude to an economic map of total metro population vs. per-capita GDP. This allows the viewer to understand any relationship between city size, wealth, and achievement gap. Alternate axes for comparison in the “plot” view will be available in the future.

Metros can also be compared along a single variable in our Metro comparison “chart” view. The Metros on the charts are ordered by their relative performance along the variable of interest. An additional chart shows the rate of entrepreneurship of a group of interest within a Metro’s total population. For example, the rate of female entrepreneurs in the San Francisco Bay Area is [580] female entrepreneurs per 1M total residents.