Runner#

Splitting samples that have the same weight#

static RunAnalysis.splitSamples(samples, useFilesPerJob=True)

Static method. Takes a dictionary of samples and splits them based on their weights and the maximum number of files.

Parameters:
samples : dict

dictionary of samples

useFilesPerJob : bool, optional, default: True

whether to further split the samples based on the maximum number of files.

Returns:
list of tuple

Each tuple has a length of 5 (6 if subsamples are present). The first element is the name of the sample, the second is the list of files, the third is the weight, the fourth is the index of this tuple relative to the other tuples of the same sample, and the fifth is the isData flag (True if the sample is data, False otherwise). If subsamples are present, the sixth element is the dict of subsamples.
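To make the tuple layout concrete, here is a minimal sketch that unpacks entries of the documented shape. It does not call splitSamples: the entries are written by hand, and the sample names, file names, weight strings and subsample cuts are all invented for illustration.

```python
# Hand-written entries that mimic the documented return layout of splitSamples.
# Sample names, file names, weights and subsample cuts are invented.
split = [
    ("DY", ["dy_0.root", "dy_1.root"], "XSWeight", 0, False),
    ("DY", ["dy_2.root"], "XSWeight", 1, False),
    ("top", ["top_0.root"], "XSWeight", 0, False,
     {"top_low": "pt < 50", "top_high": "pt >= 50"}),
]


def unpack(entry):
    """Map one tuple onto its documented fields."""
    name, files, weight, index, is_data = entry[:5]
    # The sixth element exists only when subsamples are defined.
    subsamples = entry[5] if len(entry) == 6 else None
    return {
        "name": name,
        "files": files,
        "weight": weight,
        "index": index,
        "isData": is_data,
        "subsamples": subsamples,
    }


records = [unpack(entry) for entry in split]
for record in records:
    print(record["name"], record["index"], len(record["files"]), record["isData"])
```

Slicing the first five elements keeps the unpacking valid for both the 5-element and the 6-element form.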

Instantiate the RunAnalysis class#

RunAnalysis.__init__(samples, aliases, variables, cuts, nuisances, lumi, limit=-1, outputFileMap='output.root')

Stores arguments in the class attributes and creates all the RDataFrame objects

Parameters:
samples : list of tuple

same type as the return value of the splitSamples method

aliases : dict

dict of aliases

variables : dict

dict of variables

cuts : dict

dict of cuts, containing two keys (preselections: str, cuts: dict)

nuisances : dict

dict of nuisances

lumi : float

luminosity in fb^-1

limit : int, optional, default: -1

maximum number of events to be processed

outputFileMap : str, optional, default: 'output.root'

full path and filename of the output ROOT file.

Returns:
None
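As a rough illustration of the cuts argument described above, the sketch below builds a dict with the two documented keys and checks their types. The cut names, the expressions, and the choice of plain strings as values of the inner dict are assumptions made for this example; check_cuts is a hypothetical helper, not part of the runner.

```python
# Hypothetical cuts configuration following the documented layout:
# a 'preselections' string and a 'cuts' dict.
# Cut names and expressions are invented; using plain strings as the
# values of the inner dict is an assumption of this sketch.
cuts = {
    "preselections": "nLepton >= 2",
    "cuts": {
        "signal_region": "mll > 80",
        "control_region": "mll <= 80",
    },
}


def check_cuts(cuts):
    """Check the two documented keys and return the sorted cut names."""
    if not isinstance(cuts.get("preselections"), str):
        raise TypeError("'preselections' must be a str")
    if not isinstance(cuts.get("cuts"), dict):
        raise TypeError("'cuts' must be a dict")
    return sorted(cuts["cuts"])


cut_names = check_cuts(cuts)
print(cut_names)

# With the package installed, this dict would be passed as the `cuts`
# argument of RunAnalysis(samples, aliases, variables, cuts, nuisances, lumi).
```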

The main function:#

RunAnalysis.run()

Runs the analysis:

  1. load the aliases without the afterNuis option

  2. load the suffix systematics

  3. load the alias weight

  4. load the reweight systematics (they need the weight to be defined)

  5. finally load the aliases with the afterNuis option

After this procedure, it filters all the dfs with the preselection, loads the systematics, loads the variables, creates the results dict, splits the samples, creates the cut/variable histograms, runs the dataframes and saves the results.
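The loading order of the five numbered steps can be sketched with a stand-in class that only records which method is called. RecordingRunner and run_loading_steps are hypothetical names, not the real RunAnalysis. The fifth step is mapped here to loadAliases(afterNuis=True), which is an assumption based on afterNuis being documented only as a parameter of loadAliases.

```python
class RecordingRunner:
    """Stand-in that records the order of the loading calls (not the real class)."""

    def __init__(self):
        self.calls = []

    def loadAliases(self, afterNuis=False):
        self.calls.append(f"loadAliases(afterNuis={afterNuis})")

    def loadSystematicsSuffix(self):
        self.calls.append("loadSystematicsSuffix()")

    def loadAliasWeight(self):
        self.calls.append("loadAliasWeight()")

    def loadSystematicsReweights(self):
        self.calls.append("loadSystematicsReweights()")

    def run_loading_steps(self):
        """Hypothetical helper reproducing the five documented steps in order."""
        self.loadAliases()                # 1. aliases without afterNuis
        self.loadSystematicsSuffix()      # 2. suffix systematics
        self.loadAliasWeight()            # 3. the special alias weight
        self.loadSystematicsReweights()   # 4. reweights, which need the weight
        self.loadAliases(afterNuis=True)  # 5. assumed: aliases flagged afterNuis
        return self.calls


calls = RecordingRunner().run_loading_steps()
for call in calls:
    print(call)
```

The weight is loaded before the reweight systematics because, as stated in step 4, those systematics need the weight to be defined.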

The runner should be provided with the samples, the aliases and all the other configuration dictionaries. It takes care of splitting the samples and merging the results.

runner module#

class mkShapesRDF.shapeAnalysis.runner.RunAnalysis(samples, aliases, variables, cuts, nuisances, lumi, limit=-1, outputFileMap='output.root')#

Bases: object

Class that creates dfs and runs the analysis

Methods

convertResults()

Gather resulting histograms and fold them if needed.

createResults()

Create empty dictionary for results, will store all the different histos

create_cuts_vars()

Defines Histo1D for each variable and cut and dataframe.

getNuisanceFiles(nuisance, files)

Searches in the provided nuisance folder for the files with the same name of the nominal files

getTTreeNomAndFriends(fnom, friends)

Create a TChain with the nominal files and the friends files (nuisances TTrees with varied branches)

loadAliasWeight()

Loads only the special alias weight in the dataframes.

loadAliases([afterNuis])

Load aliases in the dataframes.

loadBranches()

Loads branches (the ones specified in an alias with the tree key in them) and checks whether they are already among the dataframe columns; if so, it adds __ at the beginning of the name.

loadSystematicsReweights()

Loads the reweight systematics in the dataframes.

loadSystematicsSuffix()

Loads systematics of type suffix in the dataframes.

loadVariables()

Loads variables (not the ones with the 'tree' key in them) and checks whether they are already among the dataframe columns; if so, it adds __ at the beginning of the name.

run()

Runs the analysis.

saveResults()

Save results in a root file.

splitSamples(samples[, useFilesPerJob])

Static method. Takes a dictionary of samples and splits them based on their weights and the maximum number of files.

splitSubsamples()

Split samples into subsamples if needed

index_sub

mergeAndSaveResults

mergeSaveResults

static splitSamples(samples, useFilesPerJob=True)#

Static method. Takes a dictionary of samples and splits them based on their weights and the maximum number of files.

Parameters:
samples : dict

dictionary of samples

useFilesPerJob : bool, optional, default: True

whether to further split the samples based on the maximum number of files.

Returns:
list of tuple

Each tuple has a length of 5 (6 if subsamples are present). The first element is the name of the sample, the second is the list of files, the third is the weight, the fourth is the index of this tuple relative to the other tuples of the same sample, and the fifth is the isData flag (True if the sample is data, False otherwise). If subsamples are present, the sixth element is the dict of subsamples.

static getTTreeNomAndFriends(fnom, friends)#

Create a TChain with the nominal files and the friends files (nuisances TTrees with varied branches)

Args:

fnom (list): list of nominal files

friends (list of list): list of lists of friend files

Returns:

TChain: TChain with nominal files and friends files

static getNuisanceFiles(nuisance, files)#

Searches in the provided nuisance folder for the files with the same name of the nominal files

Args:

nuisance (dict): dict with the nuisance information

files (list): list of nominal files

Returns:

list of list: list with the down and up varied list of files
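The idea of pairing each nominal file with a same-named file in the nuisance folders can be sketched as below. This is not the library's implementation: the function name varied_file_lists and the dict keys folderDown and folderUp are assumptions made for this example, and the sketch only builds the candidate paths without checking that the files exist.

```python
import posixpath


def varied_file_lists(nuisance, files):
    """Build [down_files, up_files] by reusing the nominal file names.

    The keys 'folderDown' and 'folderUp' are assumed for illustration.
    """
    varied = []
    for key in ("folderDown", "folderUp"):
        folder = nuisance[key]
        # Keep only the file name of each nominal path and move it to the folder.
        varied.append([posixpath.join(folder, posixpath.basename(f)) for f in files])
    return varied


# Invented paths, used only to show the shape of the result.
nominal = ["/data/nominal/dy_0.root", "/data/nominal/dy_1.root"]
nuisance = {"folderDown": "/data/jes_down", "folderUp": "/data/jes_up"}

down_files, up_files = varied_file_lists(nuisance, nominal)
print(down_files)
print(up_files)
```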

static index_sub(string, sub)#

__init__(samples, aliases, variables, cuts, nuisances, lumi, limit=-1, outputFileMap='output.root')#

Stores arguments in the class attributes and creates all the RDataFrame objects

Parameters:
samples : list of tuple

same type as the return value of the splitSamples method

aliases : dict

dict of aliases

variables : dict

dict of variables

cuts : dict

dict of cuts, containing two keys (preselections: str, cuts: dict)

nuisances : dict

dict of nuisances

lumi : float

luminosity in fb^-1

limit : int, optional, default: -1

maximum number of events to be processed

outputFileMap : str, optional, default: 'output.root'

full path and filename of the output ROOT file.

Returns:
None

dfs#

dfs is a dictionary whose keys are the sample names. Its structure should look like this:

dfs = {
    'DY':{
        0: {
            'df': obj,
            'columnNames': [...],
            'usedVariables': [...],
            'ttree': obj2, # needed otherwise seg fault in root
        },
    }
}
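
A nested dictionary of this shape can be traversed as in the following sketch. Placeholder objects stand in for the RDataFrame and the TTree, the column names are invented, and the meaning of the integer key (assumed here to be the per-sample index of the split) is not stated in the text above.

```python
# Placeholder dfs with the documented keys; object() stands in for the
# RDataFrame ('df') and the TTree ('ttree'). Column names are invented.
dfs = {
    "DY": {
        0: {
            "df": object(),
            "columnNames": ["mll", "ptll", "weight"],
            "usedVariables": ["mll"],
            "ttree": object(),
        },
        1: {
            "df": object(),
            "columnNames": ["mll", "ptll"],
            "usedVariables": [],
            "ttree": object(),
        },
    },
}


def summarize(dfs):
    """Return (sample name, index, number of columns) for every entry."""
    rows = []
    for sample_name, chunks in dfs.items():
        for index, entry in chunks.items():
            rows.append((sample_name, index, len(entry["columnNames"])))
    return rows


rows = summarize(dfs)
for row in rows:
    print(row)
```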

loadAliases(afterNuis=False)#

Loads aliases in the dataframes. It does not create the special alias weight, which is handled by a dedicated method.

Parameters:
afterNuis : bool, optional, default: False

if True, only aliases with the key afterNuis set to True will be loaded

loadAliasWeight()#

Loads only the special alias weight in the dataframes.

loadSystematicsSuffix()#

Loads systematics of type suffix in the dataframes.

loadSystematicsReweights()#

Loads the reweight systematics in the dataframes.

loadVariables()#

Loads variables (not the ones with the 'tree' key in them) and checks whether they are already among the dataframe columns; if so, it adds __ at the beginning of the name.

Since variables are shared across dataframes but aliases are not, a variable's name or expression may already be defined for one df but not for another. A common, compatible set of variables therefore has to be determined for all the dfs.

This is done by gathering the largest set of column names.
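The two ideas above, gathering the largest set of column names and prefixing clashing names with __, can be sketched as follows. The helper names common_columns and safe_name are hypothetical, and the actual method may resolve clashes differently.

```python
def common_columns(column_lists):
    """Union of the column names of all dataframes, in first-seen order."""
    seen = {}
    for columns in column_lists:
        for column in columns:
            seen.setdefault(column, None)
    return list(seen)


def safe_name(name, existing):
    """Prefix the name with '__' if it is already used as a column."""
    return "__" + name if name in existing else name


# Invented column lists for two dataframes.
columns_per_df = [["mll", "ptll"], ["ptll", "mjj"]]
all_columns = common_columns(columns_per_df)

# Invented variable names to be defined on every dataframe.
renamed = {name: safe_name(name, all_columns) for name in ["mll", "detajj"]}
print(all_columns)
print(renamed)
```

Checking each name against the union rather than against a single dataframe gives every df the same final column name for a given variable.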

loadBranches()#

Loads branches (the ones specified in an alias with the tree key in them) and checks whether they are already among the dataframe columns; if so, it adds __ at the beginning of the name.

Since variables are shared across dataframes but aliases are not, a variable's name or expression may already be defined for one df but not for another. A common, compatible set of variables therefore has to be determined for all the dfs.

This is done by gathering the largest set of column names.

createResults()#

Creates an empty dictionary for the results, which will store all the different histograms.

splitSubsamples()#

Splits samples into subsamples if needed.

After this method is called, the dfs attribute contains the subsample names instead of the original sample names.

create_cuts_vars()#

Defines a Histo1D for each variable, cut and dataframe. It also creates the dictionary of variations through VariationsFor().

convertResults()#

Gather resulting histograms and fold them if needed.

Systematics are also saved.

saveResults()#

Saves the results in a ROOT file.

If Snapshots were created, it merges them into an output file.

mergeSaveResults()#

mergeAndSaveResults()#

run()#

Runs the analysis:

  1. load the aliases without the afterNuis option

  2. load the suffix systematics

  3. load the alias weight

  4. load the reweight systematics (they need the weight to be defined)

  5. finally load the aliases with the afterNuis option

After this procedure, it filters all the dfs with the preselection, loads the systematics, loads the variables, creates the results dict, splits the samples, creates the cut/variable histograms, runs the dataframes and saves the results.