Tutorials ========= .. note:: Click on the `repository link `_ to download the zip file, called **tutorialData.zip**, of the data you'll need to run through the tutorials. Make a working directory and do the tutorials in that directory. .. image:: http://mickeslab.com/wp-content/uploads/2022/03/tutorial1getData.gif :alt: getting the tutorial data .. note:: In the gif above, the working directory is located at Documents/demos/. The tutorial data files were placed in the directory called "demos." Whatever you call it, when working in the terminal, you'll always need to be in that directory to complete the tutorial. The gif also showed a few commands that you may find helpful: * ``pwd`` to find the directory you're currently in * ``cd`` to change directories * ``ls`` to list files in the folder This tutorial builds up! Between each example and the previous examples, the new lines of code are highlighted in yellow. Python ------ Python is an interpreted object oriented programming language. There is a large range of modules that are imported into python to provide extra functionality or features. pyWitness uses numpy (numerical arrays), scipy (fitting and functions), pandas (data frames), matplotlib (plotting), openpyxl (reading/writing excel), xlrd (reading/writing excel), and numba (compiler to speed up code). Python is best started from a terminal/command prompt .. code-block :: console ipython3 --pylab This then lands you in a python console window .. code-block :: console Python 3.7.9 (default, Sep 6 2020, 16:32:30) Type 'copyright', 'credits' or 'license' for more information IPython 7.14.0 -- An enhanced Interactive Python. Type '?' for help. In [1]: Commands can now be typed in to execute python and pyWitness commands. 
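Before typing any pyWitness commands, it can help to confirm that the modules listed above are importable in the active environment. The snippet below is a minimal sketch that uses only the Python standard library; the helper name ``check_modules`` is hypothetical and is not part of pyWitness.

.. code-block :: python

   import importlib.util

   # Modules named in the text above as being used by pyWitness
   REQUIRED = ["numpy", "scipy", "pandas", "matplotlib", "openpyxl", "xlrd", "numba"]

   def check_modules(names):
       """Return a dict mapping each module name to True if it can be found."""
       return {name: importlib.util.find_spec(name) is not None for name in names}

   # Print one status line per module
   for name, found in check_modules(REQUIRED).items():
       print(f"{name:12s} {'ok' if found else 'MISSING'}")

Any module reported as ``MISSING`` would need to be installed in the environment before the rest of the tutorial can be followed.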
Here are some helpful tips to speed up entering commands * Cut and paste commands (to reduce typos) * Use the command history (up and down cursor arrows) to find commands that were used previously * Use command history with search (so try ``import pyW`` and then up arrow). This will search the command history with that command fragment and probably match a previous ``import pyWitness`` * A command can be completed by using ``tab``. Try typing in ``import pyW`` and then pressing ``tab`` * To get help on a command, type the function and then ``?``, for example, ``dp.plotROC?`` Loading raw experimental data ----------------------------- Remember, you may need to activate pyWitness when you start a terminal by using this command .. code-block :: console conda activate pyWitness Start up ipython3 with .. code-block :: console ipython3 --pylab and pyWitness with .. code-block :: python import pyWitness .. image:: http://mickeslab.com/wp-content/uploads/2022/03/tutorial1rightDirectoryStartPyWitness.gif :alt: getting to the right place A single Python class `pyWitness.DataRaw <./moduledocs.html#pyWitness.DataRaw>`_ is used to load raw data in either ``csv`` or ``excel`` format. The format of ``test1.csv`` is the same as that described in the introduction. .. tabs:: .. code-tab:: python import pyWitness dr = pyWitness.DataRaw("test1.csv") .. code-tab:: R pyw <- import("pyWitness") dr <- pyw$DataRaw("./test1.csv") Checking and exploring loaded data ---------------------------------- It is useful to understand what columns and data values are stored in the raw data. .. tabs:: .. code-tab:: python :linenos: :emphasize-lines: 3 import pyWitness dr = pyWitness.DataRaw("test1.csv") dr.checkData() .. code-tab:: R :linenos: :emphasize-lines: 3 pyw <- import("pyWitness") dr <- pyw$DataRaw("./test1.csv") dr$checkData() .. 
code-block :: console DataRaw.checkData> DataRaw.checkData> columns : ['Unnamed: 0' 'participantId' 'lineupSize' 'targetLineup' 'responseType' 'confidence' 'responseTime'] DataRaw.checkData> lineupSize : [6] DataRaw.checkData> targetLineup : ['targetAbsent' 'targetPresent'] DataRaw.checkData> responseType : ['fillerId' 'rejectId' 'suspectId'] DataRaw.checkData> confidence : [ 0 10 20 30 40 50 60 70 80 90 100] DataRaw.checkData> number trials : 890 If the unique values for a non-mandatory column are required then this can be displayed using .. tabs:: .. code-tab:: python :linenos: :emphasize-lines: 3 import pyWitness dr = pyWitness.DataRaw("test1.csv") dr.columnValues("responseTime") .. code-tab:: R :linenos: :emphasize-lines: 3 pyw <- import("pyWitness") dr <- pyw$DataRaw("./test1.csv") dr$columnValues("responseTime") .. code-block :: console DataRaw.columnValues> : responseTime [ 1159 1296 1326 ... 161703 502420 651073] It is possible also to load Excel files .. tabs:: .. code-tab:: python import pyWitness dr = pyWitness.DataRaw("test1.xlsx","test1") .. code-tab:: R pyw <- import("pyWitness") dr <- pyw$DataRaw("test1.xlsx","test1") The second argument is the sheet name within the workbook (in the example above, it's "test1"). Processing raw experimental data -------------------------------- To process the raw data the function `pyWitness.DataRaw.process <./moduledocs.html#pyWitness.DataRaw.process>`_ needs to be called on a raw data object. This calculates the cumulative rates from the raw data. .. tabs:: .. code-tab:: python :linenos: :emphasize-lines: 3 import pyWitness dr = pyWitness.DataRaw("test1.csv") dp = dr.process() .. code-tab:: R :linenos: :emphasize-lines: 3 pyw <- import("pyWitness") dr <- pyw$DataRaw("./test1.csv") dp <- dr$process() Once `pyWitness.DataRaw.process <./moduledocs.html#pyWitness.DataRaw.process>`_ is called two ``DataFrames`` are created. One contains a pivot table and the other contains rates. .. tabs:: .. 
code-tab:: python :linenos: :emphasize-lines: 4-5 import pyWitness dr = pyWitness.DataRaw("test1.csv") dp = dr.process() dp.printPivot() dp.printRates() .. code-tab:: R :linenos: :emphasize-lines: 4-5 pyw <- import("pyWitness") dr <- pyw$DataRaw("./test1.csv") dp <- dr$process() dp$printPivot() dp$printRates() You should see the following output of the ``dp.printPivot()``. .. code-block :: console confidence confidence 0 10 20 30 40 50 60 70 80 90 100 targetLineup responseType targetAbsent fillerId 2.0 7.0 5.0 8.0 10.0 20.0 26.0 20.0 14.0 8.0 6.0 rejectId 2.0 5.0 5.0 6.0 9.0 24.0 35.0 56.0 68.0 43.0 64.0 targetPresent fillerId 0.0 0.0 2.0 3.0 5.0 6.0 5.0 10.0 5.0 4.0 2.0 rejectId 3.0 1.0 0.0 6.0 10.0 20.0 9.0 19.0 23.0 16.0 21.0 suspectId 2.0 1.0 4.0 4.0 10.0 18.0 42.0 68.0 54.0 33.0 41.0 total number of participants 890.0 In the output above are frequencies by confidence levels for each response type. To familiarize you with the output, in the table above, 8 filler identifications were given with 30% confidence on the target-absent lineups, 64 reject identifications (i.e., "The perp is not in the lineup") given with 100% confidence on the target-absent lineups, and 41 guilty suspect identifications (from target-present lineups) given with 100% confidence. You should also see the following output for ``dp.printRates()`` .. 
code-block :: console confidence confidence 0 10 20 30 40 50 60 70 80 90 100 variable type cac central 0.857143 0.461538 0.827586 0.750000 0.857143 0.843750 0.906475 0.953271 0.958580 0.961165 0.976190 confidence central 0 10 20 30 40 50 60 70 80 90 100 dprime central 1.975221 1.971156 1.992932 1.990193 2.001534 1.990478 1.994925 1.940776 1.742686 1.585873 1.509544 rf 0.007830 0.007271 0.016219 0.017897 0.039150 0.071588 0.155481 0.239374 0.189038 0.115213 0.140940 targetAbsent fillerId 0.284424 0.279910 0.264108 0.252822 0.234763 0.212190 0.167043 0.108352 0.063205 0.031603 0.013544 rejectId 0.715576 0.711061 0.699774 0.688488 0.674944 0.654628 0.600451 0.521445 0.395034 0.241535 0.144470 suspectId 0.047404 0.046652 0.044018 0.042137 0.039127 0.035365 0.027840 0.018059 0.010534 0.005267 0.002257 targetPresent fillerId 0.093960 0.093960 0.093960 0.089485 0.082774 0.071588 0.058166 0.046980 0.024609 0.013423 0.004474 rejectId 0.286353 0.279642 0.277405 0.277405 0.263982 0.241611 0.196868 0.176734 0.134228 0.082774 0.046980 suspectId 0.619687 0.615213 0.612975 0.604027 0.595078 0.572707 0.532438 0.438479 0.286353 0.165548 0.091723 zL central -1.670562 -1.678225 -1.705849 -1.726409 -1.760906 -1.807208 -1.913524 -2.095603 -2.306755 -2.557781 -2.839765 zT central 0.304658 0.292931 0.287082 0.263784 0.240628 0.183270 0.081401 -0.154827 -0.564069 -0.971908 -1.330222 In the table above, the overall false ID rate is 0.047, the overall correct ID rate is 0.620, and the overall correct rejection rate is 0.716. .. note:: In the example there is no ``suspectId`` for ``targetAbsent`` lineups. Here the ``targetAbsent.suspectId`` is estimated as ``targetAbsent.fillerId/lineupSize`` .. image:: http://mickeslab.com/wp-content/uploads/2022/03/tutorial1rates.gif :alt: getting rates and pivots To see overall descriptive statistics, use .. tabs:: .. code-tab:: python import pyWitness dr = pyWitness.DataRaw("test1.csv") dp = dr.process() dp.printDescriptiveStats() .. 
code-tab:: R pyw <- import("pyWitness") dr <- pyw$DataRaw("./test1.csv") dp <- dr$process() dp$printDescriptiveStats() and you'll see this output: .. code-block :: console Number of lineups 890.0 Number of target-absent lineups 443.0 Number of target-present lineups 447.0 Correct ID rate 0.6196868008948546 False ID rate 0.0474040632054176 dPrime 1.9752208100241062 pAUC 0.02066155955774986 Plotting ROC curves ------------------- .. tabs:: .. code-tab:: python :linenos: :emphasize-lines: 4 import pyWitness dr = pyWitness.DataRaw("test1.csv") dp = dr.process() dp.plotROC() .. code-tab:: R :linenos: :emphasize-lines: 4-5 pyw <- import("pyWitness") dr <- pyw$DataRaw("./test1.csv") dp <- dr$process() dp$plotROC() mpl$pyplot$show() .. figure:: images/test1ROCnoBin.png :alt: ROC for test1.csv .. note:: The symbol size is the relative frequency and can be changed by setting ``dp.plotROC(relativeFrequencyScale = 400)`` .. note:: The transparency of the plot can be changed by setting ``alpha`` in the plot command, so ``dp.plotROC(alpha = 0.5)`` The black dashed line in the plot represents chance performance. Plotting CAC curves ------------------- .. tabs:: .. code-tab:: python :linenos: :emphasize-lines: 4 import pyWitness dr = pyWitness.DataRaw("test1.csv") dp = dr.process() dp.plotCAC() .. code-tab:: R :linenos: :emphasize-lines: 4-5 pyw <- import("pyWitness") dr <- pyw$DataRaw("./test1.csv") dp <- dr$process() dp$plotCAC() mpl$pyplot$show() .. figure:: images/test1CACnoBin.png :alt: CAC for test1.csv .. image:: http://mickeslab.com/wp-content/uploads/2022/03/tutorial1ROCcac.gif :alt: ROC and CAC plots .. note:: The transparency of the plot can be changed by setting ``alpha`` in the plot command, so ``dp.plotCAC(alpha = 0.5)`` .. 
warning:: To plot CAC curves with different target-present base rates, the base rate needs to be given to the ``process`` function of ``DataRaw``, for example, ``dp = dr.process(baseRate=0.2)`` Collapsing the categorical data ------------------------------- The dataset used in this tutorial has 11 confidence levels (0, 10, 20, 30, 40, 50, 60, 70, 80, 90 and 100). Often confidence levels need to be binned or collapsed. This is best performed on the raw data before calling ``process()``. This is done with the ``collapseCategoricalData`` method of ``DataRaw``, as shown in the example below, where 0-60 map to 30, 70-80 map to 75 and 90-100 map to 95. .. tabs:: .. code-tab:: python :linenos: :emphasize-lines: 3-6 import pyWitness dr = pyWitness.DataRaw("test1.csv") dr.collapseCategoricalData(column='confidence', map={0: 30, 10: 30, 20: 30, 30: 30, 40: 30, 50: 30, 60: 30, 70: 75, 80: 75, 90: 95, 100: 95}) dp = dr.process() dp.plotCAC() .. code-tab:: R :linenos: :emphasize-lines: 3-7 pyw <- import("pyWitness") dr <- pyw$DataRaw("./test1.csv") dr$collapseCategoricalData(column='confidence',map=list("0"=30, "10"=30, "20"=30, "30"=30, "40"=30,"50"=30, "60"=30,"70"=75, "80"=75,"90"=95, "100"=95)) dp <- dr$process() dp$plotCAC() mpl$pyplot$show() .. figure:: images/test1CACBin.png :alt: Rebinned CAC for test1.csv To rescale the axes, you can use .. tabs:: .. code-tab:: python import matplotlib.pyplot as _plt _plt.xlim(0,100) _plt.ylim(0.50,1.0) .. code-tab:: R pyw <- import("pyWitness") dr <- pyw$DataRaw("./test1.csv") dr$collapseCategoricalData(column='confidence',map=list("0"=30, "10"=30, "20"=30, "30"=30, "40"=30,"50"=30, "60"=30,"70"=75, "80"=75,"90"=95, "100"=95)) dp <- dr$process() dp$plotCAC() invisible(mpl$pyplot$ylim(0.50,1.0)) mpl$pyplot$show() and you get .. figure:: images/test1CACBinLim.png :alt: CAC rescaled .. note:: If you make a mistake when calling ``collapseCategoricalData``, the data might be left in an inconsistent state.
To start again from the original data, call ``collapseCategoricalData`` with ``reload=True`` Collapsing (binning) continuous data ------------------------------------ Some data are not categorical variables, but continuous variables. .. tabs:: .. code-tab:: python :linenos: :emphasize-lines: 3 import pyWitness dr = pyWitness.DataRaw("test1.csv") dr.collapseContinuousData(column = "confidence",bins = [-1,60,80,100],labels= [1,2,3]) dp = dr.process() dp.plotROC() .. code-tab:: R :linenos: :emphasize-lines: 3 pyw <- import("pyWitness") dr <- pyw$DataRaw("./test1.csv") dr$collapseContinuousData(column = "confidence", bins = c(-1,60,80,100),labels= c(1,2,3)) dp <- dr$process() dp$plotROC() mpl$pyplot$show() .. note:: ``labels=None`` can be used and the bins will be automatically labelled .. note:: The bin edges are exclusive of the low edge and inclusive of the high edge .. warning:: Confidence needs to be a numerical value because ROC analysis requires a value that can be ordered. Calculating pAUC and performing statistical tests ------------------------------------------------- pAUC is calculated when ``dr.process()`` is called. Simpson's rule integrates the area under the ROC curve up to a maximum value. If the maximum value is between two data points, linear interpolation is used to calculate the most liberal point (i.e., the lowest level of confidence). .. tabs:: .. code-tab:: python :linenos: :emphasize-lines: 5 import pyWitness dr = pyWitness.DataRaw("test1.csv") dr.collapseContinuousData(column = "confidence",bins = [-1,60,80,100],labels= [1,2,3]) dp = dr.process() print(dp.pAUC) .. code-tab:: R :linenos: :emphasize-lines: 5 pyw <- import("pyWitness") dr <- pyw$DataRaw("./test1.csv") dr$collapseContinuousData(column = "confidence",bins = c(-1,60,80,100),labels= c(1,2,3)) dp <- dr$process() print(dp$pAUC) .. 
figure :: images/test1_pAUC.jpg :alt: Data-model ROC comparison for test1.csv Plotting RAC curves ------------------- To perform analyses with a variable other than confidence, for example, response time, use the following code. The important change is highlighted. .. tabs:: .. code-tab:: python :linenos: :emphasize-lines: 4 import pyWitness drRAC = pyWitness.DataRaw("test1.csv") drRAC.collapseContinuousData(column="responseTime",bins=[0, 5000, 10000, 15000, 20000, 99999],labels=[1, 2, 3, 4, 5]) dpRAC = drRAC.process(reverseConfidence=True,dependentVariable="responseTime") dpRAC.plotCAC() .. code-tab:: R :linenos: :emphasize-lines: 5 pyw <- import("pyWitness") drRAC <- pyw$DataRaw("./test1.csv") drRAC$collapseContinuousData(column="responseTime", bins=c(0, 5000, 10000, 15000, 20000, 99999),labels=c(1, 2, 3, 4, 5)) dpRAC <- drRAC$process(reverseConfidence=TRUE,dependentVariable="responseTime") dpRAC$plotCAC() invisible(mpl$pyplot$xlabel("Response Time")) invisible(mpl$pyplot$ylim(.50,1.0)) invisible(mpl$pyplot$savefig("test1RAC.png")) invisible(mpl$pyplot$savefig("test1RAC.pdf")) The plot will look like this: .. figure :: images/test1RAC.png :alt: RAC for test1 Fitting signal detection-based models to data --------------------------------------------- There are many models available in pyWitness. We'll start with the independent observation model. Loading and processing the data is the same as before (lines 1-4); the fitting part is new and is highlighted (lines 5-7). .. tabs:: .. code-tab:: python :linenos: :emphasize-lines: 5-7 import pyWitness dr = pyWitness.DataRaw("test1.csv") dr.collapseContinuousData(column = "confidence",bins = [-1,60,80,100],labels= [1,2,3]) dp = dr.process() mf = pyWitness.ModelFitIndependentObservation(dp) mf.setEqualVariance() mf.fit() .. 
code-tab:: R :linenos: :emphasize-lines: 5-7 pyw <- import("pyWitness",convert=TRUE) dr <- pyw$DataRaw("./test1.csv") dr$collapseContinuousData(column = "confidence",bins = c(-1,60,80,100),labels= c(1,2,3)) dp <- dr$process() mf <- pyw$ModelFitIndependentObservation(dp) mf$setEqualVariance() mf$fit() Line 5 constructs a fit object, line 6 sets the model parameters to equal variance and line 7 starts the minimiser. The output from the fit (execution of line 7) is something like the following .. code-block :: console fit iterations 223 fit status Optimization terminated successfully. fit time 9.376720442 fit chi2 10.300411274463407 fit ndf 4 fit chi2/ndf 2.5751028186158518 fit p-value 0.035660197825222784 .. image:: http://mickeslab.com/wp-content/uploads/2022/03/tutorial1modelFitPara.gif :alt: Model fit details and parameters To see clearly how the fitting works, the following code is the same as above but with ``mf.printParameters()`` on lines 6, 9, and 12. .. tabs:: .. code-tab:: python :linenos: :emphasize-lines: 6,9,12 import pyWitness dr = pyWitness.DataRaw("test1.csv") dr.collapseContinuousData(column = "confidence",bins = [-1,60,80,100],labels= [1,2,3]) dp = dr.process() mf = pyWitness.ModelFitIndependentObservation(dp) mf.printParameters() mf.setEqualVariance() mf.printParameters() mf.fit() mf.printParameters() .. code-tab:: R :linenos: :emphasize-lines: 6,9,12 pyw <- import("pyWitness") dr <- pyw$DataRaw("./test1.csv") dr$collapseContinuousData(column = "confidence",bins = c(-1,60,80,100),labels= c(1,2,3)) dp <- dr$process() mf <- pyw$ModelFitIndependentObservation(dp) mf$printParameters() mf$setEqualVariance() mf$printParameters() mf$fit() mf$printParameters() After creating the ``mf`` object, the parameters are at their default values and free, so line 6 prints .. 
code-block :: console lureMean 0.0 (free) lureSigma 1.0 (free) targetMean 1.0 (free) targetSigma 1.0 (free) lureBetweenSigma 0.0 (free) targetBetweenSigma 0.0 (free) c1 1.0 (free) c2 1.5 (free) c3 2.0 (free) Typically you would want to control the fit parameters. ``setEqualVariance`` configures a default equal-variance model, which is an appropriate starting point; line 9 yields .. code-block :: console lureMean 0.0 (fixed) lureSigma 1.0 (fixed targetSigma) targetMean 1.0 (free) targetSigma 1.0 (fixed) lureBetweenSigma 0.3 (fixed targetBetweenSigma) targetBetweenSigma 0.3 (free) c1 1.0 (free) c2 1.5 (free) c3 2.0 (free) Comparing these two sets of fit parameter settings * ``lureMean`` is fixed to its current value * ``lureSigma`` is forced to be equal to ``targetSigma`` * ``targetSigma`` is fixed to its current value * ``lureBetweenSigma`` is forced to be equal to ``targetBetweenSigma`` * ``targetBetweenSigma`` is set to 0.3 and remains free After running the fit the parameters are updated so the output of line 12 in the code example gives .. code-block :: console ModelFit.printParameters> lureMean 0.000 (fixed) ModelFit.printParameters> lureSigma 1.000 (fixed targetSigma) ModelFit.printParameters> targetMean 1.798 (free) ModelFit.printParameters> targetSigma 1.000 (fixed) ModelFit.printParameters> lureBetweenSigma 0.605 (fixed targetBetweenSigma) ModelFit.printParameters> targetBetweenSigma 0.605 (free) ModelFit.printParameters> c1 1.402 (free) ModelFit.printParameters> c2 1.935 (free) ModelFit.printParameters> c3 2.677 (free) There are many ways to control the model .. 
list-table:: Parameter control examples :widths: 70 70 :header-rows: 1 * - Command - Notes * - ``mf.lureMean.value = -0.1`` - Sets the lure mean parameter to -0.1 * - ``mf.targetMean.fixed = True`` - Fixes the parameter so it cannot change during a fit * - ``mf.lureMean.fixed = False`` - Unfixes the parameter so it will be free in a fit * - ``mf.c1.set_equal(mf.c2)`` - Locks ``c1`` and ``c2`` together * - ``mf.lureBetweenSigma.unset_equal()`` - Releases the link between ``lureBetweenSigma`` and ``targetBetweenSigma`` There are multiple model fits available; they all share the same interface and differ only in the construction line .. tabs:: .. code-tab:: python :linenos: :emphasize-lines: 5-8 import pyWitness dr = pyWitness.DataRaw("test1.csv") dr.collapseContinuousData(column="confidence") dp = dr.process() mf_io = pyWitness.ModelFitIndependentObservation(dp) mf_br = pyWitness.ModelFitBestRest(dp) mf_en = pyWitness.ModelFitEnsemble(dp) mf_in = pyWitness.ModelFitIntegration(dp) .. code-tab:: R :linenos: :emphasize-lines: 5-8 pyw <- import("pyWitness") dr <- pyw$DataRaw("./test1.csv") dr$collapseContinuousData(column="confidence") dp <- dr$process() mf_io <- pyw$ModelFitIndependentObservation(dp) mf_br <- pyw$ModelFitBestRest(dp) mf_en <- pyw$ModelFitEnsemble(dp) mf_in <- pyw$ModelFitIntegration(dp) Setting initial fit parameters ------------------------------ For data samples with a large number of confidence bins, the fits can take many iterations to converge (long run times). Sensible initial fit parameters can be estimated from the data. To estimate the target mean :math:`\mu_t` and sigma :math:`\sigma_t` the following relation is used .. math :: Z(R_{T,i}) = \frac{Z(R_{L,i})- \mu_t}{\sigma_t} Rearranging gives .. math :: \sigma_t Z(R_{T,i}) = Z(R_{L,i}) - \mu_t There is a linear relationship between target and lure :math:`Z` values. This can be plotted and a linear fit used to estimate the gradient and intercept. .. tabs:: .. 
code-tab:: python :linenos: :emphasize-lines: 5 import pyWitness dr = pyWitness.DataRaw("test1.csv") dr.collapseContinuousData(column = "confidence",bins = [-1,60,80,100],labels= [1,2,3]) dp = dr.process() dp.plotHitVsFalseAlarmRate() .. code-tab:: R :linenos: :emphasize-lines: 5 pyw <- import("pyWitness") dr <- pyw$DataRaw("./test1.csv") dr$collapseContinuousData(column = "confidence",bins = c(-1,60,80,100),labels = c(1,2,3)) dp <- dr$process() dp$plotHitVsFalseAlarmRate() invisible(mpl$pyplot$savefig("HvFA.png")) .. figure:: images/HvFA.png :alt: Hit rate vs. false alarm rate for test1.csv .. tabs:: .. code-tab:: python :linenos: :emphasize-lines: 9 import pyWitness dr = pyWitness.DataRaw("test1.csv") dr.collapseContinuousData(column = "confidence",bins = [-1,60,80,100],labels= [1,2,3]) dp = dr.process() mf = pyWitness.ModelFitIndependentObservation(dp) mf.printParameters() mf.setEqualVariance() mf.setParameterEstimates() mf.printParameters() mf.fit() mf.printParameters() .. code-tab:: R :linenos: :emphasize-lines: 8 pyw <- import("pyWitness") dr <- pyw$DataRaw("./test1.csv") dr$collapseContinuousData(column = "confidence",bins = c(-1,60,80,100),labels = c(1,2,3)) dp <- dr$process() mf <- pyw$ModelFitIndependentObservation(dp) mf$setEqualVariance() mf$setParameterEstimates() mf$printParameters() mf$fit() mf$printParameters() .. Checking the convergence of fit ------------------------------- Loading and saving fit parameters for later use ----------------------------------------------- Plotting fit and models ----------------------- It is important to understand the performance of a given particular fit. The following plot compares the experimental data to the model fit. .. tabs:: .. 
code-tab:: python import pyWitness dr = pyWitness.DataRaw("test1.csv") dr.collapseContinuousData(column = "confidence",bins = [-1,60,80,100],labels= None) dp = dr.process() dp.calculateConfidenceBootstrap(nBootstraps=200) mf = pyWitness.ModelFitIndependentObservation(dp) mf.setEqualVariance() mf.fit() .. code-tab:: R pyw <- import("pyWitness") dr <- pyw$DataRaw("./test1.csv") dr$collapseContinuousData(column = "confidence",bins = c(-1,60,80,100),labels = c(1,2,3)) dp <- dr$process() dp$calculateConfidenceBootstrap(nBootstraps=as.integer(200)) mf <- pyw$ModelFitIndependentObservation(dp) mf$setEqualVariance() mf$fit() To compare an *ROC* plot between data and fit .. tabs:: .. code-tab:: python import matplotlib.pyplot as _plt dp.plotROC(label="Data") mf.plotROC(label="Indep. obs. fit") _plt.legend() .. code-tab:: R pyw <- import("pyWitness") dr <- pyw$DataRaw("./test1.csv") dr$collapseContinuousData(column = "confidence",bins = c(-1,60,80,100),labels = c(1,2,3)) dp <- dr$process() dp$calculateConfidenceBootstrap(nBootstraps=as.integer(200)) mf <- pyw$ModelFitIndependentObservation(dp) mf$setEqualVariance() mf$fit() dp$plotROC(label="Data") mf$plotROC(label="Indep. obs. fit") mpl$pyplot$legend() mpl$pyplot$show() .. figure:: images/test1ROCcomparisonBin.png :alt: Data-model ROC comparison for test1.csv .. image:: http://mickeslab.com/wp-content/uploads/2022/03/tutorial1fitDataROCplot.gif :alt: ROC data and model fit plotted To compare a *CAC* plot between data and fit .. tabs:: .. code-tab:: python import matplotlib.pyplot as _plt dp.plotCAC(label="Data") mf.plotCAC(label="Indep. obs. fit") _plt.legend() .. code-tab:: R dp$plotCAC(label="Data") mf$plotCAC(label="Indep. obs. fit") mpl$pyplot$legend() .. figure:: images/test1CACcomparisonBin.png :alt: Data-model CAC comparison for test1.csv To compare frequencies in each bin between data and fit .. tabs:: .. code-tab:: python mf.plotFit() .. code-tab:: R mf$plotFit() .. 
figure:: images/testPlotFit.png :alt: Data-model comparison for test1.csv Once a fit has been performed, the model can be displayed as a function of memory strength. The plot includes the lure and target distributions with their means and standard deviations (top panel of plot below) and the associated criteria, c1 (low confidence), c2 (medium confidence), and c3 (high confidence) (bottom panel of plot below). The following command, which belongs to a ModelFit object, makes the plot below. .. tabs:: .. code-tab:: python mf.plotModel() .. code-tab:: R mf$plotModel() .. figure:: images/testPlotModel.png :alt: Independent Observation model fit d-prime calculation ------------------- The d-prime is calculated as .. math :: d^{\prime} = Z(R_{T,i}) - Z(R_{L,i}) where :math:`R_{T,i}` is the cumulative rate for targets (:math:`T`) with confidence :math:`i`, :math:`R_{L,i}` is the cumulative rate for lures (:math:`L`) with confidence :math:`i` and :math:`Z` is the inverse normal CDF. This can be evaluated for every confidence bin, but there are conventions for lineups and showups. For all confidence levels :math:`d^{\prime}` is stored in the rates dataframe, so ``dp.printRates()`` gives .. 
code-block :: console :linenos: :emphasize-lines: 6 confidence confidence 3 2 1 targetLineup responseType cac central 0.956357 0.940618 0.839228 confidence central 95.588235 74.859335 44.778068 dprime central 1.433207 1.748223 1.767339 rf 0.264691 0.422903 0.312406 targetAbsent fillerId 0.044660 0.141748 0.335922 rejectId 0.217476 0.473786 0.664078 suspectId 0.007443 0.023625 0.055987 targetPresent fillerId 0.018832 0.080979 0.152542 rejectId 0.080979 0.163842 0.276836 suspectId 0.158192 0.406780 0.570621 A member variable ``dPrime`` in ``DataProcessed`` is set according to * Lineup convention: :math:`d^{\prime}` is taken at the lowest confidence (most liberal), so ``dp.dPrime`` is ``1.767339`` * Showup convention: :math:`d^{\prime}` is taken at the lowest positive confidence :math:`d` can also be calculated from a signal detection model so .. math :: d = \frac{\mu_{T} - \mu_{L}}{ \sqrt{\frac{\sigma_T^2 + \sigma_L^2}{2}} } This is calculated from the fit parameters for the fits described in the previous section so .. code-block :: console In [X]: mf.d Out[X]: 1.7976601843420954 Writing results to file ----------------------- The internal dataframes can be written to either ``csv`` or ``xlsx`` file format for further analysis. There are four functions belonging to ``DataProcessed``. * ``writePivotExcel`` writes the pivot table to excel * ``writePivotCsv`` writes the pivot table to csv * ``writeRatesExcel`` writes the cumulative rates table to excel * ``writeRatesCsv`` writes the cumulative rates table to csv The string argument for the functions is the file name. .. tabs:: .. code-tab:: python :linenos: :emphasize-lines: 4-7 import pyWitness dr = pyWitness.DataRaw("test1.csv") dp = dr.process() dp.writePivotExcel("test1_pivot.xlsx") dp.writePivotCsv("test1_pivot.csv") dp.writeRatesExcel("test1_rates.xlsx") dp.writeRatesCsv("test1_rates.csv") .. 
code-tab:: R :linenos: :emphasize-lines: 4-7 pyw <- import("pyWitness") dr <- pyw$DataRaw("./test1.csv") dp <- dr$process() dp$writePivotExcel("./test1_pivot.xlsx") dp$writePivotCsv("./test1_pivot.csv") dp$writeRatesExcel("./test1_rates.xlsx") dp$writeRatesCsv("./test1_rates.csv") .. figure:: images/test1PivotExcel.png .. figure:: images/test1RatesExcel.png Designated innocent suspect in target absent (TA) lineups --------------------------------------------------------- In TA lineups without a designated innocent suspect, the suspect ID rate is estimated as ``targetAbsent.fillerId/lineupSize``. In some experiments a ``designated innocent suspect`` is used. pyWitness supports this. * The raw data in the ``responseType`` column will have an extra possible type called ``designateId`` * All of the analyses then proceed normally, but the targetAbsent suspectId row is populated with data from the designated innocent suspect * Sometimes it is useful to check whether the raw data contains ``designateId``; this is done by calling ``dataRaw.isDesignateId`` * To convert all ``designateId`` responses to ``fillerId``, call ``dataRaw.removeDesignates()``
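
The estimate used when there is no designated innocent suspect can be reproduced by hand. The sketch below uses plain Python and the target-absent counts from the ``dp.printPivot()`` output shown earlier in this tutorial; the function name ``estimated_false_id_rate`` is hypothetical and is not part of pyWitness.

.. code-block :: python

   # Target-absent frequencies per confidence level (0 to 100), copied from
   # the pivot table printed by dp.printPivot() for test1.csv
   ta_filler = [2, 7, 5, 8, 10, 20, 26, 20, 14, 8, 6]
   ta_reject = [2, 5, 5, 6, 9, 24, 35, 56, 68, 43, 64]
   lineup_size = 6

   def estimated_false_id_rate(filler_counts, reject_counts, lineup_size):
       """Estimate the overall innocent-suspect ID rate as fillerId rate / lineupSize."""
       # Number of target-absent lineups (no suspectId responses in this data set)
       n_target_absent = sum(filler_counts) + sum(reject_counts)
       filler_rate = sum(filler_counts) / n_target_absent
       return filler_rate / lineup_size

   rate = estimated_false_id_rate(ta_filler, ta_reject, lineup_size)
   print(round(rate, 6))

The result agrees with the ``False ID rate`` reported by ``dp.printDescriptiveStats()`` for the same data. When a designated innocent suspect is present, this division is not needed because the ``designateId`` responses supply the rate directly.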